RE: computer-go: Board evaluation by counting...
> -----Original Message-----
> From: owner-computer-go@xxxxxxxxxxxxxxxxx
> [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Julian
> Churchill
> Sent: Wednesday, September 19, 2001 7:49 PM
> To: computer-go@xxxxxxxxxxxxxxxxx
> Subject: RE: computer-go: Board evaluation by counting...
>
>
>
> I'm doing something along those lines. I have a net that takes 9x9
> areas centred on a potential move and outputs a value indicating how
> 'good' the move is. The training data is taken from pro tournament
> games, and it seems to produce a network that is fairly generalised
> but still useful when combined with other techniques such as
> alpha-beta. I'm hoping to add some more networks that specialise in
> the opening game, and perhaps in corner and edge moves, to get around
> the overgeneralisation that the current network suffers from.
>
That sounds more like the NN approach discussed here before. I have severe
doubts this will actually work.
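For concreteness, the kind of move-scoring net Julian describes (a 9x9 window centred on a candidate move, one output for move quality) might look roughly like the sketch below. The architecture, feature encoding, and all names here are illustrative assumptions on my part, not his actual implementation; the weights would of course be trained on the pro-game data rather than left random.

```python
import numpy as np

WINDOW = 9                      # 9x9 area centred on the candidate move
N_IN = WINDOW * WINDOW * 3      # three feature planes: own, opponent, empty
N_HID = 32                      # hidden-layer size (arbitrary choice)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))  # untrained placeholder weights;
W2 = rng.normal(0.0, 0.1, N_HID)          # training data would be pro moves

def encode_window(board, x, y, colour):
    """Extract the 9x9 area centred on (x, y) as three 0/1 feature planes.
    board[i][j] is 0 (empty), 1 (black) or -1 (white); off-board points
    are left all-zero in every plane."""
    half = WINDOW // 2
    planes = np.zeros((3, WINDOW, WINDOW))
    size = len(board)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            i, j = y + dy, x + dx
            if 0 <= i < size and 0 <= j < size:
                v = board[i][j]
                plane = 0 if v == colour else 1 if v == -colour else 2
                planes[plane, dy + half, dx + half] = 1.0
    return planes.ravel()

def move_score(board, x, y, colour):
    """Scalar 'goodness' in (0, 1) for `colour` playing at (x, y)."""
    h = np.tanh(W1 @ encode_window(board, x, y, colour))
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))
```

One such evaluation per legal move is cheap enough to feed into a move-ordering step for alpha-beta, which matches the combined use Julian mentions.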
> Apart from this, if you're going to use a neural net to estimate
> territory, in other words as an evaluation function, why not use the
> temporal-difference learning methods that worked so well for
> backgammon? I think Nicol Schraudolph wrote a paper or two on this,
> and I seem to remember the results were very promising.
>
It is not at all an evaluation function, but an influence function.
Influence is just a small component of an evaluation function in Go.
What is the similarity between Go and Backgammon? Is there any at all
that would suggest that what works for one will be remotely useful for
the other?
In general I don't believe in the 'magical' approaches where one single
solution will give a Go playing program. Go is too complex. Feeding pro
games to a learning algorithm and hoping it will learn to play Go is
like feeding Shakespeare to a monkey and hoping it will learn English.
(The comparison is actually an insult to the monkey, which is capable
of learning a great deal more than any known learning algorithm.)
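For reference, the temporal-difference scheme mentioned in the quoted message is, in its simplest tabular TD(0) form, just a rule that nudges each position's value estimate toward the estimate of its successor. This is a minimal sketch of that update, not a description of Schraudolph's actual setup:

```python
def td0_update(V, trajectory, reward, alpha=0.1, gamma=1.0):
    """One pass of tabular TD(0) over a single game.

    V          : dict mapping position keys to value estimates
    trajectory : sequence of position keys, in play order
    reward     : final outcome of the game (e.g. 1.0 for a win)

    Each position's estimate is nudged toward its successor's estimate;
    the final position is nudged toward the actual outcome.
    """
    for t in range(len(trajectory) - 1):
        s, s_next = trajectory[t], trajectory[t + 1]
        V[s] = V.get(s, 0.0) + alpha * (gamma * V.get(s_next, 0.0) - V.get(s, 0.0))
    last = trajectory[-1]
    V[last] = V.get(last, 0.0) + alpha * (reward - V.get(last, 0.0))
    return V
```

In the backgammon work the table was replaced by a neural network trained on the same error signal; whether the credit assignment survives Go's much longer and more tactical games is exactly the open question being debated here.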
> Julian Churchill
>
> > -----Original Message-----
> > From: owner-computer-go@xxxxxxxxxxxxxxxxx
> > [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Mark Boon
> > Sent: 19 September 2001 12:53
> > To: computer-go@xxxxxxxxxxxxxxxxx
> > Subject: RE: computer-go: Board evaluation by counting...
> >
> >
> > Your proposition sounds like the one I made a while ago when people
> > were discussing using Neural Nets for Go. However, your message
> > rather suggests this is a tried and tested method, which, as far as
> > I know, it is not. I think this would be an interesting experiment
> > (although I would expect such a NN to be too slow for practical use
> > at this time) that would give a better idea of how NNs could be
> > applied usefully to computer-go.
> >
> > To suggest this as an alternative to the poor guy who's
> > trying to figure out
> > a way to design a simple influence algorithm is a bit crass.
> >
> > Mark Boon
> >
> > >
> > > Sounds like an ad hoc method. Why do you think it will give a
> > > better way to calculate territory? Why do you think it will be
> > > more accurate? Why? Asking these questions is definitely
> > > reasonable scientific practice.
> > >
> > > I think that methods for estimating territory (i.e. methods for
> > > evaluating a board position) must be derived more from the game
> > > itself and less from the ease of implementation or the smallness
> > > of an algorithm.
> > >
> > > One possibility would be this: Choose a big set of dan-level
> > > games that have been played to the end. Choose a game G. Guess
> > > where the territories were at the end. This can be done
> > > heuristically with good precision. It doesn't matter if the
> > > judgements go wrong at times, as long as the territories are
> > > guessed correctly most of the time.
> > >
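The end-of-game guess described above can indeed be done with a simple heuristic. A minimal sketch, assuming a plain integer board representation and labelling each empty region by the colours that border it (this ignores dead stones, which a real scorer would have to handle):

```python
def guess_territory(board):
    """Heuristic end-of-game territory guess: flood-fill each empty
    region and label it 'b' (black), 'w' (white) or 'n' (neutral)
    according to which colours border it.
    board[i][j] is 0 (empty), 1 (black) or -1 (white)."""
    n = len(board)
    owner = [[None] * n for _ in range(n)]
    seen = set()
    for i in range(n):
        for j in range(n):
            if board[i][j] != 0 or (i, j) in seen:
                continue
            region, borders, stack = [], set(), [(i, j)]
            seen.add((i, j))
            while stack:                   # flood-fill one empty region
                y, x = stack.pop()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < n and 0 <= xx < n):
                        continue
                    if board[yy][xx] == 0:
                        if (yy, xx) not in seen:
                            seen.add((yy, xx))
                            stack.append((yy, xx))
                    else:
                        borders.add(board[yy][xx])
            label = 'b' if borders == {1} else 'w' if borders == {-1} else 'n'
            for y, x in region:
                owner[y][x] = label
    return owner
```

As the message says, occasional misjudgements are tolerable as long as most regions are labelled correctly.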
> > > Then walk backwards through the game G. Correlate, by a
> > > computational learning method, the patterns of stones in the
> > > positions of G with the known sets of final territories. In this
> > > way you can train a learning system to predict where territories
> > > will appear, given a position that is not final. Of course, the
> > > system will make errors because it doesn't understand the
> > > full-board tactical aspects of Go, but it should work better (and
> > > consume more resources) than a basic influence-based method.
> > >
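The backwards walk described above could be sketched as follows, pairing every position in a game with the territory map guessed from its final position. The board representation and the N x N windowing are illustrative assumptions:

```python
def subgrid(board, y, x, N):
    """Flattened N x N window centred on (y, x); off-board points are 0."""
    half = N // 2
    size = len(board)
    out = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            yy, xx = y + dy, x + dx
            out.append(board[yy][xx] if 0 <= yy < size and 0 <= xx < size else 0)
    return out

def training_pairs(positions, final_owner, N=5):
    """Walk backwards through one game's positions, pairing the N x N
    subgrid around each intersection with the (heuristically guessed)
    final ownership of that intersection.
    final_owner[y][x] is +1 (black), -1 (white) or 0 (neutral)."""
    pairs = []
    for board in reversed(positions):
        size = len(board)
        for y in range(size):
            for x in range(size):
                pairs.append((subgrid(board, y, x, N), final_owner[y][x]))
    return pairs
```

Every position in the game reuses the same final territory map as its label, which is what makes the scheme cheap: one heuristic scoring per game yields hundreds of training positions.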
> > > To illustrate, we `know' based on our experience that
> > >
> > > . . . . . .
> > > . X . . X .
> > > . . . . . .
> > > ._._._._._.
> > >
> > > is a relatively stable formation. This means that in `most' games
> > > where the position above appears, the marked intersections will
> > > appear as territory (each will hold either a live stone or an
> > > empty intersection that counts as area):
> > >
> > > . . . . . .
> > > . X a a X .
> > > . a a a a .
> > > ._a_a_a_a_.
> > >
> > > To be more precise, there could be e.g. two games with the
> > > position above and the final board position at the same point
> > > looking like (`b' denotes black's area)
> > >
> > > X X O O O O
> > > b X X O X O
> > > X X b X X O
> > > b_b_X_X_O_O
> > >
> > > and
> > >
> > > O X b b b b
> > > X X b b X b
> > > b b b b b b
> > > b_b_b_b_b_b
> > >
> > > The pattern of `a's above would appear as an `average' of these,
> > > but the definition of `average' of course depends on the
> > > computational learning method used.
> > >
> > > A computational learning method could learn this. For example, a
> > > neural network whose inputs are an N x N subgrid of the Go board
> > > and whose outputs denote ownership of territory could do it,
> > > learning from examples. So could unsupervised vector quantization,
> > > within a suitable context.
> > >
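A minimal instance of such a learner, assuming (my assumption, not Antti's specification) a single logistic unit that predicts ownership of the centre point of an N x N subgrid and is trained by stochastic gradient descent on the (pattern, final ownership) examples:

```python
import numpy as np

N = 5                              # N x N subgrid around the target point
rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.01, N * N)   # one weight per subgrid point
b = 0.0

def predict(features):
    """P(centre point ends as black territory), from a flattened N x N
    subgrid (+1 black stone, -1 white stone, 0 empty)."""
    return 1.0 / (1.0 + np.exp(-(w @ features + b)))

def sgd_step(features, target, lr=0.1):
    """One logistic-regression update toward the known final ownership
    (target = 1 if the centre point became black territory, else 0)."""
    global w, b
    g = predict(features) - target     # gradient of the log loss
    w = w - lr * g * features
    b = b - lr * g
```

A single linear unit can only capture the `average' outcome of a pattern, which is exactly the notion of average discussed above; a hidden layer (or the vector quantization alternative) would be needed for anything subtler.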
> > > Regards,
> > >
> > > --
> > > Antti Huima
> > >
> > >