
RE: computer-go: Board evaluation by counting...



> -----Original Message-----
> From: owner-computer-go@xxxxxxxxxxxxxxxxx
> [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Mark Boon
> Sent: 20 September 2001 13:42
> To: computer-go@xxxxxxxxxxxxxxxxx
> Subject: RE: computer-go: Board evaluation by counting...
>
>
>
>
> > -----Original Message-----
> > From: owner-computer-go@xxxxxxxxxxxxxxxxx
> > [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Julian
> > Churchill
> > Sent: Wednesday, September 19, 2001 7:49 PM
> > To: computer-go@xxxxxxxxxxxxxxxxx
> > Subject: RE: computer-go: Board evaluation by counting...
> >
> >
> >
> >  I'm doing something along those lines. I have a net that takes 9x9
> > areas centred on a potential move and then outputs a value indicating
> > how 'good' the move is. The training data is taken from pro tournament
> > games and seems to produce a network that is fairly generalised but
> > useful if used with other things such as alpha-beta. I'm hoping to add
> > some more networks that specialise in the opening game and perhaps
> > corner and edge moves, to get around the overgeneralisation that the
> > current network suffers from.
> >
>
> That sounds more like the NN approach discussed here before. I have
> severe doubts this will actually work.
>

 Well, it can only be tried. If everyone just said "that's not going to
work" without attempting to experiment a bit and contribute some useful
experience to a topic, then we wouldn't get anywhere. Even negative
results are useful, since they rule out methods to try and often suggest
alternative approaches that could yield success.
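For concreteness, the 9x9-window network described in the quoted message could be sketched roughly as below. This is only an illustration of the idea, not the poster's actual code: the layer sizes, the three-plane encoding, and the sigmoid score are all my own assumptions.

```python
import numpy as np

WINDOW = 9                      # 9x9 area centred on the candidate move
PLANES = 3                      # own stone / opponent stone / empty
N_IN = WINDOW * WINDOW * PLANES
N_HID = 32                      # hidden units: an arbitrary choice

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
b1 = np.zeros(N_HID)
W2 = rng.normal(0.0, 0.1, N_HID)
b2 = 0.0

def encode(window):
    """One-hot encode a 9x9 window: +1 = own stone, -1 = opponent, 0 = empty."""
    planes = np.stack([window == 1, window == -1, window == 0])
    return planes.astype(float).ravel()

def score_move(window):
    """Return a value in (0, 1) indicating how 'good' the move looks."""
    h = np.tanh(W1 @ encode(window) + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))

# Example: score a move on an empty 9x9 window.
empty = np.zeros((WINDOW, WINDOW), dtype=int)
print(score_move(empty))        # some value strictly between 0 and 1
```

Training such a net on pro-game data would then amount to pushing the score toward 1 for windows around moves the professional actually played and toward 0 for windows around other candidate points.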


> >  Apart from this, if you're going to use a neural net to estimate
> > territory, in other words to be used as an evaluation function, why
> > not use the temporal difference learning methods that worked so well
> > for backgammon? I think Nicol Schraudolph wrote a paper or two on
> > this, and I seem to remember the results were very promising.
> >
> >
>
> It is not at all an evaluation function, but an influence function.
> Influence is just a small component of an evaluation function in Go.
>

 But surely an influence function can be used to estimate territory, and
hence as a good measure of the relative standing of the players at that
stage, which is exactly the criterion for an evaluation function?
Influence may be just one part of a more complex evaluation function, but
I see no reason why a neural network trained as an evaluation function,
perhaps using temporal difference methods, could not subsume some of the
other components of a more complex evaluation function. As I said before,
we don't know what is possible until it is attempted, and neural networks
are a relatively new and rapidly expanding field in which new methods are
being discovered or invented all the time.
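As a very rough illustration of the temporal-difference idea, here is a generic TD(0) sketch, not Schraudolph's method: the evaluator is trained without labels by pulling each position's value toward the value of the next position, with the final position anchored to the game's outcome. The linear evaluator and random "features" are placeholders for a real network and real board features.

```python
import numpy as np

N_FEATURES = 16
rng = np.random.default_rng(1)
w = np.zeros(N_FEATURES)        # weights of a linear value function

def value(x):
    return float(w @ x)

def td0_update(positions, outcome, alpha=0.01):
    """One TD(0) pass over a game's feature vectors.
    `outcome` is the final result (e.g. +1 for a win, -1 for a loss)."""
    global w
    for t in range(len(positions)):
        x = positions[t]
        # Bootstrap target: next position's value, or the outcome at the end.
        target = outcome if t == len(positions) - 1 else value(positions[t + 1])
        w += alpha * (target - value(x)) * x

# Toy usage: random 'positions' from a game the evaluator's side won.
game = [rng.normal(size=N_FEATURES) for _ in range(20)]
for _ in range(100):
    td0_update(game, outcome=1.0)
print(value(game[-1]))          # value after training (finite, nonzero)
```

The appeal for go would be the same as for backgammon: no hand-labelled positions are needed, only complete games and their results.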

> What is the similarity between Go and Backgammon? Is there any at all
> that would suggest that what works for one is going to be remotely
> useful for the other?
>
> In general I don't believe in the 'magical' approaches where one single
> solution will give a Go playing program. Go is too complex. Feeding pro
> games to a learning algorithm and hoping it will learn to play Go is
> like feeding Shakespeare to a monkey and hoping it will learn English.
> (The comparison is actually an insult to the monkey, which is capable
> of learning a great deal more than any known learning algorithm.)
>
>

 I'm not suggesting an all-powerful network should be created; that is
quite clearly an implausible goal. Even so, I think you underestimate the
possibilities that learning methods present. After all, humans aren't
programmed with hard-coded rules before they can play a game; they learn
the rules and benefit from experience over a lifetime. If a human were
only given the rules but never allowed to correct their mistakes from game
to game, or even from move to move, no improvement would ever be made. So
I firmly believe learning methods have a very important role to play in
all AI applications, not just game playing.

What may be possible is to use neural networks in conjunction with other
methods, to gain the advantages of a program that can learn. As I see it,
one of the main problems with current Go programs is that after a player
has had a few games, the strengths and weaknesses of the program are
readily identifiable. If the program had a learning element, it could
adapt to cope with a human trying to exploit its weaknesses, just as any
human opponent would quickly learn to do.
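For what it's worth, the backwards-labelling scheme Antti outlines in the message quoted below might be sketched like this. The window size, the logistic model, and the toy training data are all my own illustrative assumptions; a real system would label territories from finished games and walk backwards through each game record.

```python
import numpy as np

N = 5                                   # N x N input window
rng = np.random.default_rng(2)
w = np.zeros(N * N)                     # logistic-regression weights
b = 0.0

def predict_owner(window):
    """P(centre point ends up as Black's area), given a window with
    +1 = black stone, -1 = white stone, 0 = empty."""
    z = w @ window.ravel() + b
    return 1.0 / (1.0 + np.exp(-z))

def train_step(window, black_owns, lr=0.1):
    """One gradient step toward the known final ownership (0 or 1)."""
    global w, b
    err = predict_owner(window) - black_owns
    w -= lr * err * window.ravel()
    b -= lr * err

# Toy stand-in for real game data: windows dominated by black stones
# are labelled as eventually becoming black territory.
for _ in range(2000):
    window = rng.choice([-1, 0, 1], size=(N, N))
    black_owns = 1.0 if window.sum() > 0 else 0.0
    train_step(window, black_owns)

mostly_black = np.ones((N, N), dtype=int)
mostly_white = -np.ones((N, N), dtype=int)
print(predict_owner(mostly_black))      # high: predicted black area
print(predict_owner(mostly_white))      # low: predicted white area
```

On real data, the ownership labels would come from the heuristically scored final positions, and each earlier position in the game would supply one training window per board point.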

 Cheers,
 Julian Churchill

> > > -----Original Message-----
> > > From: owner-computer-go@xxxxxxxxxxxxxxxxx
> > > [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Mark Boon
> > > Sent: 19 September 2001 12:53
> > > To: computer-go@xxxxxxxxxxxxxxxxx
> > > Subject: RE: computer-go: Board evaluation by counting...
> > >
> > >
> > > Your proposition sounds like the one I made a while ago when people
> > > were discussing using Neural Nets for Go. However, your message
> > > rather suggests this is a tried and tested method, which as far as I
> > > know it's not. I think this would be an interesting experiment
> > > (although I would expect that such a NN would be too slow for
> > > practical use at this time) that would give a better idea about how
> > > NNs could be applied usefully to computer-go.
> > >
> > > To suggest this as an alternative to the poor guy who's trying to
> > > figure out a way to design a simple influence algorithm is a bit
> > > crass.
> > >
> > >     Mark Boon
> > >
> > > >
> > > > Sounds like an ad hoc method. Why do you think it will give a
> > > > better way to calculate territory? Why do you think it will be
> > > > more accurate? Why? Asking these questions is definitely
> > > > reasonable scientific practice.
> > > >
> > > > I think that methods for estimating territory (i.e. methods for
> > > > evaluating a board position) must be derived more from the game
> > > > itself and less from the ease of implementation or the smallness
> > > > of an algorithm.
> > > >
> > > > One possibility would be this: Choose a big set of dan-level
> > > > games that have been played to the end. Choose a game G. Guess
> > > > where the territories were at the end. This can be done
> > > > heuristically with good precision. It doesn't matter if the
> > > > judgements go wrong at times, as long as most of the time the
> > > > territories are guessed correctly.
> > > >
> > > > Then walk backwards through the game G. Correlate, by a
> > > > computational learning method, the patterns of stones in the
> > > > positions of G with the known sets of final territories. In this
> > > > way you can train a learning system to predict where territories
> > > > will appear, given a position that is not final. Of course, the
> > > > system will make errors because it doesn't understand the
> > > > full-board tactical aspects of go, but it should work better
> > > > (and consume more resources) than a basic influence-based method.
> > > >
> > > > To illustrate, we `know' based on our experience that
> > > >
> > > >   . . . . . .
> > > >   . X . . X .
> > > >   . . . . . .
> > > >   ._._._._._.
> > > >
> > > > is a relatively stable formation. This means that in `most' games
> > > > where the position above appears, the marked intersections will
> > > > appear as territory (there will be either a live stone or an
> > > > empty intersection that counts as area):
> > > >
> > > >   . . . . . .
> > > >   . X a a X .
> > > >   . a a a a .
> > > >   ._a_a_a_a_.
> > > >
> > > > To be more precise, there could be e.g. two games with the
> > > > position above and the final board position at the same point
> > > > looking like (`b' denotes black's area)
> > > >
> > > >   X X O O O O
> > > >   b X X O X O
> > > >   X X b X X O
> > > >   b_b_X_X_O_O
> > > >
> > > > and
> > > >
> > > >   O X b b b b
> > > >   X X b b X b
> > > >   b b b b b b
> > > >   b_b_b_b_b_b
> > > >
> > > > The pattern of `a's above would appear as an `average' of these,
> > > > but the definition of `average' of course depends on the
> > > > computational learning method used.
> > > >
> > > > A computational learning method could learn this. For example, a
> > > > neural network whose inputs are an N x N subgrid of the go board
> > > > and whose outputs denote ownership of territory could do it,
> > > > learning from examples. Or so could unsupervised vector
> > > > quantization within a suitable context.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Antti Huima
> > > >
> > > >
> >
> >
> > _________________________________________________________
> > Do You Yahoo!?
> > Get your free @yahoo.com address at http://mail.yahoo.com
> >
> >

