
Re: some ideas



Peter Hallberg (hallberg@xxxxxxxxxxxxxxxxx) wrote in lsd.compgo:

: In training neural networks it is essential that most of the game space is
: represented in neurons. When you train a Backgammon neural network the
: dice give some kind of stochastic variation to the game. You reach a lot
: of positions you would never have encountered if the dice always rolled
: the most likely "throw". This causes the neural network to learn a lot of
: different game plans etc.

I still don't see why we could not add some randomness to a good network. If
in normal mode the net proposes a number of moves and chooses the best, in a
more randomized mode it could choose randomly, in proportion to the value of
each suggested move.  Alternatively, it could play a completely random move
one time out of many, just to explore unknown areas. All this, of course, at
the training stage - in a tournament situation you might wish to suppress
such random behaviour.
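The two exploration schemes above could be sketched like this (a minimal
sketch for illustration only; the function name, the temperature parameter,
and the flat value lists are my own inventions, not from any real go
program):

```python
import math
import random

def choose_move(moves, values, temperature=1.0, epsilon=0.05, explore=True):
    """Pick a move from candidates scored by the network.

    Tournament mode (explore=False): just take the best-valued move.
    Training mode: with probability epsilon play a uniformly random
    move; otherwise sample in proportion to exp(value / temperature),
    so better moves are played more often but not exclusively.
    """
    if not explore:
        return moves[values.index(max(values))]
    if random.random() < epsilon:
        return random.choice(moves)
    weights = [math.exp(v / temperature) for v in values]
    return random.choices(moves, weights=weights, k=1)[0]
```

The temperature knob interpolates between the two modes: a very low
temperature approaches pure best-move play, a high one approaches uniform
random play.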

: In Go you can use the idea of min-max (supposing the opponent makes the best
: move). Then the games of Go (and Chess) follow some paths (openings and
: strategies) which are accepted as good. A neural network trained this way would
: (I think) have difficulty playing against a less accepted strategy. As it has
: no point of reference.

This happens in human players too. I think it was Kageyama in his "Lessons in
the Fundamentals of Go" who complained of people who can play josekis
perfectly, but get confused the moment the opponent deviates from the
sequence. His point was that you should understand what you are doing, not
blindly imitate what you find in the library.

Of course this depends heavily on the training method. If we feed the NN (or
whatever learning machine) master games, we are likely to see it repeating
things in the wrong context. If it learns by playing against itself (like
TD-Gammon), it will be more likely to play moves it can follow up. Learning
from games it plays against humans (or against a variety of other programs)
ought to fall somewhere in between...
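For concreteness, the heart of TD-style self-play learning is just the
temporal-difference update: nudge the value of a state toward the immediate
reward plus the value of its successor. A minimal tabular sketch (a real
TD-Gammon-style learner would of course back a neural network with this
update rather than a dictionary; the parameter names are mine):

```python
def td0_update(V, s, s_next, r, alpha=0.1, gamma=1.0):
    """One TD(0) step on a tabular value function V (a dict).

    Moves V[s] a fraction alpha of the way toward the target
    r + gamma * V[s_next]. Unseen states default to value 0.
    """
    v = V.get(s, 0.0)
    target = r + gamma * V.get(s_next, 0.0)
    V[s] = v + alpha * (target - v)
    return V[s]
```

Nothing in this rule requires dice: it works on any sequence of states the
learner happens to visit, which is why exploration (randomized move choice)
matters when the game itself supplies no randomness.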


: > In Backgammon both players move closer to their goals on every move [..]

: Given that the game of Backgammon is by magnitudes simpler than Go, it is not
: THAT simple. Knowing to play a "back game" or "prime vs. prime" (two quite
: difficult strategies) is NOT doing simple estimations but complex evaluations
: like the ones in Go (looking at the board and checkers as one).

I freely admit that my knowledge of Backgammon is quite limited. I did not
wish to imply that Backgammon was anywhere near trivial (like "Ludo" or
tic-tac-toe).  My point was that in Backgammon there at least exists a
decent first-order approximation of an evaluation function, whereas in Go
one can not even start to evaluate the situation without some idea of which
groups are alive or dead (or somewhere in between).
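As an example of such a first-order approximation: in Backgammon the classic
pip count - the total distance one's checkers still have to travel before
they can be borne off - already gives a usable, if crude, evaluation. A
sketch (the board encoding here is my own simplification):

```python
def pip_count(checkers):
    """Pip count for one player.

    `checkers` maps point number (1-24, counted from that player's
    bearing-off edge) to the number of checkers on it. The count is
    the total number of pips the player must roll to bring every
    checker home and off.
    """
    return sum(point * n for point, n in checkers.items())

# Standard Backgammon starting position for one player:
# 2 checkers on the 24-point, 5 on the 13, 3 on the 8, 5 on the 6.
start = {24: 2, 13: 5, 8: 3, 6: 5}
```

Go has no comparably cheap stand-in: any counting of territory already
presupposes knowing which groups live, which is the hard part.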



: So, I think that TD is doing so well in Backgammon because of the stochastic
: nature (the dice) of the game.

So, I think not.  In the Sutton & Barto book
(http://www-anw.cs.umass.edu/~rich/book/the-book) there are many examples of
TD applied in various settings. Most of them are quite deterministic, and the
algorithms perform quite well, thank you.


- Heikki

P.S. How would you attack the problem of programming go?


--
Heikki Levanto  LSD - Levanto Software Development   <heikki@xxxxxxxxxxxxxxxxx>