
AI Methods in Go



Dear All,

I would like to make a few points concerning the recent discussion.

Some people are [again] applying Temporal Difference Learning --- more
generally, unsupervised reinforcement learning --- together with function
approximation methods, i.e. neural networks, to learning go. It has been
noted in the literature that appropriate preprocessing of the data is
often necessary for an NN method to work well. I think this is very true
in go. On the surface, go positions are so chaotic that a raw neural net
trained via TDL cannot be expected to learn good go strategies in
reasonable time.
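To fix terminology, here is a minimal sketch of TD(lambda) with a linear
value function over hand-chosen features. All names (extract_features and
so on) are illustrative placeholders, not taken from any actual go
program; the feature extraction step is exactly the preprocessing this
mail argues about.

```python
# TD(lambda) with eligibility traces over a linear value function.
# The feature encoding is a placeholder; in go it would be the
# carefully designed preprocessing discussed in the text.

def value(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def td_lambda_update(weights, traces, features, reward, next_features,
                     alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step: compute the TD error, decay/accumulate the
    eligibility traces, then credit past features via the traces."""
    delta = (reward + gamma * value(weights, next_features)
             - value(weights, features))
    for i, f in enumerate(features):
        traces[i] = gamma * lam * traces[i] + f   # decay + accumulate
        weights[i] += alpha * delta * traces[i]   # credit assignment
    return delta
```

The eligibility traces are what lets a terminal reward (win/loss) flow
back to features seen many moves earlier.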

Let us look at some of the results that might seem to encourage the use
of raw TDL/NNs for learning go algorithmically.

1. TD(lambda) works well in Backgammon, as demonstrated by Tesauro's
program. However, the number of possible states in Backgammon is less than

  10^24,

if I remember the estimate of distinct positions correctly. Compare
this with 9x9 go's [similar raw upper bound]

  3^81.

The ratio is around 10^15. In other ways, too, Backgammon is a much
easier problem for an NN than go. For example, the intrinsic stochastic
nature of Backgammon makes it a suitable target for a learning algorithm:
the dice rolls "smooth" the game, and NNs are good at approximating such
smooth stochastic processes. In contrast, go positions are more complex,
and this smoothing stochastic element is missing.
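The arithmetic of the comparison is easy to check, taking the quoted
10^24 figure for Backgammon at face value:

```python
import math

# Compare 9x9 go's raw upper bound 3^81 (three states per point)
# with the quoted Backgammon state-count bound of 10^24.
go_bound = 3 ** 81
bg_bound = 10 ** 24

print(f"3^81  ~ 10^{math.log10(go_bound):.1f}")             # ~ 10^38.6
print(f"ratio ~ 10^{math.log10(go_bound / bg_bound):.1f}")  # ~ 10^14.6
```

So 9x9 go's raw position space is already some fifteen orders of
magnitude larger, and 19x19 go is far beyond that again.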

2. Logistello, a top Othello program, learns its evaluation function
automatically, using TDL or some similar method. However, the structure
of the evaluation function has been predetermined; that is, a priori
knowledge is used to reduce the learning problem. The results are good
--- Logistello was the top Othello-playing program --- but to reproduce
them in go, one would first have to decompose the evaluation function
into components that can be learned more rapidly.
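The general idea --- fix the structure a priori, learn only the weights
--- can be sketched as follows. This is not Logistello's actual
implementation; the features are placeholders, and the fitting here is
plain stochastic gradient descent on squared error.

```python
# "Predetermined structure, learned weights": the evaluation is a
# fixed linear combination of hand-chosen features; only the weights
# are fitted from game data.

def evaluate(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def fit_weights(samples, n_features, alpha=0.05, epochs=200):
    """samples: list of (feature_vector, target_score) pairs.
    Minimizes squared error by stochastic gradient descent."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for feats, target in samples:
            err = evaluate(weights, feats) - target
            for i, f in enumerate(feats):
                weights[i] -= alpha * err * f
    return weights
```

The point of the a priori structure is that the learner only has to
search a small weight space instead of the space of all evaluation
functions.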

* The point is: I conjecture that learning algorithms need very
carefully preprocessed input in order to learn to play go well. *

Let me suggest some methods that I find intuitively appealing.

1. Partition the position space of go into clusters of similar
positions, and then approximate "good" play within each cluster
individually. This probably requires supervised learning, i.e. expert
game records, to work. The benefit is that [presumably] the playing
function within each cluster is smoother and more regular than over the
whole position space.
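The clustering step of this proposal could be sketched with something as
simple as k-means over position feature vectors; the feature extraction
and the per-cluster predictor are the hard, unspecified parts, and
everything here is illustrative.

```python
import random

# Cluster position feature vectors; a separate approximator would
# then be fitted per cluster.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign points to nearest center, recompute
    centers as bucket means, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        centers = [
            [sum(c) / len(b) for c in zip(*b)] if b else centers[i]
            for i, b in enumerate(buckets)
        ]
    return centers

def assign(p, centers):
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
```

A new position would first be assigned to its cluster, and only that
cluster's playing function consulted.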

2. Use a `mixture of experts': create many function approximators, i.e.
train many NNs, and then add a new network that learns to choose, for
each individual position, the network that suggests moves with the
greatest confidence.
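The combining step can be sketched as softmax gating over the experts'
evaluations. The experts and the gate parameters here are stand-ins; in
the proposal the gate itself would be a trained network.

```python
import math

# Mixture of experts: each expert scores a position, and gating
# weights (softmax over gate logits) blend their outputs.

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_score(experts, gate_logits, position):
    """Blend expert evaluations with softmax gating weights."""
    gates = softmax(gate_logits)
    return sum(g * expert(position) for g, expert in zip(gates, experts))
```

With strongly peaked gate logits this degenerates to "pick the most
confident expert"; with flat logits it averages them.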

3. Divide the problem a priori into different layers. As a simple
example, train one network to understand global positions and the major
concepts of influence, and another to detect good shape; then combine
the networks, e.g. using Bayesian methods. To learn influence-based
moves, one can "smooth" individual board positions with e.g.
convolution-matrix-based "blurring", i.e. kernel functions. This can
make it easier to evaluate the "similarity" of different board positions.
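The blurring idea can be made concrete as a small 2-D convolution over a
stone map. The board encoding (+1/-1/0 per point) and the kernel are
illustrative choices, not a recommendation.

```python
# Smooth a stone map with a small kernel so that nearby board
# positions yield nearby feature maps (an "influence"-like picture).

def blur(board, kernel):
    """2-D convolution of `board` with `kernel`; same-size output,
    zero padding at the edges."""
    n, m = len(board), len(board[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * m for _ in range(n)]
    for y in range(n):
        for x in range(m):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - oy, x + dx - ox
                    if 0 <= yy < n and 0 <= xx < m:
                        acc += kernel[dy][dx] * board[yy][xx]
            out[y][x] = acc
    return out
```

Two positions differing by one stone then produce feature maps that
differ only slightly, which is exactly the smoothness a similarity
measure needs.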

Finally, as a comment on David's work: I'm sure it is an interesting
intellectual adventure to code go knowledge as explicit rules by hand.
Nevertheless, I suspect that you will encounter problems when the rules
start to interact and you need priorities. The priorities of even
well-defined rules typically cannot be put into a fixed order; they
depend on the overall situation. That is, the priorities are non-linear
functions of many variables. It seems plausible that to get the best
results, a function approximation method must eventually be used to
weight the different rules. This is of course not a problem, but an
enhancement. That being said, I think that winning MFOG at least once
[as David mentioned] is a good start :-)
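What "weighting the rules with a function approximator" might look like,
as a rough sketch: each hand-coded rule scores a candidate move, and a
tiny one-hidden-layer network maps global situation features to
context-dependent rule weights. All rules, features, and parameters here
are invented for illustration.

```python
import math

# Context-dependent rule priorities: a small nonlinear network maps
# situation features to one weight per hand-coded rule.

def rule_weights(situation, w_hidden, w_out):
    """situation features -> tanh hidden layer -> one weight per rule."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, situation)))
              for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def move_score(move, situation, rules, w_hidden, w_out):
    """Score a move as the weighted sum of the rules' opinions."""
    weights = rule_weights(situation, w_hidden, w_out)
    return sum(w * rule(move) for w, rule in zip(weights, rules))
```

The rules stay hand-written and interpretable; only their relative
priorities become a learned, situation-dependent function.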

-- 
Antti Huima
SSH Communications Security Oy