Re: some ideas
Heikki Levanto writes:
> Henrik Rydberg (rydberg@xxxxxxxxxxxxxxxxx) wrote in lsd.compgo:
>
> : As opposed to for instance Backgammon [1], Go is strictly
> : deterministic, leading to some problems when applying
> : algorithms such as temporal difference learning (TD).
>
> I think the problem with go is not so much the determinism, but the
> difficulty in evaluating positions.
Applied to chess, a TD learning variant called TD-Leaf works nicely,
and this specific example supports Heikki's view. I believe that the
success of TD learning has nothing, or very little, to do with whether
the game is deterministic.
Perhaps Heikki is right and the problem is in the very details of the
game. In Go humans have highly developed expert knowledge of specific
patterns, etc., and this is hard to handle with inexact (and
inefficient) artificial neural networks of today. Also, one should not
consider TD learning as restricted to just neural networks -- it is a
general method that can be applied in a large class of learning (or
minimization) paradigms.
I suppose that TD learning is useful for learning in Go programs, as
it has been useful in other deterministic games. Even hand-tuned
programs might benefit from using TD learning. For example, assume
that you have a Go program with many hand-tuned weights. They could be
for anything, e.g. patterns. At some point in development you will
probably (I'm guessing here!) find that the program is slightly
inconsistent, or that finding suitable changes to the parameters is
very difficult. In such a situation TD learning might allow automatic
tuning that finds good relative corrections to the weights. To me it
seems that this is what can be learned from TD-Gammon etc.: even if
you can find the optimal weights by hand, it is much easier to use an
automated method. Also, automated weight optimization will often be
less biased than the developer's own tuning.
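To make the tuning idea concrete, here is a minimal sketch of a TD(0)
update for a linear evaluation function. Everything in it is my own
assumption for illustration: the names evaluate and td0_update are
hypothetical, the feature vectors stand in for whatever your program
computes (e.g. pattern counts), and a real program would combine this
with search rather than replay a single toy game.

```python
def evaluate(weights, features):
    """Linear evaluation: weighted sum of position features."""
    return sum(w * f for w, f in zip(weights, features))


def td0_update(weights, positions, outcome, alpha=0.1):
    """One TD(0) pass over a finished game; returns the new weights.

    positions: feature vectors for successive positions of one game
               (hypothetical features, e.g. pattern counts).
    outcome:   final result of the game, e.g. 1.0 for a win.
    alpha:     learning rate (an assumed value, tune as needed).
    """
    w = list(weights)
    for t, feats in enumerate(positions):
        prediction = evaluate(w, feats)
        # The target is the evaluation of the next position, or the
        # actual game result at the last position.
        if t + 1 < len(positions):
            target = evaluate(w, positions[t + 1])
        else:
            target = outcome
        error = target - prediction
        # For a linear evaluator the gradient is the feature vector.
        for i, f in enumerate(feats):
            w[i] += alpha * error * f
    return w


# Toy two-position "game" with one-hot features that ends in a win.
game = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.0, 0.0]
for _ in range(500):
    weights = td0_update(weights, game, outcome=1.0)
```

In this toy case both weights drift toward the final outcome, since
each position's evaluation is pulled toward that of its successor. To
correct hand-tuned weights, you would start from them instead of zeros.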
Of course, given the current level of Go programs and, in particular,
the level of the programmers (it seems that you are all dan-level Go
players, or close!), TD methods might be irrelevant.
I think that the problem with neural networks is that they are assumed
to solve all problems in a Go program without any expert
knowledge. However, research into this has been done (e.g. by
Schraudolph, Dayan and Sejnowski and by Enzenberger) with relatively
good results. These results suggest that expert knowledge is
necessary, and that adding even modest string structures leads to
relatively good programs.
IMHO, it doesn't seem likely that neural network based Go programs
will become a threat to hand-tuned programs in the near future. They
demand a lot of computation, and learning times seem
intolerable. Nevertheless, one cannot help but wonder whether there
exists some representation of a Go state, or a new neural
architecture, that would allow this to happen.
TD learning is described in the paper by Richard Sutton:

  Richard S. Sutton, Learning to Predict by the Methods of Temporal
  Differences. Machine Learning 3:9-44, 1988.

A reprint can be found on his web page.
Mika Kojo
SSH Communications Security, Ltd.