computer-go: Re Ideas - TD Learning
Heikki Levanto wrote:
> Xu, Mousheng <moushengxu@xxxxxxxxxxxxxxxxx> wrote:
>
> > Could you educate us a bit on "TD-learning"? Is it Training with
> > Dataset? (I am embarrassing myself)
>
> Temporal Difference Learning. There is a good paper on it somewhere on the
> net; I found it easily last time I had to reload it by searching
> (altavista?) for Temporal Difference Learning. Quite interesting stuff.
I have a paper:
Schraudolph, Dayan & Sejnowski
"Temporal Difference Learning of Position Evaluation in the Game of Go"
Pre-print - to appear in Cowan, Tesauro and Alspector (eds)
Advances in Neural Information Processing Systems 6, Morgan Kaufmann,
San Francisco 1994
There is also an on-line textbook on Reinforcement Learning (which is, or
is related to, TD Learning) by Richard Sutton
www.cs.umass.edu/~rich
> > Before you go too far too big, just try a little thing -- teach
> > it to learn how to play joseki.
>
> In my opinion, playing joseki is a far cry from a "little thing". My first
> goal will be to make the program learn to capture just one stone, then
> multiple stones, then get an idea of when to pass, then play a whole game.
With TD Learning I think you should start at the end of the game. My
understanding of TD Learning is that it works using backed-up values: each
position's estimate is adjusted towards the estimate of the position that
follows it. So if you start learning when the end of the game is in sight,
where the true result is known, its learning will be accurate. As the
software learns to evaluate the endgame properly you can start to present
it with positions earlier in the game.
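A minimal sketch of that backing-up idea, in Python. It is not taken from
any Go program: the "game" is an assumed toy, a straight line of positions
where every move goes one step closer to a win worth 1.0, and the names
td0_episode and train are made up for illustration. It uses the simplest
TD(0) update, moving each position's value part of the way towards the
value of its successor (or towards the real result once the game is over).

```python
def td0_episode(values, start, alpha=0.5):
    """Play one episode on a toy chain game, updating values in place.

    Positions are 0..len(values)-1. Every move goes one step to the right,
    and stepping off the right-hand end finishes the game with result 1.0.
    (Hypothetical toy setup, chosen only to show how values are backed up.)
    """
    n = len(values)
    state = start
    while state < n:
        next_state = state + 1
        if next_state == n:
            target = 1.0                  # game over: the true result is known
        else:
            target = values[next_state]   # back up the successor's estimate
        # TD(0) update: move the estimate a fraction alpha towards the target.
        values[state] += alpha * (target - values[state])
        state = next_state
    return values


def train(n_positions, starts, alpha=0.5):
    """Run one episode per entry in `starts`, beginning with all values 0."""
    values = [0.0] * n_positions
    for start in starts:
        td0_episode(values, start, alpha)
    return values


if __name__ == "__main__":
    # One full game from the opening: only the last position learns anything,
    # because every earlier position backs up a successor that is still 0.
    print(train(5, [0]))
    # Starting near the end and working backwards gives accurate values sooner.
    print(train(5, [4, 4, 3, 3]))
```

After a single game from the first position only the final position has a
non-zero value, since everything earlier was backed up from estimates that
had not yet been learned. Starting episodes close to the end and moving the
start point back avoids spending updates on positions whose successors are
still unknown.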
I used a TD-Learning-like approach to learn the game of Nine Men's Morris
when I was a research student 25 years ago. It worked surprisingly well.
> Naturally I will start on a 9x9 board, if not smaller, although my plan may
> scale better to larger boards, as I do not feed the board image to the net
> at all.
With my Nine Men's Morris program I started by giving the program small
patterns of pieces on the board and letting the learning algorithm work out
which ones were important. Important patterns were those which strongly
predicted a winning or losing position, while unimportant patterns did not
predict strongly and so could be dropped to allow new, more complex
patterns to be tried out.
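A rough sketch of that keep-or-drop step, in Python. This is not the
original Nine Men's Morris program: it assumes each training position is
reduced to a set of pattern names plus the final result, and it scores a
pattern by plain win-rate statistics rather than by a TD update. The
function names, the pattern names and the 0.5 threshold are all made up for
illustration.

```python
from collections import defaultdict


def pattern_strengths(observations):
    """Score how strongly each pattern predicts the result.

    observations: iterable of (patterns_present, won) pairs, where
    patterns_present is a set of pattern names and won is True or False.
    Returns {pattern: strength}, with strength = 2 * |win rate - 0.5|, so
    0.0 means "predicts nothing" and 1.0 means "always wins or always loses".
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for patterns, won in observations:
        for p in patterns:
            seen[p] += 1
            if won:
                wins[p] += 1
    return {p: 2 * abs(wins[p] / seen[p] - 0.5) for p in seen}


def prune(observations, min_strength=0.5):
    """Split patterns into (keep, drop) lists, each sorted by name."""
    strengths = pattern_strengths(observations)
    keep = sorted(p for p, s in strengths.items() if s >= min_strength)
    drop = sorted(p for p, s in strengths.items() if s < min_strength)
    return keep, drop


if __name__ == "__main__":
    # Hypothetical observations: which patterns were present, and who won.
    games = [
        ({"mill", "corner"}, True),
        ({"mill"}, True),
        ({"corner"}, False),
        ({"blocked"}, False),
        ({"blocked", "corner"}, False),
        ({"mill", "corner"}, True),
    ]
    keep, drop = prune(games)
    print("keep:", keep)
    print("drop:", drop)
```

In this made-up data "mill" always appears in wins and "blocked" always in
losses, so both are kept, while "corner" appears equally in wins and losses
and is dropped, freeing a slot for a new candidate pattern. A real version
would also want a minimum number of sightings before trusting a score.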
I think TD Learning is a very good approach to learning Go. I am trying to
get a project going here in Budapest with Dr Andras Lorincz at the Eotvos
Lorand University. Andras is very keen to see how TD Learning copes with
Go. We are intending to get some of Andras' students to do the work, but so
far there have been no takers.
Cheers
David