
Re: computer-go: Re Ideas - TD Learning



Ahh, help from students! I've always thought that was a great idea.
Please, keep me posted if there is any progress in this area.
        Gary
----------
-----Original Message-----
From: Patricia Hughes and David Elsdon <babel17@xxxxxxxxxxxxxxxxx>
To: computer-go@xxxxxxxxxxxxxxxxx <computer-go@xxxxxxxxxxxxxxxxx>
Date: Friday, November 12, 1999 10:03 PM
Subject: computer-go: Re Ideas - TD Learning


>Heikki Levanto wrote:
>
>> Xu, Mousheng <moushengxu@xxxxxxxxxxxxxxxxx> wrote:
>>
>> >       Could you educate us a bit on "TD-learning"? Is it Training with
>> > Dataset? (I am embarrassing myself)
>>
>> Temporal Difference Learning. There is a good paper on it somewhere on the
>> net. I found it easily last time I had to reload it by searching
>> (altavista?) for Temporal Difference Learning. Quite interesting stuff.
>
>I have a paper:
>
>Schraudolph, Dayan & Sejnowski
>"Temporal Difference Learning of Position Evaluation in the Game of Go"
>Pre-print - to appear in Cowan, Tesauro and Alspector (eds)
>Advances in Neural Information Processing Systems 6, Morgan Kaufmann,
>San Francisco 1994
>
>There is also an on-line text book on Reinforcement Learning (which is, or
>is related to, TD Learning) by Richard Sutton
>
>www.cs.umass.edu/~rich
>
>> >       Before you go too far too big, just try a little thing -- teach
>> > it to learn how to play joseki.
>>
>> In my opinion, playing joseki is a far cry from a "little thing". My first
>> goal will be to make the program learn to capture just one stone, then
>> multiple stones, then get an idea of when to pass, then play a whole game.
>
>With TD Learning I think you should start at the end of the game. My
>understanding of TD Learning is that it works using backed up values, so if
>you start learning when the end of the game is in sight then its learning
>will be accurate. As the software learns to evaluate the end game properly
>you can start to present it with positions earlier in the game.
>
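The point about backed up values can be illustrated with a minimal sketch.
It is not taken from the thread or from the paper cited above: it assumes a
toy "game" that is just a chain of positions, each leading to the next, with
a win (outcome 1) at the end. The names td0_episode and train are made up
for this illustration. Each update pulls the value of a position toward the
value of its successor, so an update only carries information once the
successor has itself been learned.

```python
def td0_episode(values, start, n_states, alpha=1.0):
    """Play one episode on a toy chain and apply TD(0) updates in place.

    Position s always leads to s + 1; reaching n_states ends the game
    with outcome 1.0. Returns the number of moves played.
    """
    s = start
    steps = 0
    while s < n_states:
        nxt = s + 1
        # Backed-up target: the final outcome if the game ends here,
        # otherwise the current estimate for the next position.
        target = 1.0 if nxt == n_states else values[nxt]
        values[s] += alpha * (target - values[s])
        s = nxt
        steps += 1
    return steps


def train(starts, n_states):
    """Run one episode per start position; return (values, moves played)."""
    values = [0.0] * n_states
    total = sum(td0_episode(values, s, n_states) for s in starts)
    return values, total


N = 5
# Schedule A: begin with positions near the end, then move earlier.
late_values, late_steps = train(range(N - 1, -1, -1), N)
# Schedule B: always start from the opening position.
opening_values, opening_steps = train([0] * N, N)

print(late_values, late_steps)
print(opening_values, opening_steps)
```

In this deterministic toy both schedules reach the same values after N
episodes, but the late-start schedule plays fewer moves, because none of its
updates back up from a position that has not been learned yet. A real game
is stochastic and far larger, so this only shows the mechanism.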
>I used a TD-Learning-like approach to learn the game of Nine Men's Morris
>when I was a research student 25 years ago. It worked surprisingly well.
>
>> Naturally I will start on a 9x9 board, if not smaller, although my plan may
>> scale better to larger boards, as I do not feed the board image to the net
>> at all.
>
>With my Nine Men's Morris program I started by giving the program small
>patterns of pieces on the board and letting the learning algorithm work out
>which ones were important. Important patterns were those which strongly
>predicted a winning or losing position, while unimportant patterns did not
>predict strongly and so could be dropped to allow new, more complex patterns
>to be tried out.
>
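The pattern selection described above could be sketched roughly as follows.
The message does not say how "important" was measured, so this assumes one
simple possibility: a pattern's strength is how far its observed win rate
lies from 50%. The function prune_patterns, the threshold and the pattern
names are all hypothetical.

```python
def prune_patterns(stats, min_strength=0.2):
    """Keep patterns whose win rate is far from 50% in either direction.

    stats maps a pattern name to (wins, losses), counted over positions
    in which the pattern occurred. Returns {pattern: strength} for the
    patterns that are kept.
    """
    kept = {}
    for pattern, (wins, losses) in stats.items():
        total = wins + losses
        if total == 0:
            continue  # never observed, so nothing to judge it by
        # Assumed measure: distance of the win rate from an even chance.
        strength = abs(wins / total - 0.5)
        if strength >= min_strength:
            kept[pattern] = strength
    return kept


# Made-up counts for three made-up patterns.
stats = {
    "mill_formed": (90, 10),    # strongly predicts a win
    "lone_piece": (52, 48),     # predicts almost nothing
    "blocked_piece": (15, 85),  # strongly predicts a loss
}
kept = prune_patterns(stats)
print(sorted(kept))
```

Patterns that are dropped free up room for new, more complex candidates,
which would then be scored the same way.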
>I think TD Learning is a very good approach to learning Go. I am trying to
>get a project going here in Budapest with Dr Andras Lorincz at the Eotvos
>Lorand University. Andras is very keen to see how TD Learning copes with Go.
>We are intending to get some of Andras' students to do the work, but so far
>there have been no takers.
>
>Cheers
>
>David
>