
TD(-leaf) for GO



What I'd like to see with GO is mimicking the stochastic nature of Backgammon
for the purposes of training the network using TD. I.e., choose randomly
from amongst a set of reasonable moves (say the top 10 moves of a program
like HandTalk or Many Faces) as the game goes along, or vary the opponent
between games. Naturally this is not the way Backgammon nets like Snowie
or JellyFish or TD-Gammon were trained (I don't think), but at least the
state space explored would be comparatively enlarged. I.e., don't keep
searching down the same rut pathways.
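
Just as a sketch of that selection rule (the best-first move ranking is an
assumed interface here, not something these engines necessarily expose):

    import random

    def pick_training_move(ranked_moves, n_candidates=10):
        """Sample uniformly among a reference engine's top moves."""
        # ranked_moves: candidate moves ordered best-first by a program
        # like HandTalk or Many Faces (that ranking is assumed here).
        return random.choice(ranked_moves[:n_candidates])

Uniform sampling is the simplest version; weighting by the engine's own
move scores would keep the games closer to reasonable play while still
varying them.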

This should let the net explore a larger search space than the usual "best of
N" selection method. It would also bias the network much less towards being
simply anti (or pro) the reference program's style, so the patterns trained
should be better and more useful. They could possibly be shared with GO
experts after a (long) training run for further refinement.

If necessary, take a bunch of automated opponents and, better, a bunch of
humans, say on IGS, and train the net against 'em. Choose a bunch of
generally useful Go features for training; if necessary, ask this list,
Fotland, or Zhixing for feature ideas. Use a full 19x19 board. Let it play
24 hours a day, 7 days a week, for months. Compare the resulting net with
hand-tuned programs like Many Faces and HandTalk, as well as with the
starting program.
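
I won't pin down the exact update rule here, but for concreteness, a rough
sketch of one TD(lambda) pass over a finished game with a linear evaluation
over those hand-chosen features. features(position) is a hypothetical
function returning the feature activations as a vector; the alpha and
lambda values are only illustrative:

    import numpy as np

    def td_lambda_game_update(positions, outcome, features, weights,
                              alpha=0.01, lam=0.7):
        """One TD(lambda) pass over a game, linear evaluation."""
        trace = np.zeros_like(weights)
        for t in range(len(positions) - 1):
            x_t = features(positions[t])        # feature vector
            v_t = weights @ x_t                 # current evaluation
            if t + 1 == len(positions) - 1:
                v_next = outcome                # terminal value = result
            else:
                v_next = weights @ features(positions[t + 1])
            delta = v_next - v_t                # TD error
            trace = lam * trace + x_t           # eligibility trace
            weights = weights + alpha * delta * trace
        return weights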

KnightCap's (and its predecessor's) TD-leaf for Chess resulted in something
like a 300-point rating jump over 3 days of play on the chess server,
mostly against humans.
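
For reference, TD-leaf (from Baxter, Tridgell, and Weaver's KnightCap work)
applies the TD update at the leaf of each search's principal variation
rather than at the root position. A rough sketch with a linear evaluation,
where search_pv_leaf is a stand-in for a real minimax search returning the
PV leaf:

    import numpy as np

    def td_leaf_update(roots, outcome, search_pv_leaf, features,
                       weights, alpha=0.01, lam=0.7):
        """TDLeaf(lambda)-style update over one game's root positions."""
        leaves = [search_pv_leaf(p) for p in roots]   # PV leaf per move
        values = [float(weights @ features(leaf)) for leaf in leaves]
        values.append(outcome)    # game result ends the value sequence
        for t in range(len(leaves)):
            # lambda-discounted sum of future one-step TD errors
            delta_sum = sum(lam ** (k - t) * (values[k + 1] - values[k])
                            for k in range(t, len(leaves)))
            weights = weights + alpha * delta_sum * features(leaves[t])
        return weights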

--Stuart