computer-go: Temporal Differences again
I'm having real trouble with my temporal difference learning. As far
as I can tell, the problem stems from the fact that, except at the end
of the game, the reinforcement signal is simply the system's own
estimate of the board value. This noise seems to overwhelm the real
signal that appears at the end of the game.
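To make the problem concrete, here is a minimal sketch of the bootstrapped TD(0) update I mean. All names here (`values`, `states`, `alpha`) are hypothetical, not from any particular Go program: each position's value is nudged toward the *estimated* value of the following position, so until good estimates propagate back, the target is mostly the system's own noise.

```python
def td0_update(values, states, reward, alpha=0.1):
    """Bootstrapped TD(0) over one game.

    values: dict mapping position -> estimated value
    states: sequence of positions encountered in the game
    reward: terminal result (e.g. +1 for a win, -1 for a loss)
    """
    for i in range(len(states) - 1):
        s, s_next = states[i], states[i + 1]
        # Bootstrap target: the current estimate of the next position,
        # not the true game outcome.
        target = values.get(s_next, 0.0)
        values[s] = values.get(s, 0.0) + alpha * (target - values.get(s, 0.0))
    # Only the final position is trained on the real signal.
    last = states[-1]
    values[last] = values.get(last, 0.0) + alpha * (reward - values.get(last, 0.0))
    return values
```

With untrained (zero) estimates, a single pass leaves every non-terminal position at zero: the win signal reaches only the last position, and takes many games to creep backward.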
Cursory experiments suggest it is better to play to the end of the
game, then go back and teach the system the final result as the target
for each board position encountered along the way.
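That play-to-the-end scheme amounts to a Monte Carlo style update: every position in the game is pulled directly toward the final result, so the true signal reaches all positions in one pass. A sketch under the same hypothetical names as before:

```python
def monte_carlo_update(values, states, reward, alpha=0.1):
    """Teach every position in the game the final result directly.

    values: dict mapping position -> estimated value
    states: sequence of positions encountered in the game
    reward: terminal result (e.g. +1 for a win, -1 for a loss)
    """
    for s in states:
        # Every position's target is the actual game outcome.
        values[s] = values.get(s, 0.0) + alpha * (reward - values.get(s, 0.0))
    return values
```

The trade-off, as I understand it, is higher variance per game (one result stands in for all positions, good moves and blunders alike) in exchange for an unbiased target, whereas TD bootstrapping has lower variance but is biased by the current estimates.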
Thoughts?
Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/