Re: computer-go: Temporal Differences again



> I'm having real trouble with my temporal difference
> learning.  As far as I can tell, the problem stems from
> the fact that, except at the end of the game, the
> reinforcement signal is simply the system's own estimate
> of the board value.  This noise seems to overwhelm the
> real signal that appears at the end of the game.

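Do an online weight update after each position, and go 
backwards through the positions of a game, so that each 
position is trained towards a successor value that has 
already seen the end-of-game result. Many online 
optimization processes are sensitive to the order of the 
parameter updates.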
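
As a rough illustration of the backward ordering, here is a minimal 
sketch in Python. It assumes a linear evaluator and a plain TD(0)-style 
update; the function name, the feature encoding and the learning rate 
are made up for the example and are not taken from any particular program.

```python
def td_backward_update(weights, positions, outcome, lr=0.1):
    """One online pass over a finished game, last position first.

    weights   -- list of floats, updated in place (linear evaluator, an assumption)
    positions -- feature vectors of the game's positions, in game order
    outcome   -- final result of the game (the real reinforcement signal)
    lr        -- learning rate (hypothetical default)
    """
    def value(features):
        return sum(w * x for w, x in zip(weights, features))

    # The last position is trained towards the actual game result.
    target = outcome
    for features in reversed(positions):
        error = target - value(features)
        # Online update: change the weights right after this position.
        for i, x in enumerate(features):
            weights[i] += lr * error * x
        # Re-evaluate with the updated weights; this becomes the
        # target for the position that came before.
        target = value(features)
    return weights
```

With three positions encoded as one-hot vectors, zero initial weights, 
outcome 1.0 and lr=0.5, a single backward pass moves all three weights 
(to 0.125, 0.25 and 0.5), whereas a forward pass over the same game 
would only change the weight of the last position.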

At the beginning, the output of your function approximator 
(neural network?) should be close to 0 in all positions 
except those near the end of the game, where the 
reinforcement signal is.
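
One way to get outputs close to 0 at the start is to initialize the 
weights with very small values. The sketch below assumes a single 
tanh output unit and 81 inputs (one per point of a 9x9 board); the 
names, the scale and the input encoding are all hypothetical.

```python
import math
import random

def init_small_weights(n_inputs, scale=0.001, seed=0):
    """Draw weights uniformly from [-scale, scale] (scale is an assumed value)."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n_inputs)]

def value(weights, features):
    """Single tanh unit: output lies in (-1, 1) and is near 0 for small weights."""
    return math.tanh(sum(w * x for w, x in zip(weights, features)))

weights = init_small_weights(81)
rng = random.Random(1)
# Hypothetical encoding: -1 = white stone, 0 = empty, 1 = black stone.
features = [rng.choice([-1, 0, 1]) for _ in range(81)]

# |output| can never exceed scale * number of inputs = 0.081 here.
print(abs(value(weights, features)) <= 0.081)
```

If the untrained evaluator already returns values far from 0 for 
early positions, the initial weights are probably too large.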

If you have noise in earlier positions, your training is 
unstable (check the learning rate and the weight 
initialization).

Hope this helps

- Markus