Re: computer-go: Temporal Differences again
> I'm having real trouble with my temporal difference
> learning. As far as I can tell, the problem stems from
> the fact that, except at the end of the game, the
> reinforcement signal is simply the system's own estimate
> of the board value. This noise seems to overwhelm the
> real signal that appears at the end of the game.
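Do an online weight update after each position, and go
backwards through the positions of a game, starting from
the final position. That way the real signal from the end
of the game reaches the earlier positions within a single
pass. Many online optimization processes are sensitive to
the order of the parameter updates.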
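
A minimal sketch of what I mean by a backward online pass,
assuming a linear function approximator and plain TD(0).
The names (value, td0_backward_update), the feature
encoding and the learning rate are made up for illustration
and are not taken from anyone's program:

```python
def value(weights, features):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def td0_backward_update(weights, positions, outcome, lr=0.1):
    """One online TD(0) pass over a finished game, last position first.

    positions: feature vectors in the order they were played (assumed encoding).
    outcome:   final game result, the only real reinforcement signal.
    """
    target = outcome  # the final position is trained towards the real result
    for features in reversed(positions):
        error = target - value(weights, features)
        for i, x in enumerate(features):
            weights[i] += lr * error * x  # online: applied immediately
        # The preceding position is trained towards the updated estimate.
        target = value(weights, features)
    return weights


if __name__ == "__main__":
    # Toy game: three positions with one-hot features, won at the end.
    game = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(td0_backward_update([0.0, 0.0, 0.0], game, outcome=1.0, lr=0.5))
```

In this toy case a single backward pass already moves the
estimate of every position towards the result. Visiting the
same positions in forward order would only change the last
one, because the earlier targets would still be the
untrained estimates.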
At the beginning of training, the output of your function
approximator (neural network?) should be close to 0 in all
positions, apart from those near the end of the game, where
the reinforcement signal is.
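
One way to get outputs close to 0 at the start is to
initialize the weights with small random values. A rough
sketch, again assuming a linear approximator with 0/1
features; the scale of 0.01 and the helper names are
arbitrary choices for illustration:

```python
import random


def init_weights(n, scale=0.01, seed=0):
    """Small random weights so that initial value estimates stay near 0."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]


def max_abs_output(weights, positions):
    """Largest absolute value estimate over a set of feature vectors."""
    return max(abs(sum(w * x for w, x in zip(weights, p))) for p in positions)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical positions: 100 random binary feature vectors of length 10.
    positions = [[float(rng.randint(0, 1)) for _ in range(10)] for _ in range(100)]
    weights = init_weights(10)
    # With 10 binary features and |w| <= 0.01 the output cannot exceed 0.1.
    print(max_abs_output(weights, positions))
```

If a check like this shows large outputs before any
training has happened, the initialization is a likely
suspect.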
If you have noise in earlier positions, your training is
unstable (learning rate? weight initialization?).
Hope this helps
- Markus