
Re: computer-go: Temporal Differences again

To: computer-go@xxxxxxxxxxxxxxxxx
Subject: Re: computer-go: Temporal Differences again
From: Markus Enzenberger <compgo@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 12 Aug 2003 21:33:26 -0600
In-reply-to: <DD82EB6A-CD19-11D7-933F-0003937E1CFC@xxxxxxxxxxxxxxxxx>
References: <DD82EB6A-CD19-11D7-933F-0003937E1CFC@xxxxxxxxxxxxxxxxx>
Reply-to: computer-go@xxxxxxxxxxxxxxxxx
Sender: owner-computer-go@xxxxxxxxxxxxxxxxx
User-agent: KMail/1.5.1

> I'm having real trouble with my temporal difference
> learning.  As far as I can tell, the problem stems from
> the fact that, except at the end of the game, the
> reinforcement signal is simply the system's own estimate
> of the board value.  This noise seems to overwhelm the
> real signal that appears at the end of the game.

Do an online weight update after each position, and go
backwards through the positions of a game. Many online
optimization processes are sensitive to the order of the
parameter updates.
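A minimal sketch of this advice, assuming a linear evaluator V(s) = w . x(s) over some feature vector x(s) as a stand-in for a neural network, and plain TD(0) (no eligibility traces) with the game result as the only reward. The names `value`, `td0_game_update` and `one_hot` are hypothetical. Each position's target is the current estimate of the next position, except for the final position, whose target is the game result. The `backward` flag shows why the order matters: going backwards, each update uses a successor value that was just updated, so the end-of-game signal reaches every position in a single pass; going forwards from all-zero weights, only the last position learns anything.

```python
from typing import List, Sequence


def value(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear evaluation V(s) = w . x(s); a stand-in for a neural network."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_game_update(w: List[float], positions: List[Sequence[float]],
                    result: float, alpha: float = 0.1,
                    backward: bool = True) -> None:
    """One online TD(0) pass over the positions of a single game.

    w is updated in place after every position. The target for position t
    is the current estimate V(x_{t+1}); for the final position it is the
    game result, the only real reinforcement signal.
    """
    last = len(positions) - 1
    order = range(last, -1, -1) if backward else range(last + 1)
    for t in order:
        x = positions[t]
        target = result if t == last else value(w, positions[t + 1])
        delta = target - value(w, x)          # TD error
        for i, xi in enumerate(x):
            w[i] += alpha * delta * xi        # gradient step for a linear V


def one_hot(i: int, n: int) -> List[float]:
    """Hypothetical feature vector: one feature per position."""
    return [1.0 if j == i else 0.0 for j in range(n)]


# Toy game of 5 positions, won by the learner (+1).
game = [one_hot(i, 5) for i in range(5)]

w_backward = [0.0] * 5
td0_game_update(w_backward, game, result=1.0, alpha=0.5, backward=True)

w_forward = [0.0] * 5
td0_game_update(w_forward, game, result=1.0, alpha=0.5, backward=False)

print(w_backward)  # [0.03125, 0.0625, 0.125, 0.25, 0.5]
print(w_forward)   # [0.0, 0.0, 0.0, 0.0, 0.5]
```

With a real network the same loop applies; only `value` and the gradient step change.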

At the beginning of training, the output of your function
approximator (neural network?) should be close to 0 in all
positions except those near the end of the game, where the
reinforcement signal is.

If you see noise in the earlier positions, your training is
unstable (learning rate? weight initialization?).
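One simple way to get near-zero initial outputs, sketched for the same kind of linear evaluator with features in [0, 1]. The helpers `init_weights` and `max_abs_value` are hypothetical, and the scaling rule (each weight bounded by `max_initial_value / n_features`) is an assumed choice, not something from this thread. It guarantees |V(s)| <= `max_initial_value` in every position before training, so any large early-position values that appear later come from the updates themselves, which points at the learning rate.

```python
import random
from typing import List, Optional, Sequence


def init_weights(n_features: int, max_initial_value: float = 0.1,
                 seed: Optional[int] = None) -> List[float]:
    """Small random weights for a linear evaluator V(s) = w . x(s).

    With every |w_i| <= max_initial_value / n_features and features in
    [0, 1], |V(s)| <= max_initial_value in every position, so the
    untrained evaluator outputs values close to 0.
    """
    rng = random.Random(seed)
    bound = max_initial_value / n_features
    return [rng.uniform(-bound, bound) for _ in range(n_features)]


def max_abs_value(w: Sequence[float],
                  positions: Sequence[Sequence[float]]) -> float:
    """Largest |V(s)| over a list of positions; a quick sanity check."""
    return max(abs(sum(wi * xi for wi, xi in zip(w, x))) for x in positions)


# Hypothetical check: 50 random positions with 100 binary features each.
rng = random.Random(42)
positions = [[float(rng.random() < 0.5) for _ in range(100)]
             for _ in range(50)]
w = init_weights(100, seed=1)
print(max_abs_value(w, positions) <= 0.1)  # True
```

Running the same check on the early positions of a few games after each training epoch is a cheap way to spot the instability described above.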

Hope this helps

- Markus

References:
- computer-go: Temporal Differences again
  - From: Peter Drake
