Re: computer-go: Temporal Difference Learning
> > there is an algorithm called TDLeaf, but I am not
> > convinced that it is useful.
>
> A quick web search found a paper by Baxter, Tridgell, and
> Weaver. Is this the canonical one?
yes.
> Also, can you say why you're not convinced this is
> useful?
it was used for training evaluation functions in chess that
used the material value of the position as input.
Then you have the disadvantage that the material value
can change at every move in an exchange of pieces
which would give you horrible training patterns.
TDLeaf avoids this by using search to get more appropriate
target positions for training (e.g. after the exchange has
happened).
But you pay a very high price for it, because choosing each
move during self-play now needs a search whose cost grows
exponentially with the search depth.
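
A minimal sketch of the point about training targets, on a made-up toy tree (the positions, the material values and the names `leaf_value` and `temporal_differences` are assumptions for illustration, not taken from the TDLeaf paper). It compares the temporal differences of the static material along a game containing an exchange with those of the values backed up from a depth-2 minimax search. The weight update that a real TDLeaf learner would perform at the leaf is left out.

```python
# Toy illustration only: each position is (material, children), with material
# counted from White's point of view. The tree and its values are made up.

def leaf_value(position, depth, white_to_move):
    """Material at the principal-variation leaf of a depth-limited minimax search."""
    material, children = position
    if depth == 0 or not children:
        return material
    values = [leaf_value(child, depth - 1, not white_to_move) for child in children]
    return max(values) if white_to_move else min(values)

# A forced line: White wins a knight (+3), Black recaptures (back to 0),
# then two quiet moves follow.
p4 = (0, [])
p3 = (0, [p4])
p2 = (0, [p3])
p1 = (3, [p2])
p0 = (0, [p1])
game = [p0, p1, p2, p3, p4]

def temporal_differences(values):
    """Differences between successive values (TD errors with no discount and no reward)."""
    return [b - a for a, b in zip(values, values[1:])]

static_values = [material for material, _ in game]
# White is to move at even indices of the game.
leaf_values = [leaf_value(p, 2, i % 2 == 0) for i, p in enumerate(game)]

print(temporal_differences(static_values))  # [3, -3, 0, 0]
print(temporal_differences(leaf_values))    # [0, 0, 0, 0]
```

The static values swing by a knight in each direction during the exchange, while the searched values stay flat. The price is that every position needs its own search.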
IMHO it would have been better to use a quiescence search to
determine the material value of a position that is fed to
the evaluation function, and to choose the moves during
self-play by 1-ply look-ahead.
However, I haven't performed any experiments, and the neural
network in NeuroGo is much too slow to use TDLeaf.
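
A rough sketch of that alternative, again on a made-up toy position (the dict layout and the names `make_position`, `quiescent_material`, `evaluate` and `one_ply_move` are hypothetical, and `evaluate` is only a stand-in for a learned evaluation function). The material input is taken after capture sequences have been resolved, and the move is chosen by looking one ply ahead. As said above, this is untested as a training setup.

```python
# Toy illustration only: positions are dicts with material from White's point
# of view, plus the positions reachable by capture moves and by quiet moves.

def make_position(material, captures=(), quiet=()):
    return {"material": material, "captures": list(captures), "quiet": list(quiet)}

def quiescent_material(position, white_to_move):
    """Material after capture sequences are played out; the side to move may stand pat."""
    best = position["material"]  # stand pat: decline every capture
    for child in position["captures"]:
        value = quiescent_material(child, not white_to_move)
        best = max(best, value) if white_to_move else min(best, value)
    return best

def evaluate(position, white_to_move):
    """Stand-in for the learned evaluation: here just the quiescent material input."""
    return float(quiescent_material(position, white_to_move))

def one_ply_move(position, white_to_move):
    """Pick the successor with the best evaluation for the side to move."""
    successors = position["captures"] + position["quiet"]
    pick = max if white_to_move else min
    return pick(successors, key=lambda s: evaluate(s, not white_to_move))

after_recapture = make_position(0)
after_capture = make_position(3, captures=[after_recapture])  # Black can recapture
pawn_lost = make_position(-1)
after_quiet = make_position(0, captures=[pawn_lost])          # Black can win a pawn
root = make_position(0, captures=[after_capture], quiet=[after_quiet])

print(quiescent_material(after_capture, white_to_move=False))   # 0, not the static 3
print(one_ply_move(root, white_to_move=True) is after_capture)  # True
```

Only capture moves are searched, so the cost per position stays far below that of a full-width search, while the material input no longer jumps in the middle of an exchange.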
> > NeuroGo in its most recent version uses local
> > connectivity and single-point eyes as additional
> > outputs that are trained with TD. I will present a
> > paper about this at ACG2003 which takes place together
> > with the Computer Olympiad in Graz/Austria in November.
>
> So when and how do those of us stuck stateside get ahold
> of it? :-)
I'll put the paper online when the final version is ready.
- Markus