
Re: computer-go: Temporal Difference Learning



On Tuesday, July 15, 2003, at 11:24  AM, Markus Enzenberger wrote:

> it was used for training evaluation functions in chess that
> used the material value of the position as input.
> Then you have the disadvantage that the material value
> can change at every move in an exchange of pieces,
> which would give you horrible training patterns.
> TDLeaf avoids this by using search to get more appropriate
> target positions for training (e.g. after the exchange has
> happened).
Ah ... explains why it helps.
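
For concreteness, here is a rough sketch of a TDLeaf(lambda) weight update for a linear evaluator (Python; pv_leaves and features() are placeholders for whatever search and feature code an engine already has, not real library calls):

import numpy as np

ALPHA = 0.01   # learning rate
LAM = 0.7      # lambda: how quickly credit for an error decays going back in time

def tdleaf_update(w, pv_leaves, features):
    # pv_leaves[t] is the leaf of the principal variation returned by the
    # search from the position after move t -- the "quiet" position TDLeaf
    # trains on instead of the raw root position.
    values = [float(np.dot(w, features(p))) for p in pv_leaves]
    deltas = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    for t in range(len(deltas)):
        # lambda-weighted sum of the temporal differences that follow move t
        credit = sum((LAM ** (j - t)) * deltas[j] for j in range(t, len(deltas)))
        # for a linear evaluator the gradient w.r.t. w is just the feature vector
        grad = np.asarray(features(pv_leaves[t]))
        w = w + ALPHA * credit * grad
    return w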

> But you pay a very high price for it, because move
> generation during self-play is now exponentially slower.
That's for sure!

> IMHO it would have been better to do a quiescence search for
> determining the material value of a position used as input
> for the evaluation function, and to choose the moves during
> self-play by 1-ply look-ahead.
I don't quite follow you. If you're going to do a deep search anyway, why not use the result to choose the best move?
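
For reference, the proposal seems to amount to something like the sketch below (material_after_quiescence(), other_features(), and the position API are hypothetical stand-ins): the quiescence search only stabilizes the material input to the evaluator, while the move actually played in self-play comes from a cheap 1-ply look-ahead.

def evaluate(pos, w):
    # the material input is measured after captures are resolved by a
    # quiescence search, so it doesn't jump around mid-exchange
    feats = [material_after_quiescence(pos)] + other_features(pos)
    return sum(wi * fi for wi, fi in zip(w, feats))

def choose_move(pos, w):
    # 1-ply look-ahead: pick the move whose successor position scores worst
    # for the opponent (evaluate() is assumed to score for the side to move)
    return max(pos.legal_moves(), key=lambda m: -evaluate(pos.play(m), w))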

Another possibility is to train both a move suggestor and a board evaluator at the same time. Once the move suggestor has some idea what it's doing, the program can use a narrower, deeper search.
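
Concretely, the narrower, deeper search might look something like the sketch below (suggest_moves() and evaluate() are hypothetical stand-ins for the two trained components): the move suggestor caps the branching factor, so the same node budget reaches further ahead.

def narrow_search(pos, depth, width, suggest_moves, evaluate):
    # negamax, but only the `width` moves ranked highest by the suggestor
    # are expanded at each node
    if depth == 0 or pos.is_terminal():
        return evaluate(pos)
    best = float("-inf")
    for move in suggest_moves(pos)[:width]:
        best = max(best, -narrow_search(pos.play(move), depth - 1, width,
                                        suggest_moves, evaluate))
    return best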

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/