
Re: computer-go: Temporal Difference Learning



>
> > Also, can you say why you're not convinced this is
> > useful?
>
> it was used for training evaluation functions in chess that
> used the material value of the position as input.
> Then you have the disadvantage that the material value
> can change at every move in an exchange of pieces
> which would give you horrible training patterns.
> TDLeaf avoids this by using search to get more appropriate
> target positions for training (e.g. after the exchange has
> happened).
> But you pay a very high price for it, because move
> generation during self-play is now exponentially slower.

Well, it is slower, but the lookahead is not there only for learning purposes.

After all, doing lookahead lets your program play better moves,
so you probably want to use your learned evaluation function in a program
that does lookahead.

And if you plan to do lookahead anyway during actual play, it is better to
train using the same lookahead technique so your examples are
representative (at least more representative than they would be if you
used play without lookahead to generate training examples).
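
To make the point about exchanges concrete, here is a minimal sketch in Python of taking the training target from the leaf of a search instead of from the position itself. The tree, the position names and the material numbers are all made up for illustration, and the plain minimax below only stands in for whatever search a real program would use. It shows the idea of picking the target position, not a full TDLeaf implementation.

```python
# Hypothetical toy game tree: position names and values are invented.
CHILDREN = {
    "root": ["capture", "quiet"],
    "capture": ["recapture", "no_recapture"],
    "quiet": [],
    "recapture": [],
    "no_recapture": [],
}

# Static material balance, always from the point of view of the player
# who is to move at "root". Mid-exchange ("capture") it looks like +3.
MATERIAL = {
    "root": 0,
    "capture": 3,
    "quiet": 0,
    "recapture": 0,
    "no_recapture": 3,
}


def search_leaf(pos, depth, maximizing=True):
    """Depth-limited minimax that returns (value, leaf position reached)."""
    kids = CHILDREN[pos]
    if depth == 0 or not kids:
        return MATERIAL[pos], pos
    results = [search_leaf(k, depth - 1, not maximizing) for k in kids]
    pick = max if maximizing else min
    # Compare on the value only; the leaf name just travels along.
    return pick(results, key=lambda r: r[0])


# Target without lookahead: the misleading mid-exchange material count.
static_target = MATERIAL["capture"]

# Target with one ply of lookahead (opponent to move): the value after
# the exchange has been completed.
leaf_value, leaf = search_leaf("capture", 1, maximizing=False)
```

Here static_target is 3 while leaf_value is 0 (reached at "recapture"), so the searched target no longer jumps around in the middle of the exchange.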

jan