Re: computer-go: Temporal Difference Learning
> > IMHO it would have been better to do a quiescence
> > search for determining the material value of a position
> > used as input for the evaluation function and choose
> > the moves during self-play by 1-ply look-ahead.
>
> I don't quite follow you. If you're going to do a deep
> search anyway, why not use the result to choose the best
> move?
Not a full search, only a quiescence search, which is much
cheaper.
In chess, very specialized evaluation functions are used.
The evaluation is extremely fast, and the material part is
unstable in all positions where capturing moves are
possible, but at the same time it is by far the dominating
part of the evaluation. This is why you need quiescence
search and apply the evaluation only at stable nodes,
which is more effective than trying to determine the
tactical stability of pieces statically.
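Roughly, in Python (the helpers material(), capture_moves() and
make() are made-up names, not any particular engine's interface):

    def quiescence(pos, alpha, beta):
        # Stand-pat score: static material from the point of
        # view of the player to move.
        stand_pat = material(pos)
        if stand_pat >= beta:
            return beta
        alpha = max(alpha, stand_pat)
        # Expand only capturing moves until the position is quiet.
        for move in capture_moves(pos):
            score = -quiescence(make(pos, move), -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha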
I agree that TDLeaf is useful for learning the value of pieces
with such an evaluation function.
But the values of the pieces are well known in chess, and if
you want to learn the weights for the heuristic part only, it
might be more effective to replace the material part by
something more stable (like the material value returned by a
quiescence search) and use 1-ply look-ahead.
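Again as a sketch, building on the quiescence() above (weights,
features and legal_moves() are hypothetical; this is not TDLeaf's
actual code):

    INF = float("inf")

    def evaluate(pos, weights, features):
        # Material part: the stable quiescence result instead of
        # a raw static piece count.
        mat = quiescence(pos, -INF, INF)
        # Heuristic part: weighted sum of feature functions whose
        # weights are what we actually want to learn.
        heur = sum(w * f(pos) for w, f in zip(weights, features))
        return mat + heur

    def choose_move(pos, weights, features):
        # 1-ply look-ahead: the child's evaluation is from the
        # opponent's point of view, hence the sign flip.
        return max(legal_moves(pos),
                   key=lambda m: -evaluate(make(pos, m),
                                           weights, features))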
The situation is different in Go. The evaluation function
usually has to determine whether blocks are alive or dead much
earlier than tactical search can decide it.
So evaluation functions in Go are usually slow, more stable,
and dominated by the heuristic parts. They use the results of
fast local tactical searches as input to the evaluation. I
doubt that TDLeaf is an improvement in this case.
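Schematically (blocks(), local_tactical_status() and net are
again invented names):

    def go_evaluate(pos, net):
        # Fast local tactical searches produce a life-and-death
        # status feature for each block; the slow, stable
        # full-board evaluation is computed from those features.
        features = [local_tactical_status(pos, block)
                    for block in blocks(pos)]
        return net.value(features)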
I don't think it is a bad idea to train an evaluation function
with TD and 1-ply look-ahead and later use it in deeper
searches. I am doing this with NeuroGo on the 9x9 board, and a
3-ply full-board search makes it significantly stronger.
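For what it's worth, the training loop is, in spirit, something
like this TD(0) sketch (hypothetical net interface, not NeuroGo's
actual code; ALPHA is an arbitrary learning rate):

    ALPHA = 0.1

    def choose_move_net(pos, net):
        # 1-ply look-ahead with the network's value, negamax-style.
        return max(legal_moves(pos),
                   key=lambda m: -net.value(make(pos, m)))

    def self_play_game(net):
        # net.value(p) predicts the result for the player to
        # move in p, so the successor's value is negated.
        pos = initial_position()
        while not game_over(pos):
            nxt = make(pos, choose_move_net(pos, net))
            # TD(0) target: the actual result once the game ends
            # (from the mover's point of view), otherwise the
            # negated network value of the successor position.
            target = (final_result(nxt) if game_over(nxt)
                      else -net.value(nxt))
            net.adjust(pos, ALPHA * (target - net.value(pos)))
            pos = nxt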
- Markus