Re: computer-go: Temporal Difference Learning
> I don't think that it is a bad idea to use an evaluation
> function that was trained with TD and 1-ply look-ahead
> later in deeper searches. I am doing this with NeuroGo on
> the 9x9 board and doing a 3-ply full-board search makes it
> significantly stronger.
Yes, this should not be a bad thing. Ideally, you would like to do TD
learning as deeply as possible, but unfortunately it is not possible
to go very deep without enormous training time.
What I did in chess was to do most of the training at a very shallow
depth, then do the remaining fraction of the training at a deeper (but
still not very deep) depth. We did 2 ply, then followed up with 3- and
finally 4-ply searches. We did quite a bit of testing, and the
empirical evidence seemed to suggest that training with deeper
searches is better if you can afford to do it, but that it wasn't
necessary for good results. Due to the ambiguous nature of testing
chess strength, I cannot say with certainty whether this is actually
true.
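To make the schedule concrete, here is a rough Python sketch of what
such a staged TD self-play loop could look like. The search(),
features(), initial_position() and result() interfaces are
placeholders rather than code from our program, the counts and
constants are made up, and the update is a simplified linear TD rule,
not exactly what we ran:

import numpy as np

ALPHA, LAMBDA = 0.01, 0.7            # learning rate and TD decay (illustrative values)
NUM_FEATURES = 512                   # placeholder feature count
weights = np.zeros(NUM_FEATURES)     # linear evaluation weights

def evaluate(pos):
    # assumed: features(pos) returns a NUM_FEATURES vector
    return float(np.dot(weights, features(pos)))

def td_self_play_game(depth):
    global weights
    pos = initial_position()                  # assumed game interface
    trace = np.zeros(NUM_FEATURES)            # eligibility trace
    prev_value, prev_grad = None, None
    while not pos.game_over():
        # depth-limited search with the current eval; the returned value is
        # assumed to be from the learner's fixed point of view throughout
        value, move = search(pos, depth, evaluate)
        grad = features(pos)                  # gradient of a linear eval is just the features
        if prev_value is not None:
            delta = value - prev_value        # TD error between successive search values
            trace = LAMBDA * trace + prev_grad
            weights += ALPHA * delta * trace
        prev_value, prev_grad = value, grad
        pos = pos.play(move)
    if prev_value is not None:
        # final update towards the actual game result
        weights += ALPHA * (pos.result() - prev_value) * (LAMBDA * trace + prev_grad)

# bulk of the games at 2 ply, then smaller fractions at 3 and 4 ply (made-up counts)
for depth, games in [(2, 50000), (3, 10000), (4, 2000)]:
    for _ in range(games):
        td_self_play_game(depth)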
-Don
Date: Wed, 16 Jul 2003 11:42:13 -0600
From: Markus Enzenberger <compgo@xxxxxxxxxxxxxxxxx>
> > IMHO it would have been better to do a quiescence
> > search for determining the material value of a position
> > used as input for the evaluation function and choose
> > the moves during self-play by 1-ply look-ahead.
>
> I don't quite follow you. If you're going to do a deep
> search anyway, why not use the result to choose the best
> move?
Not a full search, only a quiescence search, which is much
cheaper.
In chess, very specialized evaluation functions are used.
The evaluation is extremely fast, and the material part is
unstable in all positions where capturing moves are
possible, while at the same time being by far the dominating
part of the evaluation. This is why you need quiescence
search and apply the evaluation only at stable nodes,
which is more effective than trying to determine the
tactical stability of pieces statically.
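Roughly, a minimal quiescence search looks like the following Python
sketch (plain fail-hard negamax over capturing moves only; evaluate()
and the move interface are placeholders):

def quiesce(pos, alpha, beta):
    stand_pat = evaluate(pos)          # static evaluation at the current node
    if stand_pat >= beta:
        return beta                    # fail-hard cutoff
    alpha = max(alpha, stand_pat)
    for move in pos.generate_captures():         # only tactical (capturing) moves
        score = -quiesce(pos.play(move), -beta, -alpha)   # negamax recursion
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha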
I agree that TDLeaf is useful for learning the value of the pieces
with such an evaluation function.
But the value of the pieces is well known in chess, and if you
want to learn the weights for the heuristic part only, it
might be more effective to replace the material part with
something more stable (such as the material value returned
by a quiescence search) and use 1-ply look-ahead.
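As a sketch of what I mean (quiesce_material(), heuristic_features()
and the move interface are placeholders, not code from any real
program):

def evaluate_stable(pos, weights):
    # material part taken from a quiescence search restricted to material,
    # so it is already stable; learned weights apply only to the heuristic part
    material = quiesce_material(pos)
    heuristic = sum(w * f for w, f in zip(weights, heuristic_features(pos)))
    return material + heuristic

def choose_move_1ply(pos, weights):
    # 1-ply look-ahead during self-play: evaluate every successor and take the best;
    # the evaluation is assumed to be from the point of view of the side to move,
    # hence the negation after playing a move
    return max(pos.legal_moves(),
               key=lambda m: -evaluate_stable(pos.play(m), weights))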
The situation is different in Go. The evaluation function
usually has to determine whether blocks are alive or dead
much earlier than this can be decided by tactical search.
So evaluation functions in Go are usually slow, more stable,
and dominated by the heuristic parts. They use the results of
fast local tactical searches as input to the evaluation. I
doubt that TDLeaf is an improvement in this case.
I don't think that it is a bad idea to use an evaluation
function that was trained with TD and 1-ply look-ahead
later in deeper searches. I am doing this with NeuroGo on
the 9x9 board and doing a 3-ply full-board search makes it
significantly stronger.
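Schematically, this just means dropping the trained evaluation into a
small fixed-depth search at play time, something like the sketch
below (plain negamax alpha-beta; evaluate() stands for the trained
network and the move interface is a placeholder, this is not
NeuroGo's actual code):

def negamax(pos, depth, alpha, beta):
    if depth == 0 or pos.game_over():
        return evaluate(pos)              # trained evaluation applied at the leaves
    best = -float("inf")
    for move in pos.legal_moves():
        score = -negamax(pos.play(move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                         # beta cutoff
    return best

def choose_move(pos, depth=3):
    # depth=3 gives a 3-ply full-board search: the root move plus 2 ply of negamax below it
    return max(pos.legal_moves(),
               key=lambda m: -negamax(pos.play(m), depth - 1,
                                      -float("inf"), float("inf")))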
- Markus