
Re: computer-go: Temporal Difference Learning



>  I don't think that it is a bad idea to use an evaluation 
>  function that was trained with TD and 1-ply look-ahead 
>  later in deeper searches. I am doing this with NeuroGo on 
>  the 9x9 board and doing a 3-ply full-board search makes it 
>  significantly stronger.

Yes, this should not be a bad thing.  Ideally you would like to do the
TD learning with as deep a search as possible, but unfortunately you
cannot go very deep without enormous training time.

What I did in chess was to do most of the training at a very shallow
depth and then a fraction of the remaining training at a deeper (but
still not very deep) depth.  We did 2-ply searches first, followed by
3-ply and finally 4-ply searches.  We did quite a bit of testing, and
the empirical evidence seemed to suggest that training with deeper
searches is better if you can afford it, but that it isn't necessary
for good results.  Due to the ambiguous nature of testing chess
strength, I cannot say with certainty whether this is actually true.
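
Below is a minimal sketch (in Python) of the kind of training loop
described above: TD(lambda) updates where the bootstrap target is the
value of a depth-limited search, with most games trained at 2 ply and
a smaller fraction at 3 and then 4 ply.  The helper names (search_fn,
features_fn, selfplay_fn), the learning constants, and the 80/15/5
split are illustrative assumptions, not the code or schedule we
actually used.

import numpy as np

ALPHA, LAMBDA = 0.01, 0.7          # learning rate, TD(lambda) decay

def td_update(positions, w, depth, search_fn, features_fn):
    # Value of each position is the backed-up score of a depth-limited search.
    v = [search_fn(p, depth, w) for p in positions]
    for t in range(len(v) - 1):
        # Exponentially decayed sum of the one-step TD errors from time t on.
        delta = sum(LAMBDA ** (k - t) * (v[k + 1] - v[k])
                    for k in range(t, len(v) - 1))
        # For simplicity the gradient is taken at the root position; TDLeaf
        # proper would use the leaf of the principal variation instead.
        w = w + ALPHA * delta * features_fn(positions[t])
    return w

def train(w, n_games, search_fn, features_fn, selfplay_fn):
    for i in range(n_games):
        # Most of the training at 2 ply, then a small fraction at 3 and 4 ply.
        depth = 2 if i < 0.8 * n_games else 3 if i < 0.95 * n_games else 4
        positions = selfplay_fn(w, depth)      # positions of one self-play game
        w = td_update(positions, w, depth, search_fn, features_fn)
    return w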


-Don





   Date: Wed, 16 Jul 2003 11:42:13 -0600
   From: Markus Enzenberger <compgo@xxxxxxxxxxxxxxxxx>

   > > IMHO it would have been better to do a quiescence
   > > search for determining the material value of a position
   > > used as input for the evaluation function and choose
   > > the moves during self-play by 1-ply look-ahead.
   >
   > I don't quite follow you.  If you're going to do a deep
   > search anyway, why not use the result to choose the best
   > move?

   not a full search, only the quiescence search, which is much 
   cheaper.

   In chess, very specialized evaluation functions are used.
   The evaluation is extremely fast, and the material part is
   unstable in all positions where capturing moves are
   possible, yet at the same time it is by far the dominating
   part of the evaluation. This is why you need quiescence
   search and apply the evaluation only at stable nodes,
   which is more effective than trying to determine the
   tactical stability of pieces statically.
   I agree that TDLeaf is useful for learning the value of
   the pieces with such an evaluation function.
   But the value of the pieces is well known in chess, and if
   you only want to learn the weights of the heuristic part,
   it might be more effective to replace the material part
   with something more stable (such as the material value
   returned by a quiescence search) and use 1-ply look-ahead.
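
   A minimal sketch of the material-only quiescence search meant
   here, assuming a position object with material(), captures(),
   make() and undo() methods (placeholders, not a real engine
   interface).  The value it returns would replace the raw material
   term as an input to the evaluation, while self-play moves are
   still chosen by a 1-ply look-ahead over the legal moves.

   def quiescent_material(pos, alpha=-10**9, beta=10**9):
       # Material balance from the side to move's point of view, after
       # resolving captures (negamax form of a quiescence search).
       stand_pat = pos.material()
       if stand_pat >= beta:
           return stand_pat
       alpha = max(alpha, stand_pat)
       for move in pos.captures():        # only capturing moves are searched
           pos.make(move)
           score = -quiescent_material(pos, -beta, -alpha)
           pos.undo(move)
           if score >= beta:
               return score
           alpha = max(alpha, score)
       return alpha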

   The situation is different in Go. The evaluation function
   usually has to decide whether blocks are alive or dead
   much earlier than the point at which tactical search can
   settle the question.
   So evaluation functions in Go are usually slower, more
   stable, and dominated by the heuristic parts. They use the
   results of fast local tactical searches as inputs to the
   evaluation. I doubt that TDLeaf is an improvement in this
   case.
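
   As a rough illustration of such an evaluation (the feature layout
   and the local_capture_search() interface are assumptions for the
   sketch, not NeuroGo's actual design): the status returned by a
   fast local tactical search becomes just another input feature of
   each block.

   import numpy as np

   def block_features(board, block, local_capture_search):
       # Cheap static features plus one tactical feature: the result of a
       # small local search (1.0 = cannot be captured, 0.0 = can be
       # captured, 0.5 = the local search was inconclusive).
       tactical = local_capture_search(board, block)
       return np.array([len(block.stones), len(block.liberties), tactical])

   def evaluate(board, w, local_capture_search):
       # Linear combination of per-block features, signed by block colour
       # (+1 for Black, -1 for White); positive scores favour Black.
       score = 0.0
       for block in board.blocks():
           f = block_features(board, block, local_capture_search)
           score += block.sign * float(np.dot(w, f))
       return score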

   I don't think that it is a bad idea to use an evaluation 
   function that was trained with TD and 1-ply look-ahead 
   later in deeper searches. I am doing this with NeuroGo on 
   the 9x9 board and doing a 3-ply full-board search makes it 
   significantly stronger.
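
   For reference, using the 1-ply-trained evaluation inside a deeper
   search needs nothing special: the evaluation is simply applied at
   the leaves of a fixed-depth full-board alpha-beta search.  A
   sketch, with an assumed position interface rather than NeuroGo's
   actual code:

   def alphabeta(pos, depth, alpha, beta, evaluate):
       # evaluate() is assumed to score from the side to move's viewpoint.
       if depth == 0 or pos.is_terminal():
           return evaluate(pos)             # the TD-trained evaluation
       best = -float('inf')
       for move in pos.legal_moves():
           pos.make(move)
           best = max(best, -alphabeta(pos, depth - 1, -beta, -alpha, evaluate))
           pos.undo(move)
           alpha = max(alpha, best)
           if alpha >= beta:
               break                        # beta cutoff
       return best

   def choose_move(pos, evaluate, depth=3):
       # Pick the move whose fixed-depth search value is best for the mover.
       best_move, best_score = None, -float('inf')
       for move in pos.legal_moves():
           pos.make(move)
           score = -alphabeta(pos, depth - 1, -float('inf'), float('inf'),
                              evaluate)
           pos.undo(move)
           if score > best_score:
               best_move, best_score = move, score
       return best_move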

   - Markus