Re: computer-go: Temporal Difference Learning
> > IMHO it would have been better to do a quiescence
> > search for determining the material value of a position
> > used as input for the evaluation function and choose
> > the moves during self-play by 1-ply look-ahead.
>
> I don't quite follow you. If you're going to do a deep
> search anyway, why not use the result to choose the best
> move?
Not a full search, only a quiescence search, which is much
cheaper.
In chess, very specialized evaluation functions are used.
The evaluation is extremely fast, and the material part is
unstable in all positions where capturing moves are
possible, but at the same time it is by far the dominating
part of the evaluation. This is why you need quiescence
search and apply the evaluation only at stable nodes,
which is more effective than trying to determine the
tactical stability of pieces statically.
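Roughly, in Python (the helpers material(), capture_moves() and
make() are made-up names, not any particular engine's interface):

    def quiescence(pos, alpha, beta):
        # Stand-pat score: static material from the point of
        # view of the player to move.
        stand_pat = material(pos)
        if stand_pat >= beta:
            return beta
        alpha = max(alpha, stand_pat)
        # Expand only capturing moves until the position is quiet.
        for move in capture_moves(pos):
            score = -quiescence(make(pos, move), -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha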
I agree that TDLeaf is useful for learning the value of pieces
with such an evaluation function.
But the values of the pieces are well known in chess, and if
you want to learn the weights for the heuristic part only, it
might be more effective to replace the material part by
something more stable (like the material value returned by a
quiescence search) and use 1-ply look-ahead.
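Again as a sketch, building on the quiescence() above (weights,
features and legal_moves() are hypothetical; this is not TDLeaf's
actual code):

    INF = float("inf")

    def evaluate(pos, weights, features):
        # Material part: the stable quiescence result instead of
        # a raw static piece count.
        mat = quiescence(pos, -INF, INF)
        # Heuristic part: weighted sum of feature functions whose
        # weights are what we actually want to learn.
        heur = sum(w * f(pos) for w, f in zip(weights, features))
        return mat + heur

    def choose_move(pos, weights, features):
        # 1-ply look-ahead: the child's evaluation is from the
        # opponent's point of view, hence the sign flip.
        return max(legal_moves(pos),
                   key=lambda m: -evaluate(make(pos, m),
                                           weights, features))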
The situation is different in Go. The evaluation function
usually has to determine whether blocks are alive or dead much
earlier than tactical search can decide it.
So evaluation functions in Go are usually slow, more stable,
and dominated by the heuristic parts. They use the results of
fast local tactical searches as input to the evaluation. I
doubt that TDLeaf is an improvement in this case.
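Schematically (blocks(), local_tactical_status() and net are
again invented names):

    def go_evaluate(pos, net):
        # Fast local tactical searches produce a life-and-death
        # status feature for each block; the slow, stable
        # full-board evaluation is computed from those features.
        features = [local_tactical_status(pos, block)
                    for block in blocks(pos)]
        return net.value(features)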
I don't think it is a bad idea to train an evaluation function
with TD and 1-ply look-ahead and later use it in deeper
searches. I am doing this with NeuroGo on the 9x9 board, and a
3-ply full-board search makes it significantly stronger.
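For what it's worth, the training loop is, in spirit, something
like this TD(0) sketch (hypothetical net interface, not NeuroGo's
actual code; ALPHA is an arbitrary learning rate):

    ALPHA = 0.1

    def choose_move_net(pos, net):
        # 1-ply look-ahead with the network's value, negamax-style.
        return max(legal_moves(pos),
                   key=lambda m: -net.value(make(pos, m)))

    def self_play_game(net):
        # net.value(p) predicts the result for the player to
        # move in p, so the successor's value is negated.
        pos = initial_position()
        while not game_over(pos):
            nxt = make(pos, choose_move_net(pos, net))
            # TD(0) target: the actual result once the game ends
            # (from the mover's point of view), otherwise the
            # negated network value of the successor position.
            target = (final_result(nxt) if game_over(nxt)
                      else -net.value(nxt))
            net.adjust(pos, ALPHA * (target - net.value(pos)))
            pos = nxt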
- Markus