
Re: computer-go: Temporal Difference Learning



Markus,

I  don't  understand  why   you  say  TDLeaf  is  exponentially  slow.
Generating a principal variation from a search is almost free and that
is what  you train against, the  final node position  of the principal
variation.  Are you  comparing to something that doesn't  need to do a
search such as getting positions from game records?
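Roughly, the update I mean looks like this.  This is only a toy sketch to make the idea concrete; eval_fn, grad_fn and search are placeholders, not code from any of the programs discussed here:

```python
import numpy as np

def tdleaf_update(positions, w, eval_fn, grad_fn, search,
                  depth=4, lam=0.7, alpha=1e-3):
    """One TD-Leaf(lambda) pass over the positions of a self-play game."""
    # Train against the leaf of the principal variation, not the root:
    # search(p, w, depth) returns the PV leaf position of a shallow search.
    leaves = [search(p, w, depth) for p in positions]
    values = [eval_fn(l, w) for l in leaves]
    grads = [grad_fn(l, w) for l in leaves]
    # Temporal differences between successive PV-leaf evaluations.
    deltas = [values[t + 1] - values[t] for t in range(len(values) - 1)]
    w = np.asarray(w, dtype=float).copy()
    for t in range(len(deltas)):
        # Lambda-discounted sum of future temporal differences.
        acc = sum(lam ** (j - t) * deltas[j] for j in range(t, len(deltas)))
        w += alpha * grads[t] * acc
    return w
```

The point is that the search itself is the expensive part; reading off the PV leaf once the search is done costs essentially nothing extra.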

Don Beal and I did something like this with chess as well.  We took a
fairly  complicated evaluation  function and  tuned the  weights using
TDLeaf.   We  actually did  4  ply  searches  and played  hundreds  of
thousands of games over a several  week time period.  To save time, we
pre-tuned the  weights with 2 ply  games until they  got fairly stable
and then went from there with the 4 ply games.  I also did some things
to optimize the  search time of very shallow  searches to speed things
up.

From time to time we would play some matches with the hand-tuned
version of the  program and we watched it improve  over the weeks.  We
were  fairly surprised  by  the results.   When  we stopped  we had  a
program that could  beat our standard program almost  70% of the time.
When we looked  at the weights the program chose,  many of them seemed
odd, but the program was indeed better.  The best surprise was that it
played much more interesting chess.

I think one  advantage of this kind of thing is  that the algorithm is
immune to the fears and prejudices that a human will impose upon it
when engineering the  weights manually.  In our case,  the program was
not  afraid  to play  what  seemed like  much  riskier  moves, such  as
sacrifices,  moves it  would never  have tried  before.  But  this new
style from the point of view of the algorithm wasn't risky; it was the
surest path to success as measured by TDLeaf.

One very desirable characteristic of  the new evaluation weights was a
de-emphasis  on material  values.  It  seems  that the  values of  the
pieces had  more to do with  their total positional value  and less with
the static fixed values that we usually assign to pieces.

It is very hard to claim success, despite what I just related, because
it is not clear how good the initial hand-tuned weights actually were.
I can only say  I really liked the way it played  and that this seemed
to be a better way to choose  weights than what I was capable of doing
on my own.

Unfortunately, the  program was in heavy development  during the weeks
it was  being tuned by  TDLeaf.  The evaluation  changed significantly
and  the new  weights were  out  of date.   We never  actually got  to
benefit from  the technique since  we did not  have the time  to start
over.



Don






   Date: Tue, 15 Jul 2003 12:24:19 -0600
   From: Markus Enzenberger <compgo@xxxxxxxxxxxxxxxxx>

   > > there is an algorithm called TDLeaf, but I am not
   > > convinced that it is useful.
   >
   > A quick web search found a paper by Baxter, Tridgell, and
   > Weaver.  Is this the canonical one?

   yes.

   > Also, can you say why you're not convinced this is
   > useful?

   it was used for training evaluation functions in chess that
   used the material value of the position as input.
   Then you have the disadvantage that the material value
   can change at every move during an exchange of pieces,
   which would give you horrible training patterns.
   TDLeaf avoids this by using search to get more appropriate 
   target positions for training (e.g. after the exchange has 
   happened).
   But you pay a very high price for it, because move 
   generation during self-play is now exponentially slower.
   IMHO it would have been better to do a quiescence search for 
   determining the material value of a position used as input 
   for the evaluation function and choose the moves during 
   self-play by 1-ply look-ahead.
   However I haven't performed any experiments and the neural 
   network in NeuroGo is much too slow to use TDLeaf.

   > > NeuroGo in its most recent version uses local
   > > connectivity and single-point eyes as additional
   > > outputs that are trained with TD. I will present a
   > > paper about this at ACG2003 which takes place together
   > > with the Computer Olympiad in Graz/Austria in November.
   >
   > So when and how do those of us stuck stateside get ahold
   > of it?  :-)

   I'll put the paper online when the final version is ready.

   - Markus
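
For what it's worth, the alternative Markus describes above (a quiescence
search to settle the material term, plus plain 1-ply look-ahead for
self-play moves) might look roughly like the following toy sketch.  All
the names here are illustrative; no real engine's API is implied:

```python
def quiesce_material(pos, material, captures, make_move):
    """Resolve pending captures so material is measured in a quiet position.

    material(pos) scores pos from the side to move; captures(pos) lists
    capture moves; make_move(pos, mv) returns the resulting position.
    """
    # Stand pat: the mover may decline to capture.
    best = material(pos)
    for mv in captures(pos):
        # Negamax: a child's score is from the opponent's viewpoint.
        score = -quiesce_material(make_move(pos, mv), material,
                                  captures, make_move)
        best = max(best, score)
    return best

def pick_move_1ply(pos, legal_moves, make_move, evaluate):
    """Choose a self-play move by 1-ply look-ahead over the evaluation."""
    # Again negamax style: the child position is scored for the opponent.
    return max(legal_moves(pos),
               key=lambda mv: -evaluate(make_move(pos, mv)))
```

The idea, as I read it, is that the quiet material score feeds the
evaluation as an input, so the expensive deep search in the training
loop is avoided entirely.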