Re: computer-go: Temporal Difference Learning
Markus,
I don't understand why you say TDLeaf is exponentially slow.
Generating a principal variation from a search is almost free, and its
final node is the position you train against. Are you comparing it to
something that doesn't need to do a search, such as getting positions
from game records?
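In case it helps, here is roughly what the update looks like, assuming a
linear evaluation over feature vectors. This is only a toy sketch of
TDLeaf(lambda), not our actual code: the search that produces the PV
leaves is left out, and all the names and numbers are illustrative.

```python
def evaluate(features, weights):
    # Linear evaluation: dot product of feature vector and weights.
    return sum(f * w for f, w in zip(features, weights))

def tdleaf_update(leaf_features, weights, alpha=0.01, lam=0.7):
    # One TDLeaf(lambda) pass over the PV-leaf positions of a game.
    # leaf_features[t] is the feature vector of the leaf of the principal
    # variation found at move t -- a node the search visited anyway, so
    # extracting it costs essentially nothing extra.
    n = len(leaf_features)
    values = [evaluate(f, weights) for f in leaf_features]
    diffs = [values[t + 1] - values[t] for t in range(n - 1)]
    for t in range(n - 1):
        # Discounted sum of future temporal differences.
        err = sum((lam ** (j - t)) * diffs[j] for j in range(t, n - 1))
        grad = leaf_features[t]  # gradient of a linear evaluator
        for i in range(len(weights)):
            weights[i] += alpha * grad[i] * err
    return weights

# Toy usage: three PV-leaf feature vectors, two weights.
w = tdleaf_update([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [0.5, -0.2])
```

The point of the sketch is the loop body: the gradient is taken at the
PV leaf, not at the root position, which is the whole difference between
TDLeaf and plain TD on root evaluations.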
Don Beal and I did something like this with chess as well. We took a
fairly complicated evaluation function and tuned the weights using
TDLeaf. We did 4-ply searches and played hundreds of thousands of
games over a period of several weeks. To save time, we pre-tuned the
weights with 2-ply games until they got fairly stable and then
continued from there with the 4-ply games. I also did some things to
optimize the search time of very shallow searches to speed things up.
From time to time we would play some matches with the hand-tuned
version of the program and we watched it improve over the weeks. We
were fairly surprised by the results. When we stopped we had a
program that could beat our standard program almost 70% of the time.
When we looked at the weights the program chose, many of them seemed
odd, but the program was indeed better. The best surprise was that it
played much more interesting chess.
I think one advantage of this kind of thing is that the algorithm is
immune to the fears and prejudices that a human will impose upon it
when engineering the weights manually. In our case, the program was
not afraid to play what seemed like much riskier moves, such as
sacrifices, moves it would never have tried before. But from the
algorithm's point of view this new style wasn't risky; it was the
surest path to success as measured by TDLeaf.
One very desirable characteristic of the new evaluation weights was a
de-emphasis on material values. It seems that the values of the
pieces had more to do with their total positional value and less to do
with the static fixed values that we usually assign to pieces.
Despite what I just related, it is very hard to claim success, because
it is not clear how good the initial hand-tuned weights actually were.
I can only say I really liked the way it played and that this seemed
to be a better way to choose weights than what I was capable of doing
on my own.
Unfortunately, the program was in heavy development during the weeks
it was being tuned by TDLeaf. The evaluation changed significantly
and the new weights were out of date. We never actually got to
benefit from the technique since we did not have the time to start
over.
Don
Date: Tue, 15 Jul 2003 12:24:19 -0600
From: Markus Enzenberger <compgo@xxxxxxxxxxxxxxxxx>
> > there is an algorithm called TDLeaf, but I am not
> > convinced that it is useful.
>
> A quick web search found a paper by Baxter, Tridgell, and
> Weaver. Is this the canonical one?
yes.
> Also, can you say why you're not convinced this is
> useful?
it was used for training evaluation functions in chess that
used the material value of the position as input.
Then you have the disadvantage that the material value
can change at every move during an exchange of pieces,
which would give you horrible training patterns.
TDLeaf avoids this by using search to get more appropriate
target positions for training (e.g. after the exchange has
happened).
But you pay a very high price for it, because move
generation during self-play is now exponentially slower.
IMHO it would have been better to do a quiescence search to
determine the material value of a position used as input
for the evaluation function, and to choose the moves during
self-play by 1-ply look-ahead.
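As a toy sketch of what I mean by settling the material value (not
tested, and with positions represented abstractly): a capture-only
negamax quiescence. Here a "position" is just a pair of the static
material score from the side to move's perspective and the list of
positions reachable by a capture.

```python
def quiesce_material(pos, alpha=-10**9, beta=10**9):
    # Capture-only negamax quiescence over toy positions of the form
    # (material from the side to move's perspective, capture children).
    material, captures = pos
    stand_pat = material  # option to decline all further captures
    if stand_pat >= beta:
        return beta
    alpha = max(alpha, stand_pat)
    for child in captures:
        score = -quiesce_material(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# Mid-exchange example: the side to move is statically a pawn down (-1),
# but recapturing wins a queen for that pawn, so the settled value is +8.
settled = quiesce_material((-1, [(-8, [])]))
```

The settled value, rather than the raw static material, would then be
the input feature for the evaluation function.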
However, I haven't performed any experiments, and the neural
network in NeuroGo is much too slow to use TDLeaf.
> > NeuroGo in its most recent version uses local
> > connectivity and single-point eyes as additional
> > outputs that are trained with TD. I will present a
> > paper about this at ACG2003 which takes place together
> > with the Computer Olympiad in Graz/Austria in November.
>
> So when and how do those of us stuck stateside get ahold
> of it? :-)
I'll put the paper online when the final version is ready.
- Markus