computer-go: Temporal Difference Learning
There appear to have been several attempts at Temporal Difference
learning for Go. The general approach seems to be:
- Randomly initialize a neural network which, given a board state,
estimates its value.
- Either gather a set of recorded games or train through self play. If
self play is used, choose moves based on 1-ply lookahead, usually with
some random element to encourage exploration.
- As each move is played, adjust the network weights so that the network's
evaluation of the preceding board moves toward its evaluation of the resulting
board. If the resulting board is an endgame position, use the actual score
rather than the network's evaluation as the training target. (A rough sketch
of this loop appears below.)
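
For concreteness, here is a minimal sketch of that self-play loop. It is not
taken from any particular program; the environment interface (initial_state,
legal_moves, play, is_over, score), the value network's evaluate/update
methods, and the parameter values are all hypothetical, and perspective
handling for the two alternating players is glossed over.

import random

def td0_self_play_game(env, net, alpha=0.01, epsilon=0.1):
    """Play one self-play game, applying a TD(0) update after each move."""
    state = env.initial_state()
    while not env.is_over(state):
        # 1-ply lookahead: pick the move whose successor the network rates
        # highest, with epsilon-greedy randomness to encourage exploration.
        moves = env.legal_moves(state)
        if random.random() < epsilon:
            move = random.choice(moves)
        else:
            move = max(moves, key=lambda m: net.evaluate(env.play(state, m)))
        next_state = env.play(state, move)

        # TD target: the actual score if the game has ended, otherwise the
        # network's own evaluation of the resulting position.
        if env.is_over(next_state):
            target = env.score(next_state)
        else:
            target = net.evaluate(next_state)

        # Nudge the evaluation of the preceding board toward the target.
        net.update(state, target, alpha)
        state = next_state
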
Are there any major deviations from this plan within TD programs?
Specifically, is anyone:
...doing more than 1 ply lookahead?
...using TD to learn to answer more specific questions, such as, "can
these two chains connect"?
...using function approximators other than neural networks, e.g.,
decision trees?
Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/