computer-go: Temporal Difference Learning
There appear to have been several attempts at Temporal Difference
learning for Go. The general approach seems to be:
- Randomly initialize a neural network which, given a board state,
estimates its value.
- Either gather a set of recorded games or train through self play. If
self play is used, choose moves based on 1-ply lookahead, usually with
some random element to encourage exploration.
- As each move is played, adjust the network weights so that the network's
evaluation of the preceding board moves toward its evaluation of the resulting
board. If the resulting board is an endgame position, use the actual score
rather than the network's evaluation as the training target. (A rough sketch
of this loop appears below.)
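
For concreteness, here is a minimal sketch of that self-play loop. It is not
taken from any particular program; the environment interface (initial_state,
legal_moves, play, is_over, score), the value network's evaluate/update
methods, and the parameter values are all hypothetical, and perspective
handling for the two alternating players is glossed over.

import random

def td0_self_play_game(env, net, alpha=0.01, epsilon=0.1):
    """Play one self-play game, applying a TD(0) update after each move."""
    state = env.initial_state()
    while not env.is_over(state):
        # 1-ply lookahead: pick the move whose successor the network rates
        # highest, with epsilon-greedy randomness to encourage exploration.
        moves = env.legal_moves(state)
        if random.random() < epsilon:
            move = random.choice(moves)
        else:
            move = max(moves, key=lambda m: net.evaluate(env.play(state, m)))
        next_state = env.play(state, move)

        # TD target: the actual score if the game has ended, otherwise the
        # network's own evaluation of the resulting position.
        if env.is_over(next_state):
            target = env.score(next_state)
        else:
            target = net.evaluate(next_state)

        # Nudge the evaluation of the preceding board toward the target.
        net.update(state, target, alpha)
        state = next_state
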
Are there any major deviations from this plan within TD programs?
Specifically, is anyone:
...doing more than 1 ply lookahead?
...using TD to learn to answer more specific questions, such as, "can
these two chains connect"?
...using function approximators other than neural networks, e.g.,
decision trees?
Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/