computer-go: RE: Learn from Self-Play
I wonder whether there has been research into which type of neural
network is best suited to generalizing from one go board position to
others. It seems that most people in the field have used a
standard two-layer back-propagation network as the representation. I
have tried the same approach on something as simple as tic-tac-toe, and I
can't say that the system generalizes in a very promising way. I wonder
whether radial basis function networks, or some other architecture,
would be more appropriate.
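To make the architectural difference concrete, here is a minimal sketch contrasting the two kinds of hidden unit. A sigmoid unit in a back-propagation network responds to a whole half-space of inputs, while a Gaussian radial basis unit responds only near its center. The board encoding (+1, -1, 0), the example positions, and the width of 1.0 are assumptions chosen for illustration, not anything taken from an existing program.

```python
import math

def sigmoid_unit(x, w, b):
    """Hidden unit of a standard back-propagation network (half-space response)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def rbf_unit(x, center, width):
    """Gaussian radial basis unit: output falls off with distance from `center`."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

# Tic-tac-toe board as 9 inputs (assumed encoding): +1 own, -1 opponent, 0 empty.
center = [1, 0, 0, 0, -1, 0, 0, 0, 0]      # position the RBF unit is tuned to
near = [1, 0, 0, 0, -1, 0, 0, 0, 1]        # same position plus one stone
far = [-1, 1, 1, -1, 1, -1, 1, -1, -1]     # a very different position

for name, pos in (("center", center), ("near", near), ("far", far)):
    print(name, rbf_unit(pos, center, 1.0), sigmoid_unit(pos, center, 0.0))
```

The radial basis unit gives its maximum response at `center`, a smaller one at `near`, and almost none at `far`. Whether that locality helps or hurts generalization across board positions is exactly the open question.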
It might be that this is impossible for go, since the
board (the pattern entering the network) looks very similar from one
move to the next, while the best moves in two consecutive positions
are not similar at all.
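The similarity of consecutive inputs can be illustrated with a small sketch. It assumes a 9x9 board flattened into 81 inputs (+1, -1, 0) and a move that captures nothing; the helper names `encode`, `play`, and `hamming` are hypothetical.

```python
SIZE = 9  # assumed board size

def encode(board):
    """Flatten a board (list of rows) into the input vector for the network."""
    return [cell for row in board for cell in row]

def play(board, row, col, colour):
    """Return a copy of the board with one stone added (captures are ignored)."""
    new = [r[:] for r in board]
    new[row][col] = colour
    return new

def hamming(a, b):
    """Number of inputs that differ between two encoded positions."""
    return sum(1 for x, y in zip(a, b) if x != y)

empty = [[0] * SIZE for _ in range(SIZE)]
before = play(play(empty, 2, 2, 1), 6, 6, -1)
after = play(before, 2, 6, 1)

diff = hamming(encode(before), encode(after))
print(diff, "of", SIZE * SIZE, "inputs changed")  # 1 of 81
```

Without a capture, only one of the 81 inputs changes per move, so a network that maps nearby inputs to nearby outputs has very little signal with which to separate the two positions.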
/ Mans Ullerstam
-----Original Message-----
From: owner-computer-go@xxxxxxxxxxxxxxxxx
[mailto:owner-computer-go@xxxxxxxxxxxxxxxxx] On Behalf Of Ran Xiao
Sent: 17 February 2003 20:30
To: Computer-Go; Måns Ullerstam
Subject: computer-go: RE: Learn from Self-Play
-----Original Message-----
From: Måns Ullerstam [mailto:mans@xxxxxxxxxxxxxxxxx]
Sent: Monday, February 17, 2003 1:42 AM
To: 'Ran Xiao'
Subject: RE: computer-go: Learn from Self-Play
- Is that a neural network or are you storing all board positions and
moves directly?
All board positions are stored, and a NN is used to learn the evaluation
function.
- You are saying that you let them play other computer go programs
first. What programs and for how many games?
ManyFaces, HandTalk, EZGO, Goliath, and TurboGO. All are available
somewhere on the Internet as free downloads for the 9x9 board.
- You do not specify the learning mechanism, but it
sounds like you have implemented some form of reinforcement
learning, TD(lambda) or something similar. Can you elaborate on that?
Since I save all board positions with their scores, direct TD or RL is
not really needed.
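One possible reading of this answer is plain supervised regression: each stored position is paired with its score, and a two-layer network is fitted to those pairs by back-propagation, with no temporal-difference bootstrapping. The sketch below only illustrates that idea and is not the actual program. The 3x3 toy board, the randomly generated positions, the stone-difference target, and the layer sizes are all assumptions.

```python
import math
import random

random.seed(0)
N_IN, N_HID = 9, 4  # toy 3x3 board; a 9x9 board would need 81 inputs

def toy_position():
    return [random.choice([-1, 0, 1]) for _ in range(N_IN)]

# Hypothetical stored data: (position, score) pairs. The target is the stone
# difference scaled to [-1, 1], a stand-in for a real game score.
data = [(p, sum(p) / N_IN) for p in (toy_position() for _ in range(50))]

w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
w2 = [random.uniform(-0.5, 0.5) for _ in range(N_HID)]
b2 = [0.0]  # one-element list so train_epoch can update it in place

def forward(x):
    """Return hidden activations and the predicted score for position x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return h, sum(w * hi for w, hi in zip(w2, h)) + b2[0]

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

def train_epoch(lr=0.05):
    """One pass of stochastic gradient descent over the stored pairs."""
    for x, t in data:
        h, y = forward(x)
        err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
        for j in range(N_HID):
            grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # uses w2[j] before its update
            w2[j] -= lr * err * h[j]
            b1[j] -= lr * grad_h
            for i in range(N_IN):
                w1[j][i] -= lr * grad_h * x[i]
        b2[0] -= lr * err

loss_before = mse()
for _ in range(200):
    train_epoch()
loss_after = mse()
print(loss_before, loss_after)
```

The training target here comes directly from the stored score rather than from the network's own later predictions, which is the main difference from TD(lambda).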
- What is the evaluation function evaluating: the score of the board,
the probability of winning, or something else?
Based on the score of the board.
Weimin Xiao