computer-go: RE: Learn from Self-Play
I am using a generalized neural network in ForeverBlack, so that potential
concept layers can be generated automatically.
Weimin Xiao
-----Original Message-----
From: owner-computer-go@xxxxxxxxxxxxxxxxx
[mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Måns Ullerstam
Sent: Monday, February 24, 2003 4:56 AM
To: computer-go@xxxxxxxxxxxxxxxxx
Subject: computer-go: RE: Learn from Self-Play
I wonder whether there has been research into which type of neural network
is most suitable for generalizing go board positions to other positions.
It seems that most people in the field have used a standard two-layer
back-propagation network as the representation. I have tried the same
approach on something as simple as tic-tac-toe, and I can't say that the
system generalizes in a very promising way. I wonder
if radial basis function networks, or some other architecture, would be
more appropriate.
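For concreteness, here is a minimal sketch of the kind of two-layer back-propagation network described above, applied to tic-tac-toe position evaluation. Everything here (layer sizes, encoding, the two toy training boards) is an illustrative assumption, not anyone's actual system:

```python
import numpy as np

# A 9-cell tic-tac-toe board (+1 = X, -1 = O, 0 = empty) is mapped to a
# single evaluation in (-1, 1) by a two-layer network trained with
# back-propagation. Sizes and training data are illustrative only.

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(9, 18))   # input -> hidden weights
b1 = np.zeros(18)
W2 = rng.normal(scale=0.5, size=(18, 1))   # hidden -> output weights
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)               # hidden activations
    return np.tanh(h @ W2 + b2), h         # (evaluation, hidden layer)

def train_step(x, target, lr=0.1):
    """One gradient step on squared error, back-propagated by hand."""
    global W1, b1, W2, b2
    y, h = forward(x)
    dy = (y - target) * (1 - y ** 2)       # output delta (tanh' = 1 - y^2)
    dW2 = np.outer(h, dy)
    dh = (dy @ W2.T) * (1 - h ** 2)        # hidden delta
    dW1 = np.outer(x, dh)
    W2 -= lr * dW2; b2 -= lr * dy
    W1 -= lr * dW1; b1 -= lr * dh

# Toy data: a board won for X scored +1, a board won for O scored -1.
boards = [np.array([1, 1, 1, -1, -1, 0, 0, 0, 0], float),
          np.array([-1, -1, -1, 1, 1, 0, 0, 0, 0], float)]
targets = [1.0, -1.0]

for _ in range(200):
    for x, t in zip(boards, targets):
        train_step(x, t)
```

The point of the sketch is that such a network interpolates in a fixed feature space; whether it generalizes usefully depends entirely on how the board encoding relates to the value function, which is the concern raised below.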
It might be the case that this is impossible for go, since the board (the
pattern fed to the network) looks very similar between two consecutive
moves, while the best moves for those two positions can be completely
different.
/ Mans Ullerstam
-----Original Message-----
From: owner-computer-go@xxxxxxxxxxxxxxxxx
[mailto:owner-computer-go@xxxxxxxxxxxxxxxxx] On Behalf Of Ran Xiao
Sent: den 17 februari 2003 20:30
To: Computer-Go; Måns Ullerstam
Subject: computer-go: RE: Learn from Self-Play
-----Original Message-----
From: Måns Ullerstam [mailto:mans@xxxxxxxxxxxxxxxxx]
Sent: Monday, February 17, 2003 1:42 AM
To: 'Ran Xiao'
Subject: RE: computer-go: Learn from Self-Play
- Is that a neural network or are you storing all board positions and moves
directly?
All board positions are stored, and a NN is used to learn the evaluation
function.
- You are saying that you let them play other computer go programs first.
What programs and for how many games?
ManyFaces, HandTalk, EZGO, Goliath, and TurboGO. All are downloadable free
from somewhere on the Internet and can play on a 9x9 board.
- You don't explicitly specify the learning mechanism, but it sounds like
you have implemented some form of reinforcement learning, TD(lambda) or
something similar. Can you elaborate on that?
Since I save all board positions together with their scores, direct TD or
RL is not really needed.
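The distinction being made can be sketched as follows, assuming the stored score attached to each position is the game's final score (the original message does not say exactly how the scores are produced, so the numbers and names here are illustrative): with a known score per position, the training target is given directly, whereas TD(0) must bootstrap its target from the next position's current estimate.

```python
# Contrast between direct supervised targets (every stored position is
# labelled with the game's final score) and a TD(0) update, which
# bootstraps from the next position's current value estimate.
# All names and numbers are illustrative.

def supervised_target(final_score):
    # Every position from a finished game gets the final score as target.
    return final_score

def td0_target(reward, next_value, gamma=1.0):
    # TD(0) target: r + gamma * V(s'), using the current estimate of s'.
    return reward + gamma * next_value

# A toy 3-position game that ends with a score of +7 for Black.
trajectory_values = {0: 0.0, 1: 2.0, 2: 7.0}   # current V estimates
final_score = 7.0

# Direct supervised targets for every stored position:
direct = [supervised_target(final_score) for _ in trajectory_values]

# TD(0) targets (no intermediate reward; terminal reward = final score):
td = [td0_target(0.0, trajectory_values[1]),
      td0_target(0.0, trajectory_values[2]),
      td0_target(final_score, 0.0)]
```

With exact final scores stored for every position, the supervised targets are already what TD would converge to, which is presumably why TD or RL machinery is described as unnecessary here.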
- The evaluation function evaluates what: the score of the board, the
probability of winning, or something else?
It is based on the score of the board.
Weimin Xiao