Re: computer-go: RE: Learn from Self-Play
Generalization usually depends on the compactness of your
representation, i.e., similar positions should map to nearby points in
input space. Are you sure that the "Compactness hypothesis" holds for
your representation of tic-tac-toe?
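
A quick way to check is to encode two consecutive positions and compare
their distance in input space with their difference in value. A minimal
sketch for tic-tac-toe (the +1/0/-1 cell encoding and the example
positions are assumptions for illustration, not code from any program
discussed here):

import numpy as np

def encode(board):
    # board: 9 characters, 'x', 'o' or '.', row by row.
    mapping = {'x': 1.0, 'o': -1.0, '.': 0.0}
    return np.array([mapping[c] for c in board])

# Two positions one move apart: x completes the top row.
before = encode("xx." "oo." "...")
after  = encode("xxx" "oo." "...")

# Input-space distance is minimal (exactly one cell changed) ...
print(np.linalg.norm(after - before))   # 1.0

# ... yet the values differ sharply: "after" is a won game for x,
# while "before" is not even terminal. Positions that are close in
# this representation need not be close in value.
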
Erik
Måns Ullerstam wrote:
>
> I wonder if there has been research on the type of neural network that
> is most suitable for generalizing from seen go board positions to
> unseen ones. It seems like most people in the field have used a
> standard two-layer back-propagation network as the representation. I
> have tried the same approach on something as simple as tic-tac-toe,
> and I can't say that the system generalizes in a very promising way. I
> wonder if radial basis function networks, or some other architecture,
> would be more appropriate.
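>
> To make that concrete, a radial basis function network here could look
> roughly like the following sketch (the Gaussian units, the shared
> width, and the delta-rule training of the output layer are all
> illustrative assumptions, not code from any existing program):
>
> import numpy as np
>
> class RBFNet:
>     def __init__(self, centers, width):
>         self.centers = np.asarray(centers, dtype=float)  # prototype positions
>         self.width = width                               # shared Gaussian width
>         self.weights = np.zeros(len(self.centers))       # linear output layer
>
>     def hidden(self, x):
>         # One Gaussian activation per prototype position.
>         d2 = ((self.centers - x) ** 2).sum(axis=1)
>         return np.exp(-d2 / (2.0 * self.width ** 2))
>
>     def value(self, x):
>         return self.hidden(x) @ self.weights
>
>     def train(self, x, target, lr=0.1):
>         # Delta rule on the output weights only; the centers stay fixed.
>         self.weights += lr * (target - self.value(x)) * self.hidden(x)
>
> net = RBFNet(centers=np.eye(9), width=1.0)  # 9 arbitrary prototypes
> net.train(np.ones(9), target=0.5)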
>
> It might be the case that this is impossible for go, since the board
> (the pattern fed into the network) looks very similar between two
> consecutive moves, while the best moves in those two positions are not
> similar at all.
>
> / Mans Ullerstam
>
> -----Original Message-----
> From: owner-computer-go@xxxxxxxxxxxxxxxxx
> [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx] On Behalf Of Ran Xiao
> Sent: 17 February 2003 20:30
> To: Computer-Go; Måns Ullerstam
> Subject: computer-go: RE: Learn from Self-Play
>
> -----Original Message-----
> From: Måns Ullerstam [mailto:mans@xxxxxxxxxxxxxxxxx]
> Sent: Monday, February 17, 2003 1:42 AM
> To: 'Ran Xiao'
> Subject: RE: computer-go: Learn from Self-Play
>
> - Is that a neural network or are you storing all board positions and
> moves directly?
>
> All board positions are stored, and an NN is used to learn the
> evaluation function.
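>
> Concretely, something like the following sketch of that division of
> labour (the names and the table-then-NN fallback rule are assumptions
> for illustration, not the actual code):
>
> table = {}  # board (tuple of 9 ints) -> score stored for that position
>
> def evaluate(board, net_value):
>     # net_value: the fitted NN evaluation function.
>     if board in table:
>         return table[board]   # exact position seen before: use the table
>     return net_value(board)   # unseen position: generalize with the NN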
>
> - You are saying that you let them play other computer go programs
> first. What programs and for how many games?
>
> ManyFaces, HandTalk, EZGO, Goliath, and TurboGO. All of them can be
> downloaded for free from the Internet; the games were played on a 9x9
> board.
>
> - You are not specifying the learning mechanism explicitly, but it
> sounds like you have implemented some form of Reinforcement Learning,
> TD(lambda) or something similar. Can you elaborate on that?
>
> Since I save all board positions with their scores, direct TD or RL
> methods are not really needed.
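>
> In other words, every finished game can simply credit its final score
> to all of the positions it passed through (a Monte-Carlo style return),
> so no bootstrapping is needed. A sketch under that reading (names
> assumed):
>
> scores = {}  # board tuple -> final scores of games through that position
>
> def backup_game(game_positions, final_score):
>     for board in game_positions:
>         scores.setdefault(board, []).append(final_score)
>
> def nn_target(board):
>     # Training target for the NN: mean final score over those games.
>     return sum(scores[board]) / len(scores[board])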
>
> - The evaluation function is evaluating what, the score of the board,
> the probability of winning, something else?
>
> It is based on the score of the board.
>
> Weimin Xiao