Re: computer-go: RE: Learn from Self-Play
Generalization usually depends on the compactness of your
representation, i.e., similar positions should map to nearby points in
input space. Are you sure that the "Compactness hypothesis" holds for
your representation of tic-tac-toe?
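
A quick way to check is to encode two consecutive positions and compare
their distance in input space with their difference in value. A minimal
sketch for tic-tac-toe (the +1/0/-1 cell encoding and the example
positions are assumptions for illustration, not code from any program
discussed here):

import numpy as np

def encode(board):
    # board: 9 characters, 'x', 'o' or '.', row by row.
    mapping = {'x': 1.0, 'o': -1.0, '.': 0.0}
    return np.array([mapping[c] for c in board])

# Two positions one move apart: x completes the top row.
before = encode("xx." "oo." "...")
after  = encode("xxx" "oo." "...")

# Input-space distance is minimal (exactly one cell changed) ...
print(np.linalg.norm(after - before))   # 1.0

# ... yet the values differ sharply: "after" is a won game for x,
# while "before" is not even terminal. Positions that are close in
# this representation need not be close in value.
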
Erik
Måns Ullerstam wrote:
>
> I wonder if there has been research on the type of neural network that
> is most suitable for generalizing from seen go board positions to
> unseen ones. It seems like most people in the field have used a
> standard two-layer back-propagation network as the representation. I
> have tried the same approach on something as simple as tic-tac-toe,
> and I can't say that the system generalizes in a very promising way. I
> wonder if radial basis function networks, or some other architecture,
> would be more appropriate.
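>
> To make that concrete, a radial basis function network here could look
> roughly like the following sketch (the Gaussian units, the shared
> width, and the delta-rule training of the output layer are all
> illustrative assumptions, not code from any existing program):
>
> import numpy as np
>
> class RBFNet:
>     def __init__(self, centers, width):
>         self.centers = np.asarray(centers, dtype=float)  # prototype positions
>         self.width = width                               # shared Gaussian width
>         self.weights = np.zeros(len(self.centers))       # linear output layer
>
>     def hidden(self, x):
>         # One Gaussian activation per prototype position.
>         d2 = ((self.centers - x) ** 2).sum(axis=1)
>         return np.exp(-d2 / (2.0 * self.width ** 2))
>
>     def value(self, x):
>         return self.hidden(x) @ self.weights
>
>     def train(self, x, target, lr=0.1):
>         # Delta rule on the output weights only; the centers stay fixed.
>         self.weights += lr * (target - self.value(x)) * self.hidden(x)
>
> net = RBFNet(centers=np.eye(9), width=1.0)  # 9 arbitrary prototypes
> net.train(np.ones(9), target=0.5)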
>
> It might be the case that this is impossible for go, since the board
> (the pattern fed into the network) looks very similar between two
> consecutive moves, while the best moves in those two positions are not
> similar at all.
>
> / Mans Ullerstam
>
> -----Original Message-----
> From: owner-computer-go@xxxxxxxxxxxxxxxxx
> [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx] On Behalf Of Ran Xiao
> Sent: 17 February 2003 20:30
> To: Computer-Go; Måns Ullerstam
> Subject: computer-go: RE: Learn from Self-Play
>
> -----Original Message-----
> From: Måns Ullerstam [mailto:mans@xxxxxxxxxxxxxxxxx]
> Sent: Monday, February 17, 2003 1:42 AM
> To: 'Ran Xiao'
> Subject: RE: computer-go: Learn from Self-Play
>
> - Is that a neural network or are you storing all board positions and
> moves directly?
>
> All board positions are stored, and an NN is used to learn the
> evaluation function.
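>
> Concretely, something like the following sketch of that division of
> labour (the names and the table-then-NN fallback rule are assumptions
> for illustration, not the actual code):
>
> table = {}  # board (tuple of 9 ints) -> score stored for that position
>
> def evaluate(board, net_value):
>     # net_value: the fitted NN evaluation function.
>     if board in table:
>         return table[board]   # exact position seen before: use the table
>     return net_value(board)   # unseen position: generalize with the NN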
>
> - You are saying that you let them play other computer go programs
> first. What programs and for how many games?
>
> ManyFaces, HandTalk, EZGO, Goliath, and TurboGO. All of them can be
> downloaded for free from the Internet; the games were played on a 9x9
> board.
>
> - You are not specifying the learning mechanism explicitly, but it
> sounds like you have implemented some form of Reinforcement Learning,
> TD(lambda) or something similar. Can you elaborate on that?
>
> Since I save all board positions with their scores, direct TD or RL
> methods are not really needed.
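>
> In other words, every finished game can simply credit its final score
> to all of the positions it passed through (a Monte-Carlo style return),
> so no bootstrapping is needed. A sketch under that reading (names
> assumed):
>
> scores = {}  # board tuple -> final scores of games through that position
>
> def backup_game(game_positions, final_score):
>     for board in game_positions:
>         scores.setdefault(board, []).append(final_score)
>
> def nn_target(board):
>     # Training target for the NN: mean final score over those games.
>     return sum(scores[board]) / len(scores[board])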
>
> - The evaluation function is evaluating what, the score of the board,
> the probability of winning, something else?
>
> It is based on the score of the board.
>
> Weimin Xiao