computer-go: RE: Learn from Self-Play

To: <computer-go@xxxxxxxxxxxxxxxxx>
Subject: computer-go: RE: Learn from Self-Play
From: Måns Ullerstam <mans@xxxxxxxxxxxx>
Date: Mon, 24 Feb 2003 11:56:19 +0100
Importance: Normal
In-reply-to: <HEECKAFDMBJOFMDPPGHMAEKJCAAA.ranxiao@xxxxxxxxxxxxxxxxx>
Reply-to: computer-go@xxxxxxxxxxxxxxxxx
Sender: owner-computer-go@xxxxxxxxxxxxxxxxx

I wonder whether there has been any research into which type of neural
network is most suitable for generalizing from go board positions to
other positions. It seems that most people in the field have used a
standard two-layer back-propagation network as the representation. I
have tried the same approach on something as simple as tic-tac-toe, and I
can't say that the system generalizes in a very promising way. I wonder
whether radial basis function networks, or some other architecture,
would be more appropriate.
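
To make the architectural difference concrete, here is a minimal sketch contrasting the two kinds of hidden unit on a tic-tac-toe encoding. The board encoding, the function names (encode, sigmoid_unit, rbf_unit) and the three example positions are all assumptions made up for illustration; nothing here is a trained network. The point is only that a Gaussian radial basis unit responds locally, around a stored prototype position, while a sigmoid unit responds to a weighted sum over the whole board.

```python
import math

def encode(board):
    """Map a 9-character tic-tac-toe string to +1 (X), -1 (O), 0 (empty)."""
    values = {"X": 1.0, "O": -1.0, ".": 0.0}
    return [values[c] for c in board]

def sigmoid_unit(x, weights, bias):
    """Hidden unit of a back-propagation network: squashed weighted sum."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def rbf_unit(x, center, width):
    """Gaussian radial basis unit: response decays with distance from center."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

# Hypothetical positions, chosen only to show near vs. far inputs.
prototype = encode("X.O.X....")
near = encode("X.O.X...O")   # prototype plus one stone
far = encode("OXOXOXXOX")    # unrelated full board

for name, pos in [("prototype", prototype), ("near", near), ("far", far)]:
    # The sigmoid unit reuses the prototype as its weight vector (an assumption).
    print(name,
          round(rbf_unit(pos, prototype, 1.0), 4),
          round(sigmoid_unit(pos, prototype, 0.0), 4))
```

Whether local units of this kind actually generalize better on board games is exactly the open question; the sketch only shows what "local" means in terms of the inputs.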

It might be that this is impossible to do for go, since the
board (the pattern entering the network) looks very similar from one
move to the next, while the best moves in two consecutive positions are
not similar at all.
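
The similarity between consecutive positions can be counted directly. The sketch below assumes a flat 9x9 encoding with one input per point (+1, -1 or 0) and ignores captures; the helper names (play, hamming) and the stone placements are hypothetical. Under those assumptions a single move changes exactly one of the 81 inputs.

```python
def play(board, index, stone):
    """Return a copy of a flat board with one stone added (captures ignored)."""
    nxt = list(board)
    nxt[index] = stone
    return nxt

def hamming(a, b):
    """Number of points at which two encoded boards differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

SIZE = 9
empty = [0] * (SIZE * SIZE)

# Two stones already on the board, then one more move.
before = play(play(empty, 40, 1), 30, -1)
after = play(before, 50, 1)

changed = hamming(before, after)
print(changed, "of", SIZE * SIZE, "inputs changed")
```

So a network that maps nearby inputs to nearby outputs has to produce very different move choices from input vectors that differ in a single component.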

/ Mans Ullerstam

-----Original Message-----
From: owner-computer-go@xxxxxxxxxxxxxxxxx
[mailto:owner-computer-go@xxxxxxxxxxxxxxxxx] On Behalf Of Ran Xiao
Sent: 17 February 2003 20:30
To: Computer-Go; Måns Ullerstam
Subject: computer-go: RE: Learn from Self-Play


-----Original Message-----
From: Måns Ullerstam [mailto:mans@xxxxxxxxxxxxxxxxx]
Sent: Monday, February 17, 2003 1:42 AM
To: 'Ran Xiao'
Subject: RE: computer-go: Learn from Self-Play

- Is that a neural network or are you storing all board positions and
moves directly?

All board positions are stored, and a NN is used to learn the evaluation
function.

- You are saying that you let them play other computer go programs
first. What programs and for how many games?

ManyFaces, HandTalk, EZGO, Goliath, and TurboGO. All are downloadable
for free from somewhere on the Internet, for the 9x9 board.

- You do not specify the learning mechanism, but it sounds like you
have implemented some form of reinforcement learning, TD(lambda) or
something similar. Can you elaborate on that?

Since I save all board positions with their scores, direct TD or RL is
not really needed.

- What is the evaluation function evaluating: the score of the board,
the probability of winning, or something else?

Based on the score of the board.
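
The idea of learning an evaluation function from stored positions and their scores, without TD, amounts to ordinary supervised regression. The following is only a minimal sketch under loud assumptions: a linear model stands in for the neural network, the three features (black stones, white stones, a constant bias) and the four stored samples are toy values invented for illustration, and the names predict and train are hypothetical. It is not the setup described in the answers above.

```python
import random

def predict(weights, features):
    """Linear evaluation: weighted sum of position features."""
    return sum(w * f for w, f in zip(weights, features))

def train(samples, n_features, epochs=2000, lr=0.05, seed=0):
    """Fit weights to stored (features, score) pairs by SGD on squared error."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, score in data:
            error = predict(weights, features) - score
            for i, f in enumerate(features):
                weights[i] -= lr * error * f
    return weights

# Toy stand-in for a stored position database (assumed values):
# features = (black stones, white stones, bias), target = score margin.
stored = [((3.0, 1.0, 1.0), 2.0), ((1.0, 3.0, 1.0), -2.0),
          ((2.0, 2.0, 1.0), 0.0), ((4.0, 2.0, 1.0), 2.0)]

weights = train(stored, 3)
for features, score in stored:
    print(features, score, round(predict(weights, features), 3))
```

Because every training target is a recorded score, no bootstrapping from the evaluator's own predictions is involved, which is the sense in which TD is not needed.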


Weimin Xiao
