
Re: [computer-go] TD(lambda), Neural Networks and evaluation functions

To: computer-go <computer-go@xxxxxxxxxxxxxxx>
Subject: Re: [computer-go] TD(lambda), Neural Networks and evaluation functions
From: Peter Drake <drake@xxxxxxxxxx>
Date: Tue, 16 Sep 2003 14:57:16 -0700

Yup, we did this over the past summer, with not too much success. I'm now convinced that it is necessary to give the program more structure, e.g., some way of modeling chains/blocks, connections, eyes, and so on. This is too much for a generic feed-forward neural net to derive by itself in any reasonable amount of time.

On Tuesday, September 16, 2003, at 02:47 PM, Imran Ghory wrote:

I've had an interesting idea about implementing a Go program using neural
networks (I'm assuming a 9x9 board, although there's no reason it couldn't be
extended to 19x19). It basically runs along the following lines:

1) Create 81 neural networks (one associated with each intersection on the
board). Let's represent them by N(x, input-board) with x=1...81.
2) Use temporal difference learning to teach the neural networks, with the
rewards being +1/-1 depending on which side controls that intersection at
the end of that game.
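
To make step 2 concrete, here is a minimal sketch of a TD(lambda) update for
the predictor attached to a single intersection. It is an illustration under
stated assumptions, not the method anyone in this thread actually ran: a
linear model stands in for the neural network, the discount factor is fixed
at 1, and the names predict, td_lambda_episode, alpha and lam are
hypothetical.

```python
# Sketch: TD(lambda) for ONE intersection's predictor.
# Assumption: a linear model stands in for the neural network N(x, board).

def predict(weights, features):
    """Predicted final ownership of the point (+1 one side, -1 the other)."""
    return sum(w * f for w, f in zip(weights, features))

def td_lambda_episode(weights, positions, final_owner, alpha=0.1, lam=0.7):
    """Update weights in place from one finished game.

    positions   -- list of feature vectors, one per position in the game
    final_owner -- +1 or -1: who controls this intersection at the end
    """
    trace = [0.0] * len(weights)  # eligibility trace, one entry per weight
    for t, features in enumerate(positions):
        value = predict(weights, features)
        # Target is the next prediction, or the real outcome at game end.
        if t + 1 < len(positions):
            target = predict(weights, positions[t + 1])
        else:
            target = final_owner
        delta = target - value
        # For a linear model the gradient of the prediction is the
        # feature vector itself.
        for i, f in enumerate(features):
            trace[i] = lam * trace[i] + f
            weights[i] += alpha * delta * trace[i]
    return weights
```

With 81 intersections there would be 81 such weight vectors, each trained
against its own +1/-1 outcome from the same game.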

You'd play by having an evaluation function which, for a given board,
returns the sum(N(x, input-board); x=1..81), i.e. it would predict the
final score.
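
The evaluation function described above could be sketched roughly as
follows. The per-point "networks" here are hypothetical stubs that just
report the stone already sitting on their point; a real N(x, input-board)
would be a trained predictor. Points are indexed 0..80 rather than 1..81,
and the board encoding (+1 black, -1 white, 0 empty) is an assumption.

```python
BOARD_SIZE = 9
NUM_POINTS = BOARD_SIZE * BOARD_SIZE  # 81 intersections

def make_stub_network(x):
    """Hypothetical stand-in for a trained N(x, board).

    It simply reports the stone currently on point x
    (+1 black, -1 white, 0 empty).
    """
    def network(board):
        return float(board[x])
    return network

# One predictor per intersection.
networks = [make_stub_network(x) for x in range(NUM_POINTS)]

def evaluate(board, networks):
    """Predicted final score: the sum of the per-point ownership predictions."""
    return sum(net(board) for net in networks)

# Example position: three black stones and one white stone.
board = [0] * NUM_POINTS
board[0] = board[1] = board[2] = 1
board[80] = -1
print(evaluate(board, networks))
```

Because each prediction lies between -1 and +1, the sum falls between -81 and
+81 and can be read as an estimated score margin.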

This approach would have an advantage over other TD approaches, as the
functions that the NN has to approximate would be very smooth. Also, there
would be many linear features (for instance, stones surrounding a
point) that would allow the networks to "bootstrap" themselves à la
TD-Gammon.

Has anyone experimented with this kind of approach before?

Imran Ghory
--
http://bits.bris.ac.uk/imran

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://computer-go.org/mailman/listinfo/computer-go

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/

