
Re: computer-go: Programs learning to play Go



On Tue, 21 Aug 2001, Heikki Levanto wrote:

...
> And one output: Probability of winning from this position.
>
>
> I have a few ideas for training: Either study complete games, and assume
> that the probability starts from 50% and ends in 0% or 100%. Assume it goes
> linearly (for lack of better info). Thus you get a number of positions and
> percentages.
>
> Probably a better way is to note that every reasonable move is played to
> improve the winning probability. Thus we can evaluate the position before
> and after a move, and correct the net if it believes the probability to
...
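
(To make the quoted scheme concrete: a rough sketch of the linear
interpolation, in Python purely for illustration; none of the names
below come from Heikki's post.

    def interpolated_targets(positions, black_won):
        """Label each position of a finished game with a target winning
        probability for Black, interpolated linearly from 50% at the
        first position to the final result (0 or 1) at the last."""
        final = 1.0 if black_won else 0.0
        n = len(positions)
        return [(pos, 0.5 + (final - 0.5) * i / max(n - 1, 1))
                for i, pos in enumerate(positions)]

Each resulting (position, target) pair would then be used as an
ordinary supervised training example for the net.)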

Heikki,

Temporal Difference (TD) learning is a principled way of implementing what
you're after here, without assuming any particular shape (such as linearity
or monotonicity) for the predicted winning probability.
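
As a minimal sketch of what that could look like, here is a TD(0) update
for a single-layer net with a sigmoid output; the feature encoding, step
size, and all names are illustrative assumptions, not a prescription:

    import numpy as np

    ALPHA = 0.01        # learning rate
    N_FEATURES = 361    # e.g. one raw feature per point of a 19x19 board

    w = np.zeros(N_FEATURES)

    def value(x):
        """Predicted probability of winning, squashed into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

    def td0_update(x_t, x_next, final_result=None):
        """One TD(0) step: nudge the prediction for the current position
        toward the prediction for the next position, or toward the actual
        game result (0 or 1) once the game is over."""
        global w
        v_t = value(x_t)
        target = value(x_next) if final_result is None else final_result
        delta = target - v_t                           # temporal-difference error
        w += ALPHA * delta * v_t * (1.0 - v_t) * x_t   # sigmoid-output gradient

Sweeping through a finished game and calling td0_update on each pair of
consecutive positions (passing the result at the last one) gives the
"evaluate before and after a move and correct the net" behaviour you
describe, without assuming anything about how the probability evolves.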

Best,

- Nici.

-- 
    Dr. Nicol N. Schraudolph              http://www.icos.ethz.ch/~schraudo/
    Institute of Computational Sciences            mobile:  +41-76-585-3877
    ETH Zentrum, WET-D, Weinbergstr. 43            office:     -1-632-7942
    CH-8092 Zuerich, Switzerland                      fax:           -1703