
Re: computer-go: Programs learning to play Go



On Tue, 21 Aug 2001, Heikki Levanto wrote:

...
> And one output: Probability of winning from this position.
>
>
> I have a few ideas for training: Either study complete games, and assume
> that the probability starts from 50% and ends in 0% or 100%. Assume it goes
> linearly (for lack of better info). Thus you get a number of positions and
> percentages.
>
> Probably a better way is to note that every reasonable move is played to
> improve the winning probability. Thus we can evaluate the position before
> and after a move, and correct the net if it believes the probability to
...
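
(To make the quoted scheme concrete: a rough sketch of the linear
interpolation, in Python purely for illustration; none of the names
below come from Heikki's post.

    def interpolated_targets(positions, black_won):
        """Label each position of a finished game with a target winning
        probability for Black, interpolated linearly from 50% at the
        first position to the final result (0 or 1) at the last."""
        final = 1.0 if black_won else 0.0
        n = len(positions)
        return [(pos, 0.5 + (final - 0.5) * i / max(n - 1, 1))
                for i, pos in enumerate(positions)]

Each resulting (position, target) pair would then be used as an
ordinary supervised training example for the net.)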

Heikki,

Temporal Difference (TD) learning is a principled way of implementing what
you're after here, without assuming any particular shape (such as linearity
or monotonicity) for the predicted winning probability.
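
As a minimal sketch of what that could look like, here is a TD(0) update
for a single-layer net with a sigmoid output; the feature encoding, step
size, and all names are illustrative assumptions, not a prescription:

    import numpy as np

    ALPHA = 0.01        # learning rate
    N_FEATURES = 361    # e.g. one raw feature per point of a 19x19 board

    w = np.zeros(N_FEATURES)

    def value(x):
        """Predicted probability of winning, squashed into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

    def td0_update(x_t, x_next, final_result=None):
        """One TD(0) step: nudge the prediction for the current position
        toward the prediction for the next position, or toward the actual
        game result (0 or 1) once the game is over."""
        global w
        v_t = value(x_t)
        target = value(x_next) if final_result is None else final_result
        delta = target - v_t                           # temporal-difference error
        w += ALPHA * delta * v_t * (1.0 - v_t) * x_t   # sigmoid-output gradient

Sweeping through a finished game and calling td0_update on each pair of
consecutive positions (passing the result at the last one) gives the
"evaluate before and after a move and correct the net" behaviour you
describe, without assuming anything about how the probability evolves.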

Best,

- Nici.

-- 
    Dr. Nicol N. Schraudolph              http://www.icos.ethz.ch/~schraudo/
    Institute of Computational Sciences            mobile:  +41-76-585-3877
    ETH Zentrum, WET-D, Weinbergstr. 43            office:     -1-632-7942
    CH-8092 Zuerich, Switzerland                      fax:           -1703