Re: computer-go: Programs learning to play Go
On Tue, 21 Aug 2001, Heikki Levanto wrote:
...
> And one output: Probability of winning from this position.
>
>
> I have a few ideas for training: Either study complete games, and assume
> that the probability starts at 50% and ends at 0% or 100%. Assume it goes
> linearly (for lack of better info). Thus you get a number of positions and
> percentages.
>
> Probably a better way is to note that every reasonable move is played to
> improve the winning probability. Thus we can evaluate the position before
> and after a move, and correct the net if it believes the probability to
...
Heikki,
Temporal Difference (TD) learning is a principled way of implementing what
you're after here, without assuming any particular shape (such as linear or
monotonic) for the predicted winning probability.
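
Just to make the comparison concrete (this is my own sketch, not anything
from the thread): your first scheme, in rough Python, would hand the net
targets interpolated from 50% at move one to the final result. How a
position is encoded is left abstract here.

def linear_targets(positions, black_won):
    """Return (position, target) pairs for one complete game.
    black_won is True/False; targets are P(Black wins)."""
    n = len(positions)
    final = 1.0 if black_won else 0.0
    pairs = []
    for t, pos in enumerate(positions):
        frac = t / float(n - 1) if n > 1 else 1.0
        target = 0.5 + frac * (final - 0.5)   # 50% -> 0% or 100%, linearly
        pairs.append((pos, target))
    return pairs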
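TD(0) keeps the same training loop but replaces the interpolated target with
the net's own prediction one move later; only the final position is trained
toward the actual result. The sketch below uses a plain logistic model over
a feature vector as a stand-in for the net, so it runs with no libraries;
the feature encoding is again an assumption of mine.

import math

class TDPredictor:
    def __init__(self, n_features, alpha=0.01):
        self.w = [0.0] * n_features     # weights of the stand-in model
        self.alpha = alpha              # learning rate

    def predict(self, x):
        """Estimated probability of winning from the position encoded by x."""
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, target):
        """Nudge the prediction for x toward the target."""
        p = self.predict(x)
        g = self.alpha * (target - p) * p * (1.0 - p)
        self.w = [wi + g * xi for wi, xi in zip(self.w, x)]

def train_on_game(predictor, positions, outcome):
    """positions: feature vectors for the successive positions of one game;
    outcome: 1.0 if the side we predict for won, else 0.0."""
    for t in range(len(positions) - 1):
        # target = the prediction one move later, not an assumed curve
        predictor.update(positions[t], predictor.predict(positions[t + 1]))
    # only the terminal position is trained toward the actual game result
    predictor.update(positions[-1], outcome)

With a real net you would keep exactly the same targets and simply
backpropagate the (target - prediction) error; TD(lambda) additionally
blends in predictions from further ahead along with the final result.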
Best,
- Nici.
--
Dr. Nicol N. Schraudolph http://www.icos.ethz.ch/~schraudo/
Institute of Computational Sciences mobile: +41-76-585-3877
ETH Zentrum, WET-D, Weinbergstr. 43 office: -1-632-7942
CH-8092 Zuerich, Switzerland fax: -1703