
Re: computer-go: Learning from existing games



On Thu, 16 Jan 2003, Erik van der Werf wrote:

> Do you train on-line or in batch style (e.g., several games before a
> weight update)?
> 
> I think algorithms for faster weight updates only work well with stable
> gradient information. On-line learning may not provide good enough
> gradient information.

One of the reasons reinforcement learning is slow is that it is
*inherently* online: what the network learns influences its future
actions, which in turn influence the data (board positions) it will
see and learn from next.  Because of this feedback loop, batching
tends to make this kind of setup *less* stable - the opposite of what
happens when you're learning from a fixed set of data.
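
To make the feedback loop concrete, here's a toy sketch in Python
(nothing Go-specific - the random "positions", the reward, and all
constants are invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n_features = 16
    w = np.zeros(n_features)            # linear "network": V(x) = w . x

    def value(x):
        return w @ x

    alpha, gamma = 0.01, 0.9
    x = rng.normal(size=n_features)     # current "position"

    for step in range(1000):
        # The *current* weights pick which position we move to ...
        candidates = [rng.normal(size=n_features) for _ in range(5)]
        x_next = max(candidates, key=value)
        reward = float(x_next.sum() > 0)          # toy reward signal

        # ... and we learn from that transition immediately (TD(0)).
        td_error = reward + gamma * value(x_next) - value(x)
        w += alpha * td_error * x

        # The data seen next depends on the weights we just changed.
        x = x_next

Note how the weights updated in one step determine which position is
visited in the next; a batch of transitions collected under stale
weights no longer reflects what the current network would actually do.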

Be warned that almost none of the efficient neural network learning
algorithms work properly online (this includes Levenberg-Marquardt,
conjugate gradient, delta-bar-delta, RPROP, SuperSAB, OSS, ...), so
you're either stuck with simple gradient descent (backprop), which is
excruciatingly slow, or - if you're adventurous - you can try using
(ta-dah!) my SMD algorithm, described towards the end of this paper:

    http://n.schraudolph.org/pubs/mvp.pdf

Advertisement: SMD is the only gradient method I know of that converges
efficiently, scales well to very large nonlinear systems, and works
online.  It's not trivial to implement but may well be worth the effort
if you're stuck with slow training times.
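
If you want the flavour of it before diving into the paper, here is a
deliberately simplified sketch in Python on a toy least-squares
problem.  It only illustrates the gain-adaptation idea: the constants
are placeholders, and the finite-difference Hessian-vector product
stands in for the exact fast curvature matrix-vector products the
paper actually uses.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 10))            # toy data: 200 samples, 10 features
    b = A @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

    w = np.zeros(10)                  # weights
    p = np.full(10, 0.005)            # per-weight gains (learning rates)
    v = np.zeros(10)                  # trace of d(w)/d(log p)
    mu, lam, eps = 0.02, 0.99, 1e-6   # meta-rate, decay, FD step (placeholders)

    for t in range(2000):
        i = rng.integers(len(b))      # one example at a time: online learning
        a, target = A[i], b[i]

        def grad(weights):
            # gradient of the per-sample loss 0.5 * (a . weights - target)^2
            return a * (a @ weights - target)

        g = grad(w)
        # Hessian-vector product H v by finite differences; the paper
        # shows how to get this exactly and cheaply instead.
        Hv = (grad(w + eps * v) - g) / eps

        p *= np.maximum(0.5, 1.0 - mu * g * v)  # adapt per-weight gains
        w -= p * g                              # gradient step with those gains
        v = lam * v - p * (g + lam * Hv)        # update gain-sensitivity trace

    print("final training loss:", 0.5 * np.sum((A @ w - b) ** 2))

The per-weight gains grow where successive gradients keep agreeing and
shrink where they oscillate, which is what lets the method keep pace
with an online, nonstationary stream of training positions.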

Regards,

-- 
    Dr. Nicol N. Schraudolph                     http://n.schraudolph.org/
    Inst. of Computational Science               mobile:  +41-76-585-3877
    ETH Zentrum, HRS H30                            tel:      -1-251-3661
    CH-8092 Zuerich, Switzerland                    fax:        -632-1374