Re: computer-go: Learning from existing games
On Thu, 16 Jan 2003, Erik van der Werf wrote:
> Do you train on-line or in batch style (e.g., several games before a
> weight update)?
>
> I think algorithms for faster weight update only work well with stable
> gradient information. On-line learning may not provide good enough
> gradient information.
One of the reasons why reinforcement learning is slow is that it is
*inherently* online in that what the network learns influences its
future actions, which in turn influence the data (board positions) it
is going to see and learn from. Due to this feedback loop, batching
tends to make this kind of setup *less* stable - the opposite of what
happens when you're learning some fixed set of data.
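To make the feedback loop concrete, here's a toy illustration (nothing
to do with Go or any particular program - all names below are made up):
a tiny corridor task where an epsilon-greedy policy picks its moves
using the very value table it is learning, and the TD(0) updates are
applied online, so each update immediately changes which positions get
visited next:

import random

class Corridor:
    """Tiny corridor: states 0..n-1, actions -1/+1, reward 1 off the right end."""
    def __init__(self, n=5):
        self.n = n
    def start(self):
        return self.n // 2
    def step(self, state, action):
        nxt = state + action
        if nxt < 0:
            return None, 0.0            # fell off the left end: terminal, no reward
        if nxt >= self.n:
            return None, 1.0            # reached the right end: terminal, reward 1
        return nxt, 0.0

def lookahead(env, v, s, a, gamma=1.0):
    s2, r = env.step(s, a)
    return r + (gamma * v[s2] if s2 is not None else 0.0)

def choose_move(env, v, s, eps=0.1):
    if random.random() < eps:
        return random.choice((-1, +1))
    # Greedy w.r.t. the *current* value estimates, so the positions we
    # visit next (our future training data) depend on what we have
    # learned so far.
    return +1 if lookahead(env, v, s, +1) >= lookahead(env, v, s, -1) else -1

def td0_online(episodes=2000, alpha=0.1, gamma=1.0):
    env = Corridor()
    v = [0.0] * env.n                    # tabular value function
    for _ in range(episodes):
        s = env.start()
        while s is not None:
            a = choose_move(env, v, s)
            s2, r = env.step(s, a)
            target = r + (gamma * v[s2] if s2 is not None else 0.0)
            v[s] += alpha * (target - v[s])   # online: apply the update now,
            s = s2                            # so the very next move sees it
    return v

print(td0_online())   # values should rise toward 1 from left to right

Batching these updates means that later moves in the batch are still
chosen with stale values, so the data distribution and the gradient
estimate drift apart.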
Be warned that almost none of the efficient neural network learning
algorithms work properly online (this includes Levenberg-Marquardt,
conjugate gradient, delta-bar-delta, RPROP, SuperSAB, OSS, ...), so
you're either stuck with simple gradient descent (backprop), which is
excruciatingly slow, or - if you're adventurous - you can try using
(ta-dah!) my SMD algorithm, described towards the end of this paper:
http://n.schraudolph.org/pubs/mvp.pdf
Advertisement: SMD is the only gradient method I know of that converges
efficiently, scales well to very large nonlinear systems, and works
online. It's not trivial to implement but may well be worth the effort
if you're stuck with slow training times.
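To give a flavor of what's involved: SMD keeps a per-parameter gain
vector plus an exponential trace of recent parameter change, and adapts
the gains from the correlation of that trace with the incoming
gradient, using a Hessian-vector product along the way. A stripped-down
sketch in minimization form (the finite-difference Hessian-vector
product, the constants, and the toy quadratic are stand-ins for
illustration only - see the paper for the exact formulation and the
fast curvature matrix-vector products used there):

import numpy as np

def smd(grad, w, steps=500, p0=0.05, mu=0.05, lam=0.99, eps=1e-6):
    p = np.full_like(w, p0)        # per-parameter gains (local learning rates)
    v = np.zeros_like(w)           # trace of recent parameter change
    for _ in range(steps):
        g = grad(w)
        # Hessian-vector product H v, here by finite differences of the
        # gradient (a simplification of the exact products in the paper).
        Hv = (grad(w + eps * v) - g) / eps
        # Gain adaptation: grow gains where the gradient correlates with
        # the direction of recent progress, shrink (at most halve) otherwise.
        p *= np.maximum(0.5, 1.0 - mu * g * v)
        w = w - p * g              # ordinary gradient step with local gains
        v = lam * v - p * (g + lam * Hv)
    return w

# Toy usage: a badly scaled quadratic, where fixed-rate SGD crawls.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w
print(smd(grad, np.array([1.0, 1.0])))   # should approach [0, 0]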
Regards,
--
Dr. Nicol N. Schraudolph http://n.schraudolph.org/
Inst. of Computational Science mobile: +41-76-585-3877
ETH Zentrum, HRS H30 tel: -1-251-3661
CH-8092 Zuerich, Switzerland fax: -632-1374