[computer-go] Re: What is Thought?
>> There are experiments which consistently show that a simple
>> algorithm such as a neural net or decision tree can perfectly learn
>> a concept on the training data, and at the same time aggregated
>> versions of the same algorithm will always do better on the test
>> set.
>> There are proofs that "explain" these results by claiming that a
>> representation exists for the neural network and/or decision tree
>> which represents the concept learned by the aggregated version, but
>> that representation is computationally infeasible to find.
>> In any case the actual, more complicated explanation fits the data
>> as well as the actual simple version, but the more complicated one
>> generalizes better and is therefore preferred. If Occam's Razor
>> cannot explain this result without a lot of extra explanation, then
>> we need a new theory.
Without citations I'm not 100% sure what you are referring to, but
here is a suggestion. It is not obvious how complex an explanation
is--which is one of the reasons it is useful to study the problem
formally. You may say Newton's laws are simple and explain a vast
array of phenomena, but how do you quantify that? They appeal
to other concepts in the mind.
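(To make "quantify" slightly concrete: one crude formal proxy for the
complexity of an explanation is its description length. The toy Python
below is my own illustration, not something from the book; zlib
compression merely stands in for a real coding scheme, and it obviously
cannot capture the "appeals to other concepts in the mind" problem.)

    import zlib

    # Crude proxy for description length: the number of bytes needed to
    # encode a hypothesis after compression.  Kolmogorov complexity itself
    # is uncomputable; zlib is only a stand-in to make the idea concrete.
    def description_length(hypothesis_text):
        return len(zlib.compress(hypothesis_text.encode("utf-8")))

    law = "F = G * m1 * m2 / r**2"                    # short symbolic law
    table = ",".join("%d:%d" % (i, i * i) for i in range(500))  # lookup table
    print(description_length(law), description_length(table))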
Likewise, if you set out to train a million-weight neural net,
but use a training algorithm that will in practice explore only a
tiny fraction of the weight space, you may effectively have little more
flexibility in the hypotheses you can produce than a linear fit,
so your explanation, while at first appearing complex, will
be very simple and generalize well. Chapter 6 of What is Thought?
discusses this phenomenon, referring to research that indicates
this is why large, dumb neural nets often generalize remarkably
well. A 1000-weight neural net trained by backprop
may have vastly less effective expressive power
(and thus more generalization ability on small data sets)
than a 10-coefficient polynomial fit exactly using linear
programming on the coefficients.
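Here is a rough numpy sketch of that comparison (my own toy setup and
sizes, not the experiments the book cites): a roughly 900-weight tanh
net trained by a short run of full-batch gradient descent, versus a
10-coefficient polynomial that fits the same 10 noisy points exactly
(via least squares here rather than linear programming, which does not
change the point).

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(-3, 3, 10)              # 10 noisy samples of sin(x)
    y_train = np.sin(x_train) + 0.1 * rng.standard_normal(10)
    x_test = np.linspace(-3, 3, 200)
    y_test = np.sin(x_test)

    # Degree-9 polynomial: 10 coefficients, fits the training points exactly.
    coeffs = np.polyfit(x_train, y_train, 9)
    poly_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

    # ~900-weight net (1 input -> 300 tanh units -> 1 output), trained by a
    # short run of full-batch gradient descent, so only a small region of
    # weight space is ever explored.
    h = 300
    W1 = 0.5 * rng.standard_normal((h, 1))
    b1 = np.zeros(h)
    W2 = 0.5 * rng.standard_normal((1, h)) / np.sqrt(h)
    b2 = np.zeros(1)
    X = x_train[:, None]
    lr = 0.02
    for _ in range(5000):
        a = np.tanh(X @ W1.T + b1)                # hidden activations
        err = (a @ W2.T + b2)[:, 0] - y_train     # prediction error
        g = 2 * err[:, None] / len(err)           # d(MSE)/d(prediction)
        gz = (g @ W2) * (1 - a ** 2)              # back through tanh
        W2 -= lr * (g.T @ a)
        b2 -= lr * g.sum(0)
        W1 -= lr * (gz.T @ X)
        b1 -= lr * gz.sum(0)

    a_test = np.tanh(x_test[:, None] @ W1.T + b1)
    net_err = np.mean(((a_test @ W2.T + b2)[:, 0] - y_test) ** 2)
    print("degree-9 polynomial test MSE:", poly_err)
    print("~900-weight net test MSE:    ", net_err)

In runs like this the exactly fitted polynomial tends to oscillate
between and beyond the sample points, while the big net, having moved
only a short distance from its smooth initialization, tends to stay
close to a smooth function; that is the sense in which its effective
expressive power is smaller.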
Likewise, the vote of many slightly randomized predictors may have
much less complexity than even a single such predictor, because the
averaging washes out much of the predictors' individual expressive
power.
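The same thing can be seen in the kind of aggregation experiment quoted
at the top. A minimal scikit-learn sketch, assuming nothing about which
learners or data those experiments actually used: a single fully grown
tree memorizes the training set, while a bag of bootstrapped trees
votes away much of that flexibility and usually scores better on the
held-out half.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # A synthetic binary classification problem; nothing Go-specific here.
    X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

    single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    # 100 trees, each grown on a bootstrap resample of the training data;
    # the ensemble predicts by majority vote.
    bagged = BaggingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

    print("single tree  train/test accuracy:",
          single.score(X_tr, y_tr), single.score(X_te, y_te))
    print("bagged trees train/test accuracy:",
          bagged.score(X_tr, y_tr), bagged.score(X_te, y_te))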
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://computer-go.org/mailman/listinfo/computer-go