
[computer-go] Don's Law (was: SlugGo vs Many Faces)

To: computer-go <computer-go@xxxxxxxxxxxxxxx>
Subject: [computer-go] Don's Law (was: SlugGo vs Many Faces)
From: "Nicol N. Schraudolph" <compgo@xxxxxxxxxxxxxxx>
Date: Thu, 9 Sep 2004 15:16:55 +0200 (CEST)

Let's call it Don's Law: if it looks too good to be true, it probably is.

> Don, I think your observations about new ideas giving odd results at
> first must be due to some psychological factor.  Perhaps you tend to
> remember the surprising results, or perhaps you test a lot of ideas that
> don't lead to improved performance, but quickly dismiss those that don't
> get off to a good start in the tests.

I suspect that what Don is observing is an effect of overfitting: if you
get a suspiciously strong improvement within the testbed in which you're
developing your program, chances are that the change is really exploiting
some weakness particular to your testbed, and the "improvement" will
disappear, or even turn out to be a liability, upon independent validation.
This is a well-known phenomenon in machine learning, and it can be quite
devious when there's a human developer in the loop: it's amazing how
subtle flaws in a testbed can be picked up and exploited unwittingly.

If you want to be statistically rigorous, once you start making
development decisions based on validation results, the validation test
must be considered contaminated and becomes part of your development
testbed.  Thus over the course of development you may have to repeatedly
come up with fresh independent validation tests, which becomes harder
and harder as you exhaust all possibilities.  In the end you're left
with "if it beats all other go programs it can't be bad".
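
To make the contamination argument concrete, here is a minimal simulation
sketch.  It is not anyone's actual test setup: the function names, the
number of candidate changes, and the match length are all hypothetical, and
it assumes the simplest possible case, in which every candidate change is
truly neutral (win probability 0.5) and games are independent.  The
developer keeps whichever change scored best on the validation match, so
that score has already been used for a decision; a fresh match then shows
what the kept change is really worth.

```python
import random


def play_match(true_win_rate, games, rng):
    """Return the observed win fraction over `games` independent games."""
    wins = sum(rng.random() < true_win_rate for _ in range(games))
    return wins / games


def best_validation_score(candidates, games, rng):
    """Score `candidates` neutral changes and return the best observed score.

    Assumption: every change has a true win rate of exactly 0.5, so any
    difference between the scores is pure noise.
    """
    return max(play_match(0.5, games, rng) for _ in range(candidates))


def simulate(trials=1000, candidates=20, games=100, seed=0):
    """Average the selected (contaminated) score and a fresh re-test score."""
    rng = random.Random(seed)
    selected_total = 0.0
    fresh_total = 0.0
    for _ in range(trials):
        # Score that drove the "keep this change" decision.
        selected_total += best_validation_score(candidates, games, rng)
        # The kept change is still neutral, so a fresh match is a fair coin.
        fresh_total += play_match(0.5, games, rng)
    return selected_total / trials, fresh_total / trials


selected_mean, fresh_mean = simulate()
print(f"score used for selection: {selected_mean:.3f}")
print(f"fresh validation score:   {fresh_mean:.3f}")
```

Under these assumptions the score used for selection comes out noticeably
above 0.5 even though no change helps, while the fresh score stays close
to 0.5.  This models only the pure-noise part of the effect; a change that
exploits a real flaw in the testbed would look even better before
independent validation.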

Regards,

- nic

-- 
    Dr. Nicol N. Schraudolph                 http://n.schraudolph.org/
    Sonnenkopfweg 17
    D-87527 Sonthofen, Germany



References:
- Re: [computer-go] SlugGo vs Many Faces, newest data
  - From: Don Dailey
