
Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)



> 	http://remi.coulom.free.fr/WhoIsBest.zip
>
> contains a table claiming that with 3 losses and 10 wins the confidence
> level is 97% that the winner of 10 is "better." This result is based 
> upon a formula from Bayes. I am not enough of a statistician to know
> whether that is right.

While the 97% confidence level from the above paper is technically correct,
it is very easy to overinterpret such numbers.  A Bayesian analysis that
leads to more conservative results, and ones that (to me, unlike confidence
levels) intuitively feel "right", is one in terms of likelihood ratios, as
demonstrated in the one-page paper

    http://www.cs.toronto.edu/~mackay/euro.pdf

The author (a well-known Bayesian statistician) analyses coin tosses that
appear "suspicious" at a 93% confidence level in terms of likelihood
ratios.  Replace "coin" with "go", "toss" with "game", "heads" with
"win", and "tails" with "loss", and you can directly apply his analysis.
What I get for the numbers above (with a uniform prior) is a likelihood
ratio of just 2:1 in favor of the hypothesis "one program is better"
("the coin is biased") over "both programs are equally strong" ("the
coin is fair").

This should be considered *very weak* evidence in favor of one program
being better.  For a convincing result one would like to see a likelihood
ratio at least an order of magnitude higher.  So don't put too much
confidence in confidence levels!
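
For a sense of scale (again my own arithmetic with the sketch above,
not a figure from either paper): at roughly the same win rate, a
hypothetical 20-6 record gives a ratio of about 11:1 and 22-6 about
25:1, so it takes roughly twice as many games before the evidence
climbs into that territory.

    print(likelihood_ratio(20, 6))   # ~10.8
    print(likelihood_ratio(22, 6))   # ~24.6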

Regards,

- nic

-- 
    Dr. Nicol N. Schraudolph                 http://n.schraudolph.org/
    Sonnenkopfweg 17
    D-87527 Sonthofen, Germany
