Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)
> http://remi.coulom.free.fr/WhoIsBest.zip
>
> contains a table claiming that with 3 losses and 10 wins the confidence
> level is 97% that the winner of 10 is "better." This result is based
> upon a formula from Bayes. I am not enough of a statistician to know.
While the 97% confidence level from the above paper is technically correct,
it is very easy to overinterpret such numbers. A Bayesian analysis in terms
of likelihood ratios leads to more conservative results that (to me, unlike
confidence levels) intuitively feel "right"; see the one-page paper
http://www.cs.toronto.edu/~mackay/euro.pdf
The author (a well-known Bayesian statistician) analyses coin tosses that
appear "suspicious" at a 93% confidence level in terms of likelihood
ratios. Replace "coin" with "go", "toss" with "game", "heads" with
"win", and "tails" with "loss", and you can directly apply his analysis.
What I get for the numbers above (with a uniform prior) is a likelihood
ratio of just 2:1 in favor of the hypothesis "one program is better"
("the coin is biased") over "both programs are equally strong" ("the
coin is fair").
This should be considered *very weak* evidence in favor of one program
being better. For a convincing result one would like to see a likelihood
ratio at least an order of magnitude higher. So don't put too much
confidence in confidence levels!
Regards,
- nic
--
Dr. Nicol N. Schraudolph http://n.schraudolph.org/
Sonnenkopfweg 17
D-87527 Sonthofen, Germany