
Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)



> 	http://remi.coulom.free.fr/WhoIsBest.zip
>
> contains a table claiming that with 3 losses and 10 wins the confidence
> level is 97% that the winner of 10 is "better." This result is based 
> upon a formula from Bayes. I am not enough of a statistician to know
> whether that is right.

While the 97% confidence level from the above paper is technically correct,
it is very easy to overinterpret such numbers.  A Bayesian analysis that
leads to more conservative results, and ones that (to me, unlike confidence
levels) intuitively feel "right", is one in terms of likelihood ratios, as
demonstrated in the one-page paper

    http://www.cs.toronto.edu/~mackay/euro.pdf

The author (a well-known Bayesian statistician) analyses coin tosses that
appear "suspicious" at a 93% confidence level in terms of likelihood
ratios.  Replace "coin" with "go", "toss" with "game", "heads" with
"win", and "tails" with "loss", and you can directly apply his analysis.
What I get for the numbers above (with a uniform prior) is a likelihood
ratio of just 2:1 in favor of the hypothesis "one program is better"
("the coin is biased") over "both programs are equally strong" ("the
coin is fair").

This should be considered *very weak* evidence in favor of one program
being better.  For a convincing result one would like to see a likelihood
ratio at least an order of magnitude higher.  So don't put too much
confidence in confidence levels!
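
For a sense of scale (again my own arithmetic with the sketch above,
not a figure from either paper): at roughly the same win rate, a
hypothetical 20-6 record gives a ratio of about 11:1 and 22-6 about
25:1, so it takes roughly twice as many games before the evidence
climbs into that territory.

    print(likelihood_ratio(20, 6))   # ~10.8
    print(likelihood_ratio(22, 6))   # ~24.6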

Regards,

- nic

-- 
    Dr. Nicol N. Schraudolph                 http://n.schraudolph.org/
    Sonnenkopfweg 17
    D-87527 Sonthofen, Germany
