Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)
> contains a table claiming that with 3 losses and 10 wins the confidence
> level is 97% that the winner of 10 is "better." This result is based
> upon a formula from Bayes. I am not enough of a statistician to know.
I have an interesting slant on this based on a practical observation.
If you try a new idea or make an experimental change to an existing
program and it tests extremely well after 10 or 20 games, you should
trust the results even less than the purely statistical considerations
would tell you. There is a great deal of "empirical" evidence that
any given NEW idea will, if anything, test slightly negative. Even
though it's difficult to scientifically factor this observation into
the statistics, it's a very real phenomenon. So if I do something to
my program and it tests 9-1, for instance, I just laugh.
This would not be a consideration when comparing two arbitrary programs
about which nothing is known in advance.
If I started from scratch and wrote a new program that used entirely
different ideas and techniques, I would get more excited about a 9-1
result, still keeping in mind that even 9-1 out of 10 samples isn't
yet anything worth losing sleep over.
Of course I'm not being critical of the results reported so far,
especially since they seem to be reported objectively with no
extravagant claims being made yet. I think the results as they stand
give a lot of reason for optimism. One thing you can say for sure is
that it is much more likely this is an improvement than it is not.
There is something I often do when deciding whether to keep a version
I suspect is stronger. I use a quick and dirty rule, which is to
keep testing until one version is ahead by N games, say 20 or 30.
Let's say 20 for example. If a version is significantly stronger it
will get this lead quickly and you can have some confidence that it is
probably not weaker! If the version is only slightly better, it's
unlikely to get behind by 20 games and will eventually get ahead. If
it takes a lot of games to get ahead 20 games, you can be sure that at
worst it can't be significantly weaker and more than likely is at least
slightly stronger. You can do this more scientifically with
statistics, but this works pretty well in practice.
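
The "keep testing until one version is ahead by N games" rule can be
sketched as a simple random walk. This is only an illustration under stated
assumptions: every game is independent, there are no draws, and the new
version wins each game with a fixed probability p_new. The function names
and parameters are hypothetical and do not come from any real test harness.
The second function is the standard gambler's-ruin closed form for the same
walk.

```python
import random

def play_until_lead(p_new, lead=20, max_games=100_000, rng=None):
    """Simulate games until one version leads by `lead`.

    Returns (winner, games_played). Assumes independent games, no draws,
    and a fixed win probability `p_new` for the new version.
    """
    rng = rng or random.Random()
    diff = 0  # wins of the new version minus wins of the old version
    for games in range(1, max_games + 1):
        diff += 1 if rng.random() < p_new else -1
        if abs(diff) >= lead:
            return ("new" if diff > 0 else "old"), games
    return "undecided", max_games

def prob_new_reaches_lead(p_new, lead=20):
    """Chance the new version gets `lead` games ahead before falling
    `lead` games behind (gambler's-ruin formula for a symmetric start)."""
    if p_new == 0.5:
        return 0.5
    r = (1 - p_new) / p_new
    return 1 / (1 + r ** lead)

if __name__ == "__main__":
    for p in (0.45, 0.50, 0.52, 0.55, 0.60):
        winner, games = play_until_lead(p, lead=20, rng=random.Random(1))
        print(p, round(prob_new_reaches_lead(p), 4), winner, games)
```

Under these assumptions, a version that wins 55% of its games is declared
the winner by a 20-game lead more than 98% of the time, while a version
that is exactly equal is kept half the time, so the rule is only as
trustworthy as the lead N is large.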
- Don
Date: Tue, 7 Sep 2004 14:09:28 +0200 (CEST)
From: "Nicol N. Schraudolph" <compgo@xxxxxxxxxxxxxxxxx>
> http://remi.coulom.free.fr/WhoIsBest.zip
>
> contains a table claiming that with 3 losses and 10 wins the confidence
> level is 97% that the winner of 10 is "better." This result is based
> upon a formula from Bayes. I am not enough of a statistician to know.
While the 97% confidence level from the above paper is technically correct,
it is very easy to overinterpret such numbers. A Bayesian analysis in terms
of likelihood ratios gives more conservative results that (to me, unlike
confidence levels) intuitively feel "right", as can be seen in the one-page
paper
http://www.cs.toronto.edu/~mackay/euro.pdf
The author (a well-known Bayesian statistician) uses likelihood ratios to
analyse coin tosses that appear "suspicious" at a 93% confidence level.
Replace "coin" with "go", "toss" with "game", "heads" with "win", and
"tails" with "loss", and you can directly apply his analysis.
What I get for the numbers above (with a uniform prior) is a likelihood
ratio of just 2:1 in favor of the hypothesis "one program is better"
("the coin is biased") over "both programs are equally strong" ("the
coin is fair").
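
The 2:1 figure can be checked with a few lines of exact arithmetic. This is
a minimal sketch, assuming a uniform prior on the unknown win rate p for
the "one program is better" hypothesis and p = 1/2 for "equally strong";
the function names are made up for this illustration. The second function
assumes that the 97% in the table is the posterior probability that p
exceeds 1/2 under the same uniform prior, which is an assumption about the
cited table, not something stated in this thread.

```python
from fractions import Fraction
from math import comb

def likelihood_ratio(wins, losses):
    """Evidence for 'win rate unknown, uniform prior' over 'win rate is 1/2'."""
    n = wins + losses
    # Integral of p^wins * (1-p)^losses over [0, 1] equals
    # 1 / ((n + 1) * C(n, wins)).
    biased = Fraction(1, (n + 1) * comb(n, wins))
    fair = Fraction(1, 2 ** n)
    return biased / fair

def prob_winner_better(wins, losses):
    """Posterior P(win rate > 1/2) under a uniform prior,
    i.e. for a Beta(wins + 1, losses + 1) posterior."""
    n = wins + losses
    # Identity: P(p <= 1/2) = P(Binomial(n + 1, 1/2) >= wins + 1)
    tail = sum(comb(n + 1, k) for k in range(wins + 1, n + 2))
    return 1 - Fraction(tail, 2 ** (n + 1))

if __name__ == "__main__":
    print(float(likelihood_ratio(10, 3)))    # about 2.05
    print(float(prob_winner_better(10, 3)))  # about 0.971
```

Both numbers describe the same 10-3 record: the posterior probability
answers "which side of 1/2 is p on?", while the likelihood ratio compares
"biased" against "exactly fair", which is why the second looks so much
weaker.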
This should be considered *very weak* evidence in favor of one program
being better. For a convincing result one would like to see a likelihood
ratio at least an order of magnitude higher. So don't put too much
confidence in confidence levels!
Regards,
- nic
--
Dr. Nicol N. Schraudolph http://n.schraudolph.org/
Sonnenkopfweg 17
D-87527 Sonthofen, Germany
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/