[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)

To: computer-go@xxxxxxxxxxxxxxx
Subject: RE: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)
From: "Eric Boesch" <ericboesch@xxxxxxxxxxx>
Date: Tue, 07 Sep 2004 13:42:33 -0400
Delivered-to: computer-go@xxxxxxxxxxxxxxxxx
List-archive: <http://computer-go.org/pipermail/computer-go>
List-help: <mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=help>
List-id: computer-go <computer-go.computer-go.org>
List-post: <mailto:computer-go@xxxxxxxxxxxxxxxxx>
List-subscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=subscribe>
List-unsubscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=unsubscribe>
Reply-to: computer-go <computer-go@xxxxxxxxxxxxxxx>
Sender: computer-go-bounces@xxxxxxxxxxxxxxx

A nitpick: Remy's numbers in the WhoIsBest.pdf table seem to be rounded off in what I would consider the wrong direction. To obtain greater than 0.97 confidence with 0, 1, 3, or 40 losses, one needs 6, 8, 11, or 60 wins, and not 5, 7, 10, or 59 as listed. Here, I define confidence (as I imagine Remy does) as 1 minus (the probability that the given number of games minus draws would yield no more than the given number of losses if the two players had equal chances of winning).

For anyone interested, I have appended the program I used, which does a calculation, not an estimate, of binomial distribution ranges -- not that anyone here is likely to care about the seventh decimal place anyway! And Remy's version is surely faster for large numbers of games. (I apologize for including code instead of a URL, but I'm likely to switch ISPs soon.)

Off the topic, I consider myself a Perl advocate, but in my opinion, Perl is a bad fit for CPU-intensive manipulation of data structures, and therefore Go programming, unless all you need is a very slow prototype. When I ported my pattern-matching algorithm from Perl to C++, it ran 100x faster (though the port also took a few times longer to write than the original program did) -- that difference is exceptional, but it suggests to me that it's not the kind of work that Perl is well suited to.

From: David G Doshay <ddoshay@xxxxxxxxxxxxxxxxx>

Reply-To: computer-go <computer-go@xxxxxxxxxxxxxxxxx>
To: Douglas Ridgway <ridgway@xxxxxxxxxxxxxxxxx>,computer-go <computer-go@xxxxxxxxxxxxxxxxx>
Subject: [computer-go] Statistical Significance (was: SlugGo v.s. Many Faces, newest data)
Date: Mon, 6 Sep 2004 21:39:08 -0700

On Sep 6, 2004, at 8:38 PM, Douglas Ridgway wrote:
On Fri, 3 Sep 2004, David G Doshay wrote:
I found it interesting that Many Faces won 10 of the first 13, and then
SlugGo won 11 of the next 14. This was what I meant in an earlier
email when I wrote about how easy it is to be fooled by small samples.
I wrote something for gnugo-devel on March 2, 2004 on essentially this.
Fortunately, neither 10/13 nor 11/14 is quite long enough to meet a 5%
test, so even if you got only one of these streaks then stopped, standard
statistics avoids being misled. My brain wants to draw an inference long
earlier, however.
The paper "Statistical Significance of a Match" by Rémi Coulom, posted
here on 4 Sept and available at
	http://remi.coulom.free.fr/WhoIsBest.zip
contains a table claiming that with 3 losses and 10 wins the confidence
level is 97% that the winner of 10 is "better." This result is based upon
a formula from Bayes. I am not enough of a statistician to know.

So many statistics, so little ... (must be something good to put here)
It looks to me like we would need at least 200 games to know what
is going on at 4 stones. All of these games involved huge groups and
large strings of cut off stones.
Even 200 might not be enough. I estimated a sample size of 260 or so to
show a difference for a player who wins 60% (see the Mar 2 email), which
is at the outside of the likely differential at 4 stones. For 3 or 5
stones, it'd be okay.
His table also claims 97% Cl at 40 losses and 59 wins, essentially the 60%
you mention.

I must say that the swings I see lead me to think that the Bayesian formula
is too optimistic for this Go playing application, and that Doug's Cl's at least
feel about right.

Rémi, do you have any further thoughts?

Thanks,
David

/* binomial_range.c: compute binomial ranges. Eric Boesch */

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(int argc, char **argv);
double binomial_range(int flips, int heads_min, int heads_max, double prob_heads);
double binomial_range_sub(int flips, int heads_max, double prob_heads);

int main (int argc, char **argv)
{
double flips, min, max, prob_heads;
double p;
if (argc >= 3) {
flips = atoi(argv[1]);
min = atoi(argv[2]);
if (argc >= 4) {
max = atoi(argv[3]);
}
else {
max = min;
}
if (argc >= 5) {
prob_heads = atof(argv[4]);
} else {
prob_heads = 0.5;
}
p = binomial_range(flips, min, max, prob_heads);
if (p > 0.01) {
printf("%11.8f\n", p);
} else {
printf("%12e\n", p);
}
return 0;
} else {
fprintf
(stderr,
"%s <flips> <min> [<max> [<prob>]]\n\n"
"Return the probability that <flips> coin flips will produce\n"
"between <min> and <max> heads, inclusive, if the probability\n"
"of the coin turning up heads is <prob>. If <max> is omitted\n"
"then <max>=<min>; if <prob> is omitted then <prob> = 0.5.\n",
argv[0]);
exit(1);
}
}

/* Return the probability of getting between heads_min and heads_max
heads inclusive in the given number of flips.

To avoid over- and underflow, I compute ratios starting near the
median number of heads instead of directly compute factorials, etc.
For example, if p is the probability of getting exactly 100 heads
in 200 flips, then the probability of getting exactly 99 heads in
200 flips equals p * (100/101).

*/

double binomial_range(int flips, int heads_min, int heads_max, double prob_heads)
{
double p_median;
double r_lt_min;
double r_leq_max;

int median = prob_heads * flips + 0.5;
if (heads_min > median) {
int tmp;
/* Reverse the values to get better precision. */
prob_heads = 1.0 - prob_heads;
tmp = heads_min;
heads_min = flips - heads_max;
heads_max = flips - tmp;
}

p_median = 1.0 / binomial_range_sub(flips, flips, prob_heads);
r_lt_min = binomial_range_sub(flips, heads_min - 1, prob_heads);
r_leq_max = binomial_range_sub(flips, heads_max, prob_heads);

return p_median * (r_leq_max - r_lt_min);
}

/* Return the ratio of the probability of getting heads_max flips or
fewer in the given number of flips, to the probability of getting
exactly median heads in the given number of flips. */

double binomial_range_sub(int flips, int heads_max, double prob_heads)
{
double rh;
int i;

int median = flips * prob_heads + 0.5;
double rh_tot = 0.0;
double hr = prob_heads / (1.0 - prob_heads);

rh = 1.0;
for (i = median; ; ) {
if (i <= heads_max) {
rh_tot += rh;
}
if (i <= 0)
break;
i--;
rh *= (i + 1.0) / (flips - i) / hr;
}

rh = 1.0;
for (i = median + 1; i <= heads_max; i++) {
rh *= hr * (flips - i + 1.0) / i;
rh_tot += rh;
}
return rh_tot;
}

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/

Prev by Date: Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, newest data)
Next by Date: RE: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces, newest data)
Previous by thread: [computer-go] Statistical Significance (was: SlugGo v.s. Many Faces, newest data)
Next by thread: Re: [computer-go] Statistical Significance (was: SlugGo v.s. ManyFaces, new...
Index(es):
- Date
- Thread