Re: [computer-go] Statistical Significance
Mike,
It sounds like you are talking about two different things, but I would
like to comment on the rating systems.
I use ELO when I test my program with games. It's not a perfect
system, but I believe it is as good as or better than anything else
ever proposed.
There can never be a perfect rating system. Even ELO makes an
assumption that is not strictly true, and its whole mathematical basis
rests on this assumption. The assumption is that playing
ability is a constant thing that can be measured by a single number.
In other words, it assumes that if Frank can beat Joe, then Frank can
also beat anyone that Joe can beat. In the real world this is not
completely true, even though in practice it is "mostly" true. It is
true enough that you can build an excellent rating system on it.
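The single-number assumption can be stated precisely. In the standard Elo model (logistic curve on the usual 400-point scale), each player X has one rating R_X, and the expected score of A against B depends only on the rating difference. The player names follow the example above; X stands for any third player.

```latex
E_{A,B} = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad
E_{A,B} > \tfrac{1}{2} \iff R_A > R_B

R_F > R_J \;\text{and}\; R_J > R_X \;\Longrightarrow\; R_F > R_X
\;\Longrightarrow\; E_{F,X} > E_{J,X} > \tfrac{1}{2}
```

So no set of ratings can represent a cycle in which Frank beats Joe, Joe beats X, and X beats Frank; data containing such a cycle can only be fit approximately.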
Often I will test several versions of my program. When it can be
conveniently arranged, I prefer to test several versions at a time
instead of only 2, because I feel that this at least attempts to
minimize the effect of "intransitive" relationships. At least I can
look through the results for "Frank beats Joe" intransitivity if it is
significant in the data. I try to throw in foreign programs, or at
least other versions of my own program that vary significantly in
playing heuristics.
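Looking for "Frank beats Joe" intransitivity can be automated. This is a minimal sketch, assuming head-to-head win counts are kept for each ordered pair of programs; the `wins` table, `beats`, `intransitive_triples`, and the numbers are all hypothetical. It flags any three programs that beat each other in a cycle. It uses a plain majority, so with small samples a flagged cycle may just be noise; deciding whether it is significant needs a proper statistical test.

```python
from itertools import permutations

# Hypothetical head-to-head tallies: wins[(a, b)] = games a won against b.
wins = {
    ("A", "B"): 14, ("B", "A"): 6,
    ("B", "C"): 13, ("C", "B"): 7,
    ("C", "A"): 12, ("A", "C"): 8,
}


def beats(a, b, wins):
    """True if a won more games than it lost against b (a simple majority)."""
    return wins.get((a, b), 0) > wins.get((b, a), 0)


def intransitive_triples(players, wins):
    """Return every 3-cycle a > b > c > a, each listed once.

    A cycle is normalized by rotating it to start at its smallest name,
    so (B, C, A) and (C, A, B) are both reported as (A, B, C).
    """
    found = set()
    for a, b, c in permutations(players, 3):
        if beats(a, b, wins) and beats(b, c, wins) and beats(c, a, wins):
            cycle = (a, b, c)
            i = cycle.index(min(cycle))
            found.add(cycle[i:] + cycle[:i])
    return sorted(found)


print(intransitive_triples(["A", "B", "C"], wins))  # [('A', 'B', 'C')]
```

Adding a fourth or fifth program only grows the list of triples checked; the same function works unchanged.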
When I do this kind of testing, I use ELO to rate all the
individuals. Without some kind of rating system I can't easily
quantify the results. If the match is a strict multi-round round-robin
test, you can also just count the total number of games won and rank
accordingly, but with ELO you can test in any combination of pairings
and still get reasonably accurate results. You can even say which
program is stronger without any games having been played between them,
as long as there is a chain of results linking the two programs.
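The "chain of relationships" point can be made concrete. The sketch below is not the incremental Elo update; it swaps in a batch technique, a maximum-likelihood fit of the same logistic model (the Bradley-Terry model on the Elo scale), which suits a fixed set of test games played in any combination of pairings. The function names, step size, iteration count, and game data are hypothetical. A and C never play each other, yet because B has played both, the fit still orders them and predicts A's score against C.

```python
def expected(r_a, r_b):
    """Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def fit_ratings(games, players, iters=2000, lr=10.0):
    """Batch maximum-likelihood ratings on the Elo scale (hypothetical helper).

    games: list of (a, b, score_a) with score_a = 1 (win), 0.5 (draw) or 0 (loss).
    Gradient ascent on the log-likelihood of the logistic Elo model; the
    constant ln(10)/400 in the gradient is folded into the step size lr.
    Ratings are defined only up to an additive constant. Each game adds
    equal and opposite terms, so the mean rating stays at its start of 0.
    """
    ratings = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for a, b, s in games:
            residual = s - expected(ratings[a], ratings[b])
            grad[a] += residual
            grad[b] -= residual
        for p in players:
            ratings[p] += lr * grad[p]
    return ratings


# Hypothetical results: A and C never meet, but B links them.
games = ([("A", "B", 1.0)] * 15 + [("A", "B", 0.0)] * 5
         + [("B", "C", 1.0)] * 15 + [("B", "C", 0.0)] * 5)
r = fit_ratings(games, ["A", "B", "C"])
print({p: round(v) for p, v in r.items()})
print(round(expected(r["A"], r["C"]), 3))  # about 0.9
```

Each link is a 75% score, about 191 rating points, so the fit puts A roughly 382 points above C. One caveat of this approach: a player who wins or loses every game has no finite maximum-likelihood rating, so real use needs a prior or a fixed anchor.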
With ELO you can "predict" (at least mathematically) what kind of
results you might expect against any other ELO-rated
player, even one you have never played. In fact, it
shouldn't matter whether you've played them before. As mentioned above,
this isn't perfect, but it works pretty well for the most part.
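The prediction described above comes straight from the rating difference. A minimal sketch using the standard Elo logistic curve on the usual 400-point scale; the function names and ratings are hypothetical. A 200-point edge corresponds to an expected score of about 0.76, i.e. roughly 76 points in a 100-game match, whether or not the two players have ever met.

```python
def expected_score(r_a, r_b):
    """Elo expected score (win = 1, draw = 0.5, loss = 0) of a against b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def predicted_wins(r_a, r_b, n_games):
    """Expected points for a over an n-game match against b (draws count 0.5)."""
    return n_games * expected_score(r_a, r_b)


# Hypothetical ratings: two programs that have never played each other.
print(round(expected_score(1700, 1500), 3))    # 0.76
print(round(predicted_wins(1700, 1500, 100)))  # 76
```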
- Don
From: Michael Gherrity <mike@xxxxxxxxxxxxxxxxx>
Date: Thu, 23 Sep 2004 02:39:08 -0700
I was wondering why the ELO or more modern Glicko
<http://math.bu.edu/people/mg/ratings.html> rating system used for
chess would not be appropriate for this task?
mike
On Sep 18, 2004, at 10:43 AM, David G Doshay wrote:
>
> On Sep 18, 2004, at 9:08 AM, Myriam Abramson wrote:
>
>> What was the consensus on whether to use win,lose,draw as the outcome
>> of the game or to use relative territory to estimate the strength of a
>> Go program?
>
> My observation is that there were arguments for both approaches
> and no final consensus.
>
> Cheers,
> David
--
Michael Gherrity
mike@xxxxxxxxxxxxxxxxx
http://www.gherrity.org
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/