Re: [computer-go] Statistical Significance

To: computer-go@xxxxxxxxxxxxxxx
Subject: Re: [computer-go] Statistical Significance
From: Don Dailey <drd@xxxxxxx>
Date: Thu, 23 Sep 2004 13:06:34 -0400
Cc: computer-go@xxxxxxxxxxxxxxx
Delivered-to: computer-go@xxxxxxxxxxxxxxxxx
In-reply-to: <6A9B1026-0D44-11D9-BD17-0003936729BC@xxxxxxxxxxxxxxxxx> (message from Michael Gherrity on Thu, 23 Sep 2004 02:39:08 -0700)
References: <Pine.LNX.4.33.0409062105190.6381-100000@xxxxxxxxxxxxxxxxx><DB27DCE0-0087-11D9-9F42-000393753538@xxxxxxxxxxxxxxxxx><m3acvnv9to.fsf@xxxxxxxxxxxxxxxxx><346DBCA2-099A-11D9-9679-000393753538@xxxxxxxxxxxxxxxxx><6A9B1026-0D44-11D9-BD17-0003936729BC@xxxxxxxxxxxxxxxxx>
Reply-to: drd@xxxxxxx,computer-go <computer-go@xxxxxxxxxxxxxxx>
Sender: computer-go-bounces@xxxxxxxxxxxxxxx

Mike,

It sounds like you are talking about two different things, but I would
like to comment on the rating systems.

I use Elo when I test my program with games. It's not a perfect
system, but I believe it is as good as or better than anything else
ever proposed.

There can never be a perfect rating system. Even Elo makes an
assumption that is not strictly true, and its whole mathematical
basis rests on that assumption. The assumption is that playing
ability is a constant thing that can be measured by a single number.
In other words, it assumes that if Frank can beat Joe, then Frank can
also beat anyone that Joe can beat. In the real world this is not
completely true, even though in practice it is "mostly" true. It is
true enough that you can build an excellent rating system out of it.

Often, I will test several versions of my program. I prefer to test
several versions at a time, if it can be conveniently arranged,
instead of testing only 2 at a time, because I feel that it at least
attempts to minimize the effect of "intransitive" relationships. At
least I can look at the results and check for the "Frank beats Joe"
intransitivity if it is significant in the data. I try to throw in
foreign programs, or at least other versions of my own program that
vary significantly in playing heuristics.
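
As a rough illustration of checking results for that kind of
intransitivity, here is a minimal Python sketch. The function name,
the shape of the `wins` table, and the player names are all
hypothetical, and "beats" simply means "won more of the head-to-head
games"; it makes no attempt to judge whether a cycle is statistically
significant.

```python
from itertools import permutations


def find_intransitive_triples(wins):
    """Return triples (a, b, c) where a beats b, b beats c, and c beats a.

    wins[(x, y)] is the number of games x won against y (hypothetical layout).
    """
    players = sorted({p for pair in wins for p in pair})

    def beats(x, y):
        # x "beats" y if x won more of their head-to-head games.
        return wins.get((x, y), 0) > wins.get((y, x), 0)

    cycles = []
    for a, b, c in permutations(players, 3):
        # Report each cycle once, rotated so the alphabetically first name leads.
        if a < b and a < c and beats(a, b) and beats(b, c) and beats(c, a):
            cycles.append((a, b, c))
    return cycles


if __name__ == "__main__":
    example = {
        ("frank", "joe"): 6, ("joe", "frank"): 4,
        ("joe", "sam"): 7, ("sam", "joe"): 3,
        ("sam", "frank"): 6, ("frank", "sam"): 4,
    }
    print(find_intransitive_triples(example))
```

With only a handful of games per pairing, a cycle like this can easily
be noise, so it is a prompt to play more games rather than a finding.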

When I do this kind of testing, I use Elo for rating all the
individuals. Without some kind of rating system I can't easily
quantify the results. If the match is a strict multiple round-robin
test, you can also look at the total number of games won and rank
accordingly, but with Elo you can test in any combination and get
more or less accurate results. You can even say which program is
stronger without any games having been played between them, as long
as there is a chain of relationships linking the two programs.
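
To make the "chain of relationships" point concrete, here is a small
Python sketch that fits Elo-style ratings to an arbitrary list of game
results. It is not my actual test harness: it uses a standard
Bradley-Terry maximum-likelihood fit (the iterative scheme often
credited to Zermelo) and converts to the usual 400-point logarithmic
scale. The function name, the `(winner, loser)` input format, and the
example data are assumptions, and draws are not handled.

```python
import math
from collections import defaultdict


def fit_ratings(results, iterations=200):
    """Fit Elo-scale ratings to a list of (winner, loser) pairs.

    Assumes every player has at least one win and one loss and that the
    pairings connect all players; otherwise the estimates are not finite.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # games[(p, q)] = games played between p and q
    players = set()
    for winner, loser in results:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(n / (strength[p] + strength[q])
                        for (a, q), n in games.items() if a == p)
            updated[p] = wins[p] / denom
        # Only ratios matter, so pin the geometric mean of the strengths to 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}

    # Convert strengths to ratings centred on 0 (400 points per factor of 10).
    return {p: 400.0 * math.log10(s) for p, s in strength.items()}


if __name__ == "__main__":
    # A and C never meet; they are linked only through B.
    results = ([("A", "B")] * 7 + [("B", "A")] * 3 +
               [("B", "C")] * 6 + [("C", "B")] * 4)
    for name, rating in sorted(fit_ratings(results).items()):
        print(name, round(rating, 1))
```

In this made-up example A comes out roughly 147 points above B and B
roughly 70 above C, so A is ranked above C even though the two never
played each other.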

With Elo you can "predict" (at least mathematically) what kind of
results you might expect to achieve against any other Elo-rated
player, even if you have never played them before. In fact, it
shouldn't matter whether you've ever played them before. As mentioned
above, this isn't perfect, but it works pretty well for the most part.
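
For anyone who wants the prediction spelled out, a minimal Python
sketch follows. It assumes the common logistic form of the Elo
expectation with the 400-point scale used in chess; the function name
and the ratings in the example are made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B (a win counts 1, a draw 0.5).

    Assumes the logistic Elo curve with the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Equal ratings give an even match; a 200-point edge gives about 0.76.
    print(expected_score(1500, 1500))
    print(round(expected_score(1700, 1500), 2))
```

Only the rating difference enters the formula, which is why it makes
no difference whether the two players have met before.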

- Don

   From: Michael Gherrity <mike@xxxxxxxxxxxxxxxxx>
   Date: Thu, 23 Sep 2004 02:39:08 -0700

   I was wondering why the ELO or more modern Glicko 
   <http://math.bu.edu/people/mg/ratings.html> rating system used for 
   chess would not be appropriate for this task?

	    mike

   On Sep 18, 2004, at 10:43 AM, David G Doshay wrote:

   >
   > On Sep 18, 2004, at 9:08 AM, Myriam Abramson wrote:
   >
   >> What was the consensus on whether to use win,lose,draw as the outcome
   >> of the game or to use relative territory to estimate the strength of a
   >> Go program?
   >
   > My observation is that there were arguments for both approaches
   > and no final consensus.
   >
   > Cheers,
   > David
   --
   Michael Gherrity
   mike@xxxxxxxxxxxxxxxxx
   http://www.gherrity.org

   _______________________________________________
   computer-go mailing list
   computer-go@xxxxxxxxxxxxxxxxx
   http://www.computer-go.org/mailman/listinfo/computer-go/
