
Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)

To: computer-go@xxxxxxxxxxxxxxx
Subject: Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)
From: Don Dailey <drd@xxxxxxx>
Date: Thu, 9 Sep 2004 08:10:42 -0400

Small changes like this are, of course, very difficult to measure.  That's
why some changes have to be based on judgement.
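
To put a rough number on "difficult to measure", here is a back-of-the-envelope
sketch in Python. It assumes the standard logistic Elo model, independent
games, no draws, and two nearly equal programs; the function names are made up
for illustration. Under those assumptions the 95% margin of a 70-game match
comes out at roughly 80 Elo, and resolving a 10 Elo difference takes several
thousand games.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_elo(diff):
    """Expected score for a player who is `diff` Elo stronger."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_margin(games, z=1.96):
    """Approximate Elo margin of error for a match between equal programs.

    Assumes each game is an independent win/loss with p = 0.5 (no draws),
    so the standard error of the mean score is sqrt(0.25 / games).
    """
    std_err = math.sqrt(0.25 / games)
    return elo_from_score(0.5 + z * std_err)


def games_needed(elo_diff, z=1.96):
    """Approximate games needed before `elo_diff` exceeds the margin of error."""
    score_shift = score_from_elo(elo_diff) - 0.5
    return math.ceil((z * 0.5 / score_shift) ** 2)


if __name__ == "__main__":
    print("95%% margin after 70 games: +-%.0f Elo" % elo_margin(70))
    print("games needed to resolve 10 Elo: %d" % games_needed(10))
```

Draws shrink the variance somewhat, so for chess the margin is a bit smaller,
but not by enough to change the conclusion.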

One thing I used to do in my chess programs was to track a set of small
changes together: little changes I felt helped the program but couldn't
measure individually.  Of course I also tested each one separately on
problems that exercised the change, but mainly for debugging.

I'm sure it's the same in Go: certain changes are very specific to
kinds of positions you may get in only 1 out of 20 games.  Even when
such a position does occur, there is no guarantee the change actually
converted a loss to a draw or a win, so even if it's a good thing, its
effect is almost immeasurable.
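
The dilution can be sketched the same way. The 1-in-20 frequency is the figure
above; the chance that the change actually flips a loss into a win once the
position occurs is a made-up number (0.2 here), and the logistic Elo model and
the function names are assumptions for illustration only.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def diluted_elo_gain(p_occurs, p_flip):
    """Overall Elo gain of a change that only matters in some games.

    p_occurs: fraction of games in which the relevant position appears.
    p_flip:   hypothetical chance that, when it appears, the change turns
              a loss into a win (a full point gained).
    Starts from an even match (expected score 0.5).
    """
    score_shift = p_occurs * p_flip
    return elo_from_score(0.5 + score_shift)


if __name__ == "__main__":
    gain = diluted_elo_gain(p_occurs=1 / 20, p_flip=0.2)
    print("overall gain: about %.1f Elo" % gain)
```

With these numbers the whole-program gain is well under 10 Elo, even though
the change is clearly good in the positions it was written for.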

   Date: Thu, 9 Sep 2004 00:16:05 +0200 (CEST)
   From: Arend Bayer <arend.bayer@xxxxxxxxxxxxxxxxx>

   On Wed, 8 Sep 2004, chrilly wrote:

   > Testing if one version is better than another one is in my opinion the most
   > critical part of games programming. If one can not measure progress
   > properly, implementing new ideas is pointless.

   > In Hydra we have a 3-stage process. First I do the development. I use
   > test positions and games with Shredder and Fritz. But my judgement is purely
   > according to my feeling. No statistics involved. In the next stage 70 games
   > under fixed conditions (35 opening positions) are played. If the new version
   > is considerably worse (30 Elo) than the best so far, it is discarded. If
   > not, we release the version to the sponsor (a Sheikh, who likes to play
   > with Hydra). The sponsor then plays on the ChessBase server. He sends us the
   > lost games. From the number of lost games and especially from the kind of
   > errors we finally decide whether we continue development from this version
   > or make a step back to the previous one.

   So what do you do with small changes, of which you know pretty much in
   advance that they will only affect the strength by, say, +- 10 Elo? I.e.
   you know in advance that the 70 games match won't give you any
   information.

   Arend

   _______________________________________________
   computer-go mailing list
   computer-go@xxxxxxxxxxxxxxxxx
   http://www.computer-go.org/mailman/listinfo/computer-go/

