
Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)



Small changes like this of course are very difficult to measure.  That's
why some changes have to be based on judgement.   

One thing I used to do in my chess programs is to track a set of small
changes together: little changes I felt helped the program but couldn't
measure.  Of course I also tested each one separately with problems that
exercised the change, but mainly for debugging.

I'm sure it's the same in Go: certain changes are very specific to
kinds of positions you may only get in 1 out of 20 games.  Even when
such a position does occur, there is no guarantee the change actually
converted a loss to a draw or a win, so even if it's a good thing it's
almost immeasurable.
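
To put rough numbers on this, here is a minimal sketch in Python.  It is
only an illustration under stated assumptions, not anyone's actual test
procedure: it assumes the usual logistic Elo model, independent games
between near-equal opponents, and a simple z-test on the mean score.  The
function names and the example conversion rate are made up for the
illustration.

```python
import math


def expected_score(elo_diff):
    """Expected score per game (win=1, draw=0.5, loss=0), logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(score_gain, z=1.96, draw_rate=0.0):
    """Approximate number of games before a shift of `score_gain` in the
    mean score exceeds z standard errors.

    Assumption: independent games between near-equal opponents, so the
    per-game variance is 0.25 - draw_rate / 4 (draws reduce the variance).
    """
    variance = 0.25 - draw_rate / 4.0
    return math.ceil(variance * (z / score_gain) ** 2)


def diluted_gain(frequency, conversion_rate, points):
    """Average score gain per game when the change only matters in a
    fraction `frequency` of games and, in those games, improves the result
    with probability `conversion_rate`, gaining `points` each time
    (0.5 for loss -> draw, 1.0 for loss -> win)."""
    return frequency * conversion_rate * points


if __name__ == "__main__":
    # Games needed to resolve a plain Elo difference.
    for elo in (30, 10):
        gain = expected_score(elo) - 0.5
        print(f"{elo} Elo: roughly {games_needed(gain)} games")

    # Hypothetical case: the position shows up in 1 game out of 20, and the
    # change turns a loss into a draw in 20% of those games.
    gain = diluted_gain(frequency=1 / 20, conversion_rate=0.2, points=0.5)
    print(f"diluted change: roughly {games_needed(gain)} games")
```

Under these assumptions a 10 Elo change needs thousands of games before it
stands out from the noise, and a change that only matters in 1 game out of
20 needs far more than that.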




   Date: Thu, 9 Sep 2004 00:16:05 +0200 (CEST)
   From: Arend Bayer <arend.bayer@xxxxxxxxxxxxxxxxx>


   On Wed, 8 Sep 2004, chrilly wrote:

   > Testing if one version is better than another one is in my opinion the most
   > critical part of games programming. If one can not measure progress
   > properly, implementing new ideas is pointless.

   > In Hydra we have a 3-stage process. First I do the development. I use
   > test-positions and games with Shredder and Fritz. But my judgement is purely
   > according to my feeling. No statistics involved. In the next stage 70 games
   > under fixed conditions (35 opening-positions) are played. If the new version
   > is considerably worse (30 Elo) than the best so far, it is discarded. If
   > not, we release the version to the sponsor (a Sheikh, who likes to play
   > with Hydra). The sponsor then plays on the ChessBase-Server. He sends us the
   > lost games. From the number of lost games and especially from the kind of
   > errors we finally decide if we continue development from this version or if
   > we take a step back to the previous one.

   So what do you do with small changes, of which you know pretty much in
   advance that they will only affect the strength by, say, +-10 Elo? I.e.
   you know in advance that the 70-game match won't give you any
   information.

   Arend


   _______________________________________________
   computer-go mailing list
   computer-go@xxxxxxxxxxxxxxxxx
   http://www.computer-go.org/mailman/listinfo/computer-go/
