Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)
Small changes like this of course are very difficult to measure. That's
why some changes have to be based on judgement.
One thing I used to do in my chess programs was to track a set of small
changes together: little changes I felt helped the program but couldn't
measure. Of course I also tested each one separately with problems that
exercised the change, but mainly for debugging.
I'm sure it's the same in Go: certain changes are very specific to
kinds of positions that you may only get in 1 out of 20 games. Even if
the position occurs in 1 out of 20 games, there is no guarantee the
change actually converts a loss to a draw or win, so even if it's a
good thing it's almost immeasurable.
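To put a rough number on that dilution, here is a minimal sketch. The
1-in-20 occurrence rate is the figure above; the 0.25 points gained per
affected game is purely a hypothetical value chosen for illustration, and
the conversion to Elo assumes the usual logistic Elo model. The function
names are mine, not from any program discussed here.

```python
import math

def diluted_gain(occurrence_rate, gain_when_present):
    """Overall expected-score gain when a change only matters in some games."""
    return occurrence_rate * gain_when_present

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))

if __name__ == "__main__":
    # Hypothetical: the position shows up in 1 of 20 games, and the change
    # is worth a quarter of a point on average when it does.
    gain = diluted_gain(1 / 20, 0.25)
    print(f"overall score gain: {gain:.4f}")
    print(f"implied Elo gain:   {elo_from_score(0.5 + gain):.1f}")
```

Under these assumptions the gain stays below 10 Elo even though the change
is worth a lot in the games where it applies.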
Date: Thu, 9 Sep 2004 00:16:05 +0200 (CEST)
From: Arend Bayer <arend.bayer@xxxxxxxxxxxxxxxxx>
On Wed, 8 Sep 2004, chrilly wrote:
> Testing if one version is better than another one is in my opinion the most
> critical part of games programming. If one can not measure progress
> properly, implementing new ideas is pointless.
> In Hydra we have a 3-stage process. First I do the development. I use
> test positions and games against Shredder and Fritz. But my judgement is
> purely according to my feeling. No statistics involved. In the next stage
> 70 games under fixed conditions (35 opening positions) are played. If the
> new version is considerably worse (30 Elo) than the best so far, it is
> discarded. If not, we release the version to the sponsor (a Sheikh, who
> likes to play with Hydra). The sponsor then plays on the ChessBase server.
> He sends us the lost games. From the number of lost games and especially
> from the kind of errors we finally decide whether we continue development
> from this version or make a step back to the previous one.
So what do you do with small changes, of which you know pretty much in
advance that they will only affect the strength by, say, +- 10 Elo? I.e.
you know in advance that the 70-game match won't give you any
information.
Arend
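
A back-of-the-envelope sketch of why a 70-game match cannot resolve a
10 Elo difference. It assumes the logistic Elo model, independent games,
and a per-game standard deviation of 0.5 (the worst case with no draws;
draws would lower it somewhat). Games paired on the same opening position
are not strictly independent, so treat this as an approximation. The
function names and the 2-sigma threshold are my own choices for
illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def score_standard_error(n_games, per_game_sd=0.5):
    """Standard error of the mean score over n independent games."""
    return per_game_sd / math.sqrt(n_games)

def games_needed(elo_diff, z=2.0, per_game_sd=0.5):
    """Games needed for the expected edge to reach z standard errors."""
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * per_game_sd / edge) ** 2)

if __name__ == "__main__":
    edge = expected_score(10) - 0.5
    se = score_standard_error(70)
    print(f"edge from 10 Elo:        {edge:.4f}")
    print(f"std. error, 70 games:    {se:.4f}")
    print(f"edge in standard errors: {edge / se:.2f}")
    print(f"games for 2 sigma:       {games_needed(10)}")
```

With these assumptions the expected edge from 10 Elo is only a fraction of
one standard error over 70 games, and reaching 2 standard errors takes
several thousand games.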
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/