Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)
On Wed, 8 Sep 2004, chrilly wrote:
> Testing whether one version is better than another is, in my opinion, the most
> critical part of games programming. If one cannot measure progress
> properly, implementing new ideas is pointless.
> In Hydra we have a 3-stage process. First I do the development. I use
> test positions and games against Shredder and Fritz, but my judgement is purely
> based on my feeling; no statistics are involved. In the next stage, 70 games
> under fixed conditions (35 opening positions) are played. If the new version
> is considerably worse (30 Elo) than the best so far, it is discarded. If
> not, we release the version to the sponsor (a Sheikh who likes to play
> with Hydra). The sponsor then plays on the ChessBase server. He sends us the
> lost games. From the number of lost games, and especially from the kind of
> errors, we finally decide whether to continue development from this version or
> to step back to the previous one.
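The 70-game gate in the second stage can be sketched as turning a match score into an Elo difference with the usual logistic Elo formula, then comparing it against the 30 Elo threshold. This is only an illustrative assumption: the post does not say how Hydra's results are converted to Elo, and `elo_diff` and `keep_candidate` are hypothetical names.

```python
import math


def elo_diff(score: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model.

    `score` counts a win as 1 and a draw as 0.5 (assumed convention).
    """
    p = score / games
    if p <= 0.0 or p >= 1.0:
        raise ValueError("a 0% or 100% score gives an unbounded Elo estimate")
    return -400.0 * math.log10(1.0 / p - 1.0)


def keep_candidate(score: float, games: int = 70, threshold: float = 30.0) -> bool:
    """Hypothetical gate: discard the new version if it measures `threshold` Elo worse or more."""
    return elo_diff(score, games) > -threshold
```

Under this model a 30 Elo deficit corresponds to an expected score of about 45.7%, i.e. roughly 32 points out of 70, so the gate only rejects versions that lose the match fairly clearly.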
So what do you do with small changes which you know pretty much in
advance will only affect the strength by, say, ±10 Elo? I.e.
you know in advance that the 70-game match won't give you any
information.
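A rough way to quantify why 70 games cannot resolve a ±10 Elo change is to estimate the standard error of a match result and map it to Elo with the delta method. The sketch assumes independent games, a fixed draw rate and a normal approximation; `elo_standard_error` and `games_needed` are hypothetical helpers, and the two-sigma margin is a rule of thumb, not a formal power calculation.

```python
import math


def elo_standard_error(games: int, p: float = 0.5, draw_rate: float = 0.0) -> float:
    """Approximate one-sigma uncertainty, in Elo, of a match of `games` games.

    Assumes independent games, expected score p per game, and a fixed
    fraction of draws (illustrative assumptions, not from the post).
    """
    win_rate = p - draw_rate / 2.0          # chosen so that win + draw/2 == p
    # Per-game score variance: E[X^2] - E[X]^2 with X in {1, 0.5, 0}
    var_game = win_rate + 0.25 * draw_rate - p * p
    sigma_p = math.sqrt(var_game / games)   # standard error of the mean score
    # Delta method: scale by dElo/dp for Elo = -400 * log10(1/p - 1)
    slope = 400.0 / (math.log(10) * p * (1.0 - p))
    return sigma_p * slope


def games_needed(elo_margin: float, z: float = 2.0, draw_rate: float = 0.0) -> int:
    """Games needed before z standard errors shrink to `elo_margin` Elo."""
    per_game = elo_standard_error(1, draw_rate=draw_rate)
    return math.ceil((z * per_game / elo_margin) ** 2)
```

With no draws, `elo_standard_error(70)` comes out at about 41.5 Elo, and `games_needed(10)` is roughly 4,800 games; a nonzero `draw_rate` lowers the per-game variance and therefore the count somewhat.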
Arend
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/