
Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)

To: computer-go <computer-go@xxxxxxxxxxxxxxx>
Subject: Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)
From: Arend Bayer <arend.bayer@xxxxxx>
Date: Thu, 9 Sep 2004 00:16:05 +0200 (CEST)

On Wed, 8 Sep 2004, chrilly wrote:

> Testing whether one version is better than another is, in my opinion, the most
> critical part of games programming. If one cannot measure progress
> properly, implementing new ideas is pointless.

> In Hydra we have a 3-stage process. First I do the development. I use
> test positions and games against Shredder and Fritz, but my judgement is based
> purely on feeling; no statistics are involved. In the next stage, 70 games
> are played under fixed conditions (35 opening positions). If the new version
> is considerably worse (30 Elo) than the best so far, it is discarded. If
> not, we release the version to the sponsor (a Sheikh, who likes to play
> with Hydra). The sponsor then plays on the ChessBase server and sends us the
> lost games. From the number of lost games, and especially from the kind of
> errors, we finally decide whether to continue development from this version or
> to step back to the previous one.

So what do you do with small changes, where you know pretty much in
advance that they will only affect the strength by, say, +/- 10 Elo? I.e.
you know in advance that the 70-game match won't give you any
information.
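
To put rough numbers on this, here is a minimal sketch. It assumes the
standard logistic Elo model, independent games, and no draws (draws would
lower the variance somewhat), and the function names are only illustrative.
It compares the expected edge from a 10 Elo gain with the standard error of
the mean score over 70 games:

```python
import math

def expected_score(elo_diff):
    """Expected score per game under the logistic Elo model (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def score_std_error(p, games):
    """Standard error of the mean score, treating each game as win/loss only."""
    return math.sqrt(p * (1.0 - p) / games)

def games_needed(elo_diff, z=1.96):
    """Rough game count for the expected edge over 50% to reach z standard errors."""
    p = expected_score(elo_diff)
    edge = p - 0.5
    return math.ceil(z ** 2 * p * (1.0 - p) / edge ** 2)

if __name__ == "__main__":
    p = expected_score(10)
    se = score_std_error(p, 70)
    print(f"expected score at +10 Elo: {p:.4f}")
    print(f"standard error over 70 games: {se:.4f}")
    print(f"edge in standard errors: {(p - 0.5) / se:.2f}")
    print(f"games needed for ~95% significance: {games_needed(10)}")
```

Under these assumptions the edge from +10 Elo is only a small fraction of
one standard error in a 70-game match, and resolving it would take on the
order of a few thousand games.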

Arend

