
Re: [computer-go] Statistical Significance (was: SlugGo v.s.ManyFaces,newest data)



Testing whether one version is better than another is, in my opinion, the
most critical part of games programming. If one cannot measure progress
properly, implementing new ideas is pointless.
In Hydra we have a 3-stage process. First I do the development. I use
test positions and games against Shredder and Fritz, but my judgement is
based purely on my feeling. No statistics are involved. In the next stage,
70 games are played under fixed conditions (35 opening positions). If the
new version is considerably worse (30 Elo) than the best so far, it is
discarded. If not, we release the version to the sponsor (a Sheikh, who
likes to play with Hydra). The sponsor then plays on the ChessBase server
and sends us the lost games. From the number of lost games, and especially
from the kind of errors, we finally decide whether we continue development
from this version or step back to the previous one.
But sometimes it is easy. E.g. the last version, which beat Shredder in a
match 5.5:2.5, was clearly better than the previous ones. I had already
decided before the 70 games that we would play the Shredder match with this
version. The feeling was later confirmed by the test procedure: the version
was 50 Elo better. For such a big step my feeling is good enough.
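
To make the numbers above concrete, here is a minimal Python sketch of how a
match score translates into an Elo difference under the standard logistic Elo
model. It is not Hydra's actual test code. The function names and the
pass/fail gate are hypothetical, and the gate assumes the 70 games are scored
directly against the best version so far, which the description above does
not state.

```python
import math


def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def expected_score(elo_difference):
    """Expected score fraction for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_difference / 400.0))


def passes_stage_two(points, games=70, max_deficit_elo=30.0):
    """Hypothetical gate: discard only if the measured deficit exceeds 30 Elo.

    Assumption: `points` is the score of the new version in a direct match
    against the best version so far.
    """
    return elo_diff(points / games) >= -max_deficit_elo


if __name__ == "__main__":
    print(round(elo_diff(5.5 / 8)))          # Elo implied by a 5.5:2.5 match
    print(round(expected_score(50.0), 3))    # expected score at +50 Elo
    print(passes_stage_two(32.0), passes_stage_two(31.5))
```

Under this model a 5.5:2.5 result corresponds to roughly 137 Elo, but eight
games are far too few to pin the value down, and a 50 Elo advantage
corresponds to an expected score of about 57%.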

Although I have a PhD in statistics, the process is not really based on
statistics. It was developed by members of the team who have no knowledge of
statistics. But it works reasonably well. There is always a trade-off between
statistical rigour and practical goals. One would have to play 1000 games to
get a result which is accurate to within 10 Elo, but this would halt the
development process.
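
The 1000-games figure can be checked with a back-of-the-envelope calculation.
The sketch below is my own illustration, not part of the Hydra process. It
assumes two roughly equal programs, independent games, and no draws, so that
the per-game standard deviation of the score is 0.5. Draws reduce this value,
so the numbers are on the pessimistic side. The function names are
hypothetical.

```python
import math


def elo_slope(score=0.5):
    """Derivative of the Elo difference with respect to the score fraction."""
    return 400.0 / math.log(10) * (1.0 / score + 1.0 / (1.0 - score))


def elo_standard_error(games, per_game_sd=0.5, score=0.5):
    """Approximate standard error (in Elo) of a match result over `games`.

    Assumption: independent games with per-game score standard deviation
    `per_game_sd` (0.5 is the no-draw worst case for equal opponents).
    """
    return elo_slope(score) * per_game_sd / math.sqrt(games)


def games_needed(target_elo_se, per_game_sd=0.5, score=0.5):
    """Number of games for the standard error to drop to `target_elo_se`."""
    return math.ceil((elo_slope(score) * per_game_sd / target_elo_se) ** 2)


if __name__ == "__main__":
    print(round(elo_standard_error(1000), 1))  # standard error after 1000 games
    print(round(elo_standard_error(70), 1))    # standard error after 70 games
    print(games_needed(10.0))                  # games for a 10 Elo standard error
```

Under these assumptions 1000 games give a standard error of about 11 Elo,
while 70 games give about 42 Elo, so the 30 Elo threshold of the 70-game
stage lies within one standard error of zero.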

In the case of SlugGo I have no doubt that it is better. Although the
parallelization is rather primitive and inefficient, it is clear to me that
additional search is useful. The only question is how great the improvement
is. Searching 1 ply deeper in chess (equivalent to speeding up the program
by a factor of 4-5) improves a program by about 200 Elo. At least this was
the value in the past. At current search depths and playing strengths the
increment is smaller. It is more difficult to improve from 2800 (human world
champion) to 3000 Elo than from 2000 to 2200.
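
As a rough illustration of the chess rule of thumb above, the following
Python sketch converts "200 Elo per ply at a speed-up factor of 4-5 per ply"
into an Elo gain per doubling of speed. It assumes the gain is linear in the
logarithm of the speed-up, which is exactly the assumption that breaks down
at high playing strength, and it says nothing about Go. The function names
and default values are hypothetical.

```python
import math


def elo_per_doubling(elo_per_ply=200.0, speedup_per_ply=4.5):
    """Elo gained per doubling of speed, assuming a log-linear relationship."""
    return elo_per_ply / math.log2(speedup_per_ply)


def elo_gain(speedup, elo_per_ply=200.0, speedup_per_ply=4.5):
    """Estimated Elo gain from running `speedup` times faster."""
    return elo_per_doubling(elo_per_ply, speedup_per_ply) * math.log2(speedup)


if __name__ == "__main__":
    for factor in (4.0, 5.0):
        print(factor, round(elo_per_doubling(200.0, factor), 1))
```

With a factor of 4 per ply this gives 100 Elo per doubling, and with a factor
of 5 about 86 Elo per doubling. A parallel program that is less than
perfectly efficient would gain correspondingly less.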

Chrilly Donninger

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/