
Re: [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)

To: drd@xxxxxxx,computer-go <computer-go@xxxxxxxxxxxxxxx>
Subject: Re: [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)
From: Stuart A Yeates <s.yeates@xxxxxxxxxxxxxxxx>
Date: Thu, 09 Sep 2004 16:08:39 +0100
Cc:
Delivered-to: computer-go@xxxxxxxxxxxxxxxxx
In-reply-to: <200409091443.i89EhRiL020637@xxxxxxxxxxxxxxxxx>
List-archive: <http://computer-go.org/pipermail/computer-go>
List-help: <mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=help>
List-id: computer-go <computer-go.computer-go.org>
List-post: <mailto:computer-go@xxxxxxxxxxxxxxxxx>
List-subscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=subscribe>
List-unsubscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=unsubscribe>
References: <Pine.LNX.4.44.0409091151310.1030-100000@xxxxxxxxxxxxxxxxx><200409091443.i89EhRiL020637@xxxxxxxxxxxxxxxxx>
Reply-to: computer-go <computer-go@xxxxxxxxxxxxxxx>
Sender: computer-go-bounces@xxxxxxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.5 (X11/20040306)

Don Dailey wrote:

Yes, go programs are randomized, but they do tend to repeat the same
patterns from game to game. This means you need to play a lot more
games than the independence-based statistical tests would suggest to
be really confident about which is better.
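
A minimal sketch of the usual independence-based check: a one-sided test that version B's win rate against version A exceeds 50%, using the normal approximation to the binomial. The `design_effect` parameter is a hypothetical variance-inflation factor (an assumption, not something measured in this thread). It stands in for games that repeat the same patterns and so carry less information than independent ones. A value of 1.0 reproduces the plain independent-games test.

```python
import math

def one_sided_p_value(wins: int, games: int, design_effect: float = 1.0) -> float:
    """Normal-approximation p-value for H0: win rate = 0.5 vs H1: win rate > 0.5.

    design_effect (>= 1) is a hypothetical stand-in for correlation between
    games: the effective number of independent games is games / design_effect.
    """
    if games <= 0 or design_effect < 1.0:
        raise ValueError("need games > 0 and design_effect >= 1")
    n_eff = games / design_effect            # fewer "effectively independent" games
    p_hat = wins / games                     # observed win rate
    se = math.sqrt(0.25 / n_eff)             # std. error of the win rate under H0
    z = (p_hat - 0.5) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of N(0, 1)

if __name__ == "__main__":
    for deff in (1.0, 4.0):
        print(deff, round(one_sided_p_value(120, 200, deff), 4))
```

With 120 wins out of 200 games the independent test gives p of about 0.002. If correlation made the games worth only a quarter as much (design effect 4), the same record gives p of about 0.08, which is no longer convincing.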



Which of course is a problem. My program is extremely unlikely to
repeat a position even after the first 4 moves (two for each side).
But, as you imply, that doesn't mean the same ideas and themes are
not being repeated. To try to give the program new things to chew
on, I play the first few moves randomly. I avoid moves on the edges
when doing this, to rule out some of the weakest moves.
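
A minimal sketch of a randomized opening that skips edge points, assuming a 19x19 board, (row, col) coordinates, and that "edges" means the outermost line (`edge_margin = 1`). The function name and parameters are hypothetical. Moves alternate Black/White in list order. Because the points are distinct and, with only two stones per side, no capture or suicide is possible, no legality check is done. A real engine would need one for longer random openings.

```python
import random
from typing import List, Optional, Tuple

BOARD_SIZE = 19  # assumed standard 19x19 board

def random_opening(num_moves: int, edge_margin: int = 1,
                   rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Pick num_moves distinct points for the opening (Black first, then
    alternating), never within edge_margin lines of the board edge."""
    rng = rng or random.Random()
    lo, hi = edge_margin, BOARD_SIZE - 1 - edge_margin
    # All points at least edge_margin lines in from every edge.
    interior = [(r, c) for r in range(lo, hi + 1) for c in range(lo, hi + 1)]
    if num_moves > len(interior):
        raise ValueError("not enough interior points for that many moves")
    # sample() draws without replacement, so no point is played twice.
    return rng.sample(interior, num_moves)

if __name__ == "__main__":
    print(random_opening(4, rng=random.Random(2004)))
```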

The problem with this is that it tends to equalize opponents. If
version B is stronger than version A, giving them random starting
positions will tend to give a little advantage to version A,
because version A will now sometimes get starting positions
that are heavily in its favor. In practice I'm not sure this hurts
the testing very much, but it is an issue, and I think it becomes a bigger
concern as my program gets stronger. I don't use this technique when
I'm testing against a foreign program to measure progress.
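
The equalizing effect can be made concrete with a toy model (an assumption for illustration, not a measurement). Suppose a fraction `f` of random openings is so lopsided that the opening alone decides the game, with each version equally likely to get the good side. Then the observed win rate of the stronger version shrinks from p to (1 - f)p + f/2. Since the number of games needed to resolve an edge grows roughly as 1/(edge)^2, the testing cost grows by about 1/(1 - f)^2.

```python
def observed_win_rate(true_p: float, lopsided_fraction: float) -> float:
    """Toy model: a fraction of openings decides the game outright, split
    evenly between the two versions; the rest are decided by strength, with
    the stronger version winning with probability true_p."""
    return (1.0 - lopsided_fraction) * true_p + lopsided_fraction * 0.5

def games_for_z(win_rate: float, z: float = 1.96) -> float:
    """Games needed before the edge (win_rate - 0.5) equals z standard errors,
    under the independent-games binomial approximation."""
    edge = win_rate - 0.5
    if edge == 0:
        raise ValueError("no edge to detect")
    return (z * 0.5 / edge) ** 2

if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5):
        p = observed_win_rate(0.6, f)
        print(f, round(p, 3), round(games_for_z(p)))
```

In this model a true 60% edge needs about 96 games to reach z = 1.96, and four times as many once half the openings are decided by the opening alone.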

I considered making a database of a few thousand random starting
positions and using only those, keeping statistics to determine which of
those positions are grossly unfair and culling them out. I may do
this eventually.
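
One way the culling step might look, as a sketch under stated assumptions. Results are recorded by colour, each version plays both colours from every position, so a position's Black win rate measures its bias rather than either version's strength. `PositionStats`, `record_result`, `is_grossly_unfair`, and `cull`, plus the `min_games` and `z_threshold` defaults, are all hypothetical choices. A fairly high threshold like 3.0 is used because a few thousand positions are being tested at once, so some will look unfair by chance.

```python
import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class PositionStats:
    black_wins: int = 0
    games: int = 0

def record_result(db: Dict[str, PositionStats], position_id: str, black_won: bool) -> None:
    """Add one game result for a starting position."""
    stats = db.setdefault(position_id, PositionStats())
    stats.games += 1
    stats.black_wins += int(black_won)

def is_grossly_unfair(stats: PositionStats, min_games: int = 20,
                      z_threshold: float = 3.0) -> bool:
    """Flag a position whose Black win rate is far from 50%."""
    if stats.games < min_games:
        return False  # too little data to judge this position yet
    p_hat = stats.black_wins / stats.games
    se = math.sqrt(0.25 / stats.games)  # std. error if the position were fair
    return abs(p_hat - 0.5) / se > z_threshold

def cull(db: Dict[str, PositionStats], **kwargs) -> Dict[str, PositionStats]:
    """Return a copy of the database without the grossly unfair positions."""
    return {pid: s for pid, s in db.items() if not is_grossly_unfair(s, **kwargs)}
```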

There would seem to be an even better way of solving this problem:

Select random positions N moves into an opening book, so that the database reflects "real" play. The opening book could be a standard one built from the games of high-level players, or one selected from games of players of roughly the same strength as the Go-playing program.
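
A minimal sketch of drawing a starting position from an opening book, assuming the book is simply a list of games, each a sequence of (row, col) moves. The names `Move` and `sample_book_position` are hypothetical. N is drawn uniformly from `[min_depth, max_depth]`, and setting both to the same value gives a fixed depth. The culling statistics sketched above could then be kept per sampled prefix.

```python
import random
from typing import List, Sequence, Tuple

Move = Tuple[int, int]  # hypothetical (row, col) move representation

def sample_book_position(book: Sequence[Sequence[Move]], min_depth: int,
                         max_depth: int, rng: random.Random) -> List[Move]:
    """Return the first N moves of a randomly chosen book game, with N drawn
    uniformly from [min_depth, max_depth]."""
    if not 1 <= min_depth <= max_depth:
        raise ValueError("need 1 <= min_depth <= max_depth")
    # Only games long enough to reach max_depth can supply every depth.
    eligible = [game for game in book if len(game) >= max_depth]
    if not eligible:
        raise ValueError("no book game reaches max_depth")
    game = rng.choice(eligible)
    depth = rng.randint(min_depth, max_depth)  # inclusive on both ends
    return list(game[:depth])

if __name__ == "__main__":
    book = [[(3, 3), (15, 15), (2, 15), (15, 3)], [(3, 15), (15, 3), (16, 16), (2, 2)]]
    print(sample_book_position(book, 2, 4, random.Random(7)))
```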

cheers
stuart
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/

Follow-Ups:
- Re: [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)
  - From: Don Dailey
- RE: [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)
  - From: Anders Kierulf

References:
- [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)
  - From: Nicol N. Schraudolph
- Re: [computer-go] Statistical significance (was: SlugGo vs Many Faces,newest data)
  - From: Don Dailey
