
Re: [computer-go] Statistical Significance



>  What was the consensus on whether to use win,lose,draw as the outcome
>  of the game or to use relative territory to estimate the strength of a
>  Go program? 

I don't know if there has been a consensus on this, but the answer is
quite clear, since you specified "estimating the strength of a Go
program." The answer is that it only makes sense to go by the win,
lose, draw result.

That's because win, lose and draw is the only thing that actually
counts. It would not make sense to judge the WINNING program in a
series of games as the WEAKER program because it won less territory
over the sequence of games. You can win 10-1 and still lose on total
territory if the 10 wins are close and the one loss is a slaughter.
Counting wins, losses and draws always ranks the match winner first;
counting territory cannot guarantee this.
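To make the 10-1 case concrete, here is a small sketch with hypothetical per-game margins (invented for illustration, not results from any program): ten narrow 2-point wins and one 40-point loss.

```python
# Hypothetical per-game margins for program A, in points (positive = A won).
# Ten close wins and one slaughter, mirroring the 10-1 example above.
margins = [2] * 10 + [-40]

def match_by_results(margins):
    """Return (wins, losses, draws) for program A."""
    wins = sum(1 for m in margins if m > 0)
    losses = sum(1 for m in margins if m < 0)
    draws = sum(1 for m in margins if m == 0)
    return wins, losses, draws

def match_by_territory(margins):
    """Return A's net territory summed over the whole series."""
    return sum(margins)

print(match_by_results(margins))    # (10, 1, 0)
print(match_by_territory(margins))  # -20
```

By results, A wins the match 10-1; by total territory it is 20 points behind and would be judged the weaker program.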

In chess, beginners often judge the strength of players by how many
moves it takes them to win. Any good chess player knows this is
nonsense. I put the margin of victory in Go in the same general
category of foolishness. Judging by margin also assumes that you don't
take chances when you are losing and that you don't play a bit safer
when you are winning.

It turns out that this is an important question for my program. My
program increased in strength significantly when I changed the scoring
function to be based on winning probability instead of expected
territory.

I was quite surprised until I saw what was happening. The improved
program could not care less how much it wins by, as long as it wins.
It will give away a lot of territory to make the win certain, even
when holding on to that territory would carry only a minimal risk of
losing. It will also give up the chance to keep a losing game close if
it has even a slight chance to win by playing something risky, even
when the risky play is unlikely to succeed and likely to lose big. But
the point is that it will at least try to win instead of accepting a
sure (if close) loss.
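Both behaviors fall out of the choice of scoring function. The sketch below is illustrative only, not how any particular program works: it assumes each candidate move comes with a list of equally likely final margins (hypothetical numbers, komi already applied), and compares picking the move with the best expected margin against picking the move with the best chance of winning.

```python
from statistics import mean

def expected_margin(outcomes):
    """Territory-style score: average final margin in points."""
    return mean(outcomes)

def win_probability(outcomes):
    """Winning-probability score; draws (margin 0) count as non-wins here."""
    return sum(1 for m in outcomes if m > 0) / len(outcomes)

def best_move(candidates, score):
    """Pick the move whose outcome list gets the highest score."""
    return max(candidates, key=lambda move: score(candidates[move]))

# Hypothetical outcome samples for two candidate moves in each position.
winning_position = {
    "secure": [3] * 10,          # small but certain win
    "greedy": [15] * 9 + [-5],   # bigger margin, 10% chance of losing
}
losing_position = {
    "safe": [-2] * 10,           # certain but close loss
    "gamble": [5] + [-30] * 9,   # 10% chance to win, usually loses big
}

for name, pos in [("winning", winning_position), ("losing", losing_position)]:
    print(name, best_move(pos, expected_margin), best_move(pos, win_probability))
# winning greedy secure
# losing safe gamble
```

The territory score takes the extra points when ahead and settles for the close loss when behind; the winning-probability score secures the win when ahead and gambles when behind.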

I have a switch to turn on the territory scoring mode in my program.
I hate the stronger mode because once it has won the game it plays
what look like awful moves, simply because they don't matter. I can't
run tactical problems in the stronger mode because it doesn't care
whether it wins a given group unless it needs that group to win the
game. So I have to run tactical tests in the weaker, territory-scoring
mode. I have also tried integrating the two modes to get the best of
both worlds, but each attempt so far has resulted in a weaker program.

It would be unreasonable for me to judge the playing strength of my
program based on territory scores. If I played a series of games
between the two versions of my own program and scored them by total
territory, the weaker version would clearly come out ahead, even
though the stronger version would win over 2/3 of the games.
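Whether a win rate like 2/3 actually shows a strength difference depends on how many games were played. A minimal sketch, assuming draws are ignored and using a standard one-sided exact binomial test (each game treated as a fair coin flip if the programs were equally strong); the game counts are hypothetical:

```python
from math import comb

def one_sided_p_value(wins, games):
    """P(at least `wins` wins out of `games`) if both programs were equally strong.

    Exact binomial tail with p = 0.5; draws are assumed to be excluded.
    """
    tail = sum(comb(games, k) for k in range(wins, games + 1))
    return tail / 2 ** games

# The same 2/3 win rate over a short and a longer (hypothetical) match.
for wins, games in [(6, 9), (40, 60)]:
    print(f"{wins}/{games}: p = {one_sided_p_value(wins, games):.4f}")
```

Winning 6 of 9 gives p of about 0.25, which is entirely consistent with equal strength, while 40 of 60 gives p well below 0.05.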

Because of my own results, I strongly recommend that if you have a
program that uses estimated territory as its evaluation function, you
try to move toward calculating winning probabilities instead. I admit
that it is much more natural and easy to count (or estimate) territory,
and it might not be easy to convert to an evaluation function based on
the probability of winning. This depends on the kind of program you
have written.
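One simple way a territory estimate might be turned into a winning probability, offered only as a hypothetical sketch: assume the true final margin is normally distributed around the estimate, with a standard deviation `sigma` that the program would somehow have to supply for each evaluation. Both the normal model and `sigma` are assumptions, not part of the original post.

```python
from math import erf, sqrt

def win_probability(estimated_margin, sigma):
    """Chance that the true final margin is positive, under a normal-error model.

    estimated_margin: territory estimate in points, from the side to move's
        point of view, with komi already applied (hypothetical convention).
    sigma: assumed standard deviation of the estimate's error, in points.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = estimated_margin / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z

# A settled small lead versus a larger but very uncertain one.
print(round(win_probability(2.0, 1.0), 3))   # 0.977
print(round(win_probability(8.0, 20.0), 3))  # 0.655
```

Note that with one fixed `sigma` for every position, this probability is monotone in the margin and ranks moves exactly as the territory estimate does. It only changes decisions if the program can judge how uncertain each evaluation is, which is one reason the conversion depends on the kind of program.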

Having said all of that, I can see the possibility that for certain
programs the correlation between total territory won over a series of
games and actual results may be very high. In such cases, it may
require fewer games to estimate the relative strength of 2 or more
programs. However, since this is not really "correct", it would not
lead to very satisfying conclusions in my opinion.


- Don





   From: Myriam Abramson <mabramso@xxxxxxxxxxxxxxxxx>
   Date: Sat, 18 Sep 2004 12:08:35 -0400
   Reply-To: computer-go <computer-go@xxxxxxxxxxxxxxxxx>


   Hi!

   What was the consensus on whether to use win,lose,draw as the outcome
   of the game or to use relative territory to estimate the strength of a
   Go program? 


				      myriam

   _______________________________________________
   computer-go mailing list
   computer-go@xxxxxxxxxxxxxxxxx
   http://www.computer-go.org/mailman/listinfo/computer-go/
