Re: [computer-go] Statistical Significance
> What was the consensus on whether to use win,lose,draw as the outcome
> of the game or to use relative territory to estimate the strength of a
> Go program?
I don't know if there has been a consensus on this, but the answer is
quite clear, since you specified "estimating the strength of a Go
program." The answer is that it only makes sense to go by the win,
lose, draw result.
That's because win, lose and draw is the only thing that actually
counts. It would not make sense to judge the WINNING program in a
series of matches as the WEAKER program because it won less territory
over the sequence of games. You can win 10-1 and still lose on total
territory if the 10 wins are close and the one loss is a slaughter.
One method returns the correct result and the other cannot
guarantee this.
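To make the 10-1 case concrete, here is a small sketch in Python. The
margins are invented purely for illustration (ten narrow wins and one
heavy loss); they are not results from any real match.

```python
# Hypothetical final margins from one program's point of view.
# Positive means that program won the game.
margins = [1.5] * 10 + [-40.5]

wins = sum(1 for m in margins if m > 0)
losses = sum(1 for m in margins if m < 0)
total_territory = sum(margins)

print(f"record: {wins}-{losses}")                # record: 10-1
print(f"net territory: {total_territory:+.1f}")  # net territory: -25.5
```

Ranked by game results this program is clearly ahead; ranked by
total territory it comes out behind.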
In chess, beginners often judge the strength of players based on how
many moves it takes them to win. Any good chess player knows this is
nonsense. I view how much territory you win by as being in the same
general category of foolishness. It also assumes that you don't take
chances when you are losing and that you don't play a bit safer when
you are winning.
It turns out that this is an important question for my program. My
program increased in strength significantly when I changed the scoring
function to be based on winning probability instead of expected
territory.
I was quite surprised until I saw what was happening. The improved
program could not care less how much it wins by, as long as it wins.
It will give away a lot of territory to guarantee the win, even if
keeping that territory would carry only a minimal risk of losing. When
it is behind, it will also pass up the chance to make a losing game
close if a risky play gives it even a slight chance to win, although
the risky play is unlikely to succeed and is likely to lose big. The
point is that it will at least try to win instead of accepting a sure
(if close) loss.
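The difference between the two scoring modes can be sketched in a few
lines of Python. This is not my program's code; the move names,
probabilities and margins are made up, and each candidate move is
assumed to come with a small list of (probability, final margin)
outcomes.

```python
def expected_territory(outcomes):
    """Average final margin over the outcome distribution."""
    return sum(p * margin for p, margin in outcomes)

def win_probability(outcomes):
    """Total probability of outcomes with a positive margin."""
    return sum(p for p, margin in outcomes if margin > 0)

def best(moves, score):
    """Name of the move whose outcomes score highest."""
    return max(moves, key=lambda name: score(moves[name]))

# Ahead: the greedy move gains more on average but risks the game.
ahead = {
    "safe":   [(1.0, 5.0)],
    "greedy": [(0.9, 20.0), (0.1, -10.0)],
}
# Behind: the solid move loses narrowly for sure; the gamble usually
# loses big but sometimes wins.
behind = {
    "solid":  [(1.0, -2.0)],
    "gamble": [(0.1, 3.0), (0.9, -30.0)],
}

for label, moves in (("ahead", ahead), ("behind", behind)):
    print(label,
          "territory picks", best(moves, expected_territory),
          "/ win probability picks", best(moves, win_probability))
```

With these numbers the territory mode picks "greedy" when ahead and
"solid" when behind, while the winning-probability mode picks "safe"
and "gamble".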
I have a switch to turn on the territory scoring mode in my program.
I hate the stronger mode because once it has won the game it plays
what appear to be awful moves, simply because it doesn't matter. I
can't run tactical problems in the stronger mode because it doesn't
care if it wins some group or not unless it needs this group to win
the game. So I have to run tactical tests in the weaker playing mode.
I have also tried integrating the two modes to get the best of both
worlds, but each attempt so far has resulted in a weaker program.
It would be unreasonable for me to judge the playing strength of my
program based on territory scores. If I played a series of matches
between the two versions of my own program, the weaker program would
clearly win this kind of match, even though the stronger version would
win over 2/3 of the games.
Because of my own results, I strongly recommend that if you have a
program that estimates territory as the evaluation function, you
should try to move in the direction of calculating winning
probabilities instead. I admit that it is much more natural and easy
to count (or estimate) territory, and it might not be easy to convert
to a probability of winning type of evaluation function. This depends
on the kind of program you have written.
Having said all of that, I can see the possibility that for certain
programs the correlation may be very high between total territory won
over a series of games and actual results. In such cases, it may
require fewer games to guess-timate the relative strength of 2 or more
programs. However, since this is not really "correct", it would not
lead to very satisfying conclusions in my opinion.
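For a rough feel of how many games a win/loss comparison needs, here
is a sketch using the usual normal approximation to a binomial win
rate. The function name and the game counts are mine, chosen only for
illustration, and draws are ignored for simplicity.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% interval for the true win rate (normal approximation)."""
    p = wins / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return p - half_width, p + half_width

# The same 2/3 win rate, observed over a short and a long match.
for wins, games in ((20, 30), (200, 300)):
    low, high = win_rate_interval(wins, games)
    verdict = "excludes" if low > 0.5 else "does not exclude"
    print(f"{wins}/{games}: [{low:.3f}, {high:.3f}] {verdict} 50%")
```

Under this approximation, 20 wins out of 30 is not quite enough to
rule out two equal programs, while 200 out of 300 is.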
- Don
From: Myriam Abramson <mabramso@xxxxxxxxxxxxxxxxx>
Date: Sat, 18 Sep 2004 12:08:35 -0400
Hi!
What was the consensus on whether to use win,lose,draw as the outcome
of the game or to use relative territory to estimate the strength of a
Go program?
myriam
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/