Re: [computer-go] Just for fun
Hi Christian,
How did you identify the players without '?' in their ratings? Is
there an easy way to do this or did you simply look them all up
manually? I would like to be able to separate these too.
If you did this manually, it is a much appreciated effort; it must
have been a bit of work.
I am also interested in what rating formula, methodology, etc you
used. I am thinking about using the ELO rating system to do something
similar and just have (basically) a conversion table to go back and
forth.
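
A minimal sketch of what such a conversion could look like, using the
standard Elo expected-score formula and its inverse. The table itself is
purely hypothetical: the 100 Elo points per rank and the choice of 30k as
the zero point are assumptions made for illustration, not anything KGS or
this thread specifies.

```python
import math

# Assumptions for illustration only: 100 Elo per kyu rank, 30k pinned at 0.
ELO_PER_RANK = 100.0
ZERO_POINT_KYU = 30.0

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def rating_gap(win_ratio: float) -> float:
    """Elo gap implied by a win ratio (the inverse of expected_score)."""
    if not 0.0 < win_ratio < 1.0:
        raise ValueError("win ratio must be strictly between 0 and 1")
    return 400.0 * math.log10(win_ratio / (1.0 - win_ratio))

def kyu_to_elo(kyu: float) -> float:
    """Hypothetical table: a stronger player (lower kyu) gets a higher Elo."""
    return (ZERO_POINT_KYU - kyu) * ELO_PER_RANK

def elo_to_kyu(elo: float) -> float:
    """Inverse of kyu_to_elo."""
    return ZERO_POINT_KYU - elo / ELO_PER_RANK

if __name__ == "__main__":
    # Gap implied by winning 27 of 37 games, in Elo and in hypothetical ranks.
    gap = rating_gap(27 / 37)
    print(f"implied gap: {gap:.0f} Elo, about {gap / ELO_PER_RANK:.1f} ranks")
```

With real data, the points-per-rank constant would have to be fitted rather
than assumed, and it need not be the same at every strength level.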
I know these are pretty rough estimates, but botnoid has played 10
games against tlsbottest and won 9 of them, and 37 games against go81
and won 27 of them. Your rating guesstimate would imply that botnoid
is quite a bit weaker than go81, which isn't supported by the 37 games.
Also, botnoid's 9 out of 10 wins could be a fluke against tlsbottest,
but now we have 47 games combined that indicate botnoid should be
rated quite a bit stronger.
I have several possible explanations:
1. botnoid really is weaker than go81 and the 47 games do not
represent strength against human players.
2. Go81 has improved significantly since our 37 games (I know
Tapani Raiko has been working on it.)
3. As you suggest, the methodology is too crude. I noticed
that a lot of undos are happening in the games, which
makes this kind of testing pretty unreliable.
4. And of course it's possible that botnoid was statistically very
fortunate in the 47 games against tls and go81 and is
significantly weaker than these games would suggest.
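
Explanation 4 can be checked roughly with a binomial tail probability. The
sketch below assumes every game is independent and that botnoid has the
same fixed chance p of winning each one; neither assumption really holds
(undos, program changes), so the numbers are only a sanity check.

```python
from math import comb

def tail_probability(wins: int, games: int, p: float) -> float:
    """Probability of at least `wins` wins in `games` independent games,
    each won with probability p."""
    return sum(
        comb(games, k) * p**k * (1.0 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

if __name__ == "__main__":
    # Chance of these results if botnoid were only equal in strength (p = 0.5).
    print(tail_probability(9, 10, 0.5))   # 11/1024, about 0.0107
    print(tail_probability(27, 37, 0.5))
    # Chance of 27 or more wins out of 37 if botnoid were somewhat weaker.
    print(tail_probability(27, 37, 0.4))
```

A small tail probability at p = 0.5 means luck alone is an unlikely
explanation; it says nothing about explanations 1 to 3.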
Does your rating system take the final territory into account? If so,
botnoid has a problem in that regard. Botnoid is not interested in
the final territory score, it is happy to win by 1 stone and will not
fight beyond this.
I consider this a problem, because it stops fighting once it is sure
to win or lose. Botnoid is monte-carlo based and thinks of the game
statistically. It will let you have a lot of territory without a
fight if it considers this not relevant to its winning chances and
this applies whether it is clearly winning or clearly losing. It's
not greedy in this regard.
Believe it or not, I haven't been able to fix this without weakening
the program. Several seemingly obvious fixes have proven futile.
Botnoid cannot play a reasonable game without knowing what komi is in
advance.
I have a mode where it tries to maximize territory instead of trying
to win, and it makes it feel like a stronger program, but in fact
it is weaker! A lot of testing has proven this.
I sometimes wonder if maximizing winning chances the way botnoid does
is a bad idea against humans of botnoid's strength who might possibly
get distracted by the unimportant battles and lose elsewhere.
- Don
From: Christian Nilsson <christian.nilsson@xxxxxxxxxxxxxxxxx>
Reply-To: Christian Nilsson <christian.nilsson@xxxxxxxxxxxxxxxxx>,
computer-go <computer-go@xxxxxxxxxxxxxxxxx>
I've written a small program to estimate the rank of players based on
their 9x9 game records in the KGS archive. Here are a few results based
on games with < 2 stones hcp against players without '?'.
Handle Games Rank
-----------------------------
gnugo3pt6 108 9.35k
go81 241 17.7k
viking4 90 17.9k
viking5 431 18.3k
botnoid 115 19.3k
tlsbottest 38 19.9k
dumbbot 229 27.7k
nio 18 29.9k
Remember that these are only estimates. It does not account for undos
or incorrect scoring. I repeat once again that these are ESTIMATES. ;)
/Christian
On Thu, 10 Mar 2005 21:40:30 -0500, Don Dailey <drd@xxxxxxxxxxxxxxxxx> wrote:
>
> I wanted to estimate the strength of my program as it plays on KGS,
> but KGS doesn't do rated games on 9x9 boards. However I can get a
> rough estimate by processing the sgf files that KGS provides for my
> player.
>
> So I threw out all the game results from unrated players and I get the
> following table:
>
> Rank Win ratio num games
> ------- --------- ----------
> 3k 0.00000 1 games
> 4k 0.00000 1 games
> 10k 0.00000 1 games
> 11k 0.00000 2 games
> 12k 0.25000 4 games
> 13k 1.00000 2 games
> 14k 0.00000 1 games
> 15k 0.50000 2 games
> 17k 0.50000 2 games
> 18k 0.66667 3 games
> 19k 1.00000 1 games
> 21k 1.00000 1 games
> 22k 1.00000 1 games
> 24k 1.00000 1 games
> 25k 1.00000 2 games
> 26k 1.00000 3 games
> 28k 0.50000 2 games
>
> Not much data, but the chart would indicate that my program is
> competing well with players over 16k, and not so well with players
> under 16k. That would imply something near 16k.
>
> The sgf files KGS produces have ratings even if the player just
> estimated the ratings themselves, so I take this with a grain of salt.
> Also, the games are not played seriously by players when they are
> unrated. (I watched one player easily beat my program but lose on
> purpose, probably just to study my program and see what it would do.)
>
> So all things considered, I am revising my estimate to be around 18k.
> I will know more when a lot more games have been played.
>
> - Don
>
> _______________________________________________
> computer-go mailing list
> computer-go@xxxxxxxxxxxxxxxxx
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/