
Re: [computer-go] Just for fun

To: christian.nilsson@xxxxxxxxx,computer-go@xxxxxxxxxxxxxxx
Subject: Re: [computer-go] Just for fun
From: Don Dailey <drd@xxxxxxx>
Date: Mon, 14 Mar 2005 09:24:09 -0500
Cc: computer-go@xxxxxxxxxxxxxxx
Delivered-to: computer-go@xxxxxxxxxxxxxxxxx
In-reply-to: <c987018c05031402273975366a@xxxxxxxxxxxxxxxxx> (message from Christian Nilsson on Mon, 14 Mar 2005 11:27:07 +0100)
References: <200503110240.j2B2eUal005968@xxxxxxxxxxxxxxxxx><c987018c05031402273975366a@xxxxxxxxxxxxxxxxx>
Reply-to: drd@xxxxxxx,computer-go <computer-go@xxxxxxxxxxxxxxx>
Sender: computer-go-bounces@xxxxxxxxxxxxxxx

Hi Christian,

How did you identify the players without '?' in their ratings?  Is
there an easy way to do this or did you simply look them all up
manually?    I would like to be able to separate these too.

If you did this manually, it is a much appreciated effort; it must
have been a bit of work.

I am also interested in what rating formula, methodology, etc. you
used.  I am thinking about using the Elo rating system to do something
similar and just have (basically) a conversion table to go back and
forth between Elo ratings and kyu ranks.
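
To make the Elo idea concrete, here is a minimal sketch of the standard
Elo expected-score formula and its inverse.  The 100-Elo-per-rank figure
is purely an assumption for illustration (nothing in this thread fixes
that number), and the function names are hypothetical.

```python
import math

# ASSUMPTION: one kyu rank is worth this many Elo points. Illustrative only.
ELO_PER_RANK = 100.0

def elo_expected_score(rating_diff: float) -> float:
    """Expected score of a player rated rating_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by an observed score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def rank_gap_to_elo(rank_gap: float) -> float:
    """Convert a gap in kyu ranks to Elo points under the assumed scale."""
    return rank_gap * ELO_PER_RANK

if __name__ == "__main__":
    # A tiny "conversion table": rank gap -> Elo gap -> expected score.
    for gap in (0.5, 1.0, 2.0, 3.0):
        elo = rank_gap_to_elo(gap)
        print(f"{gap:4.1f} ranks  {elo:6.1f} Elo  {elo_expected_score(elo):.3f}")
    # Going the other way: 27 wins out of 37 is roughly a 173-point edge.
    print(round(elo_diff_from_score(27 / 37), 1))
```

The inverse function is only defined for scores strictly between 0 and
1, so a 100% or 0% record needs more games before it says anything.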

I know these are pretty rough estimates, but botnoid has played 10
games against tlsbottest and won 9 of them, and 37 games against go81
and won 27 of them.  Your rating guesstimate would imply that botnoid
is quite a bit weaker than go81, which isn't supported by the 37 games.
Also, botnoid's 9 out of 10 wins could be a fluke against tlsbottest,
but now we have 47 games combined that indicate botnoid should be
rated quite a bit stronger.

I have several possible explanations:

  1.  botnoid really is weaker than go81 and the 47 games do not
      represent strength against human players.

  2.  Go81 has improved significantly since our 37 games (I know
      Tapani Raiko has been working on it.)

  3.  As you suggest, the methodology is too crude.  I noticed
      that a lot of undos are happening in the games, which
      makes this kind of testing pretty unreliable.

  4.  And of course it's possible that botnoid was statistically very
      fortunate in the 47 games against tlsbottest and go81 and is
      significantly weaker than these games would suggest.
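
How plausible the "fluke" explanation is can be checked with an exact
binomial tail.  The sketch below assumes the games are independent and
that each side has a fixed win probability p per game (0.5 for equal
strength); both are simplifying assumptions, and the helper name is
hypothetical.

```python
from math import comb

def binom_tail(wins: int, games: int, p: float = 0.5) -> float:
    """Probability of at least `wins` wins in `games` games at win chance p."""
    return sum(
        comb(games, k) * p ** k * (1.0 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

if __name__ == "__main__":
    # Results quoted above: 9 of 10 against tlsbottest, 27 of 37 against go81.
    for name, wins, games in (("tlsbottest", 9, 10), ("go81", 27, 37)):
        print(f"{name:10s} P(>= {wins}/{games} | p=0.5) = "
              f"{binom_tail(wins, games):.4f}")
```

For the 10-game result the tail is 11/1024, about 1%, so even that small
sample is hard to explain by luck between equally strong programs.  It
says nothing about explanations 1 to 3, which concern what the games
measure rather than how many there are.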

Does your rating system take the final territory into account?  If so,
botnoid has a problem in that regard.  Botnoid is not interested in
the final territory score; it is happy to win by 1 stone and will not
fight beyond this.

I consider this a problem, because it stops fighting once it is sure
to win or lose.  Botnoid is Monte Carlo based and thinks of the game
statistically.  It will let you have a lot of territory without a
fight if it considers this not relevant to its winning chances, and
this applies whether it is clearly winning or clearly losing.  It's
not greedy in this regard.

Believe it or not, I haven't been able to fix this without weakening
the program.  Several seemingly obvious fixes have proven futile.
Botnoid cannot play a reasonable game without knowing what komi is in
advance.

I have a mode where it tries to maximize territory instead of trying
to win, and it makes it feel like a stronger program, but in fact
it is weaker!  A lot of testing has proven this.
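
The difference between the two modes can be shown with a toy example.
This is not botnoid's code; the playout results are made-up numbers and
all names are hypothetical.  It only illustrates how scoring playouts by
win/loss versus by margin can pick different moves, and why the win/loss
objective depends on knowing komi.

```python
from statistics import mean

def win_rate(raw_margins, komi):
    """Fraction of playouts Black wins after komi is subtracted."""
    return mean(1.0 if m - komi > 0 else 0.0 for m in raw_margins)

def mean_margin(raw_margins, komi):
    """Average final margin for Black after komi is subtracted."""
    return mean(m - komi for m in raw_margins)

def pick_move(playouts, komi, objective):
    """Return the candidate move whose playouts score best under objective."""
    return max(playouts, key=lambda move: objective(playouts[move], komi))

# Hypothetical playout outcomes (Black points minus White points, before komi).
playouts = {
    "safe":   [7, 7, 7, 7],     # always a small lead
    "greedy": [26, 26, 0, 26],  # usually a big lead, sometimes none
}

if __name__ == "__main__":
    for komi in (5.5, 7.5):
        print(komi,
              pick_move(playouts, komi, win_rate),
              pick_move(playouts, komi, mean_margin))
```

With komi 5.5 the win-rate objective prefers the safe move and the
margin objective prefers the greedy one.  Raising komi to 7.5 turns the
safe move into a certain loss, so the win-rate choice flips while the
margin choice does not change.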

I sometimes wonder if maximizing winning chances the way botnoid does
is a bad idea against humans of botnoid's strength who might possibly
get distracted by the unimportant battles and lose elsewhere.

- Don

   From: Christian Nilsson <christian.nilsson@xxxxxxxxxxxxxxxxx>
   Reply-To: Christian Nilsson <christian.nilsson@xxxxxxxxxxxxxxxxx>,
	   computer-go <computer-go@xxxxxxxxxxxxxxxxx>
   Sender: computer-go-bounces@xxxxxxxxxxxxxxxxx

   I've written a small program to estimate the rank of players based on
   their 9x9 game records in the KGS archive. Here are a few results based
   on games with < 2 stones hcp against players without '?'.

   Handle        Games   Rank
   -----------------------------
   gnugo3pt6       108    9.35k
   go81            241   17.7k
   viking4          90   17.9k
   viking5         431   18.3k
   botnoid         115   19.3k
   tlsbottest       38   19.9k
   dumbbot         229   27.7k
   nio              18   29.9k

   Remember that these are only estimates. It does not account for undos
   or incorrect scoring. I repeat once again that these are ESTIMATES. ;)

   /Christian

   On Thu, 10 Mar 2005 21:40:30 -0500, Don Dailey <drd@xxxxxxxxxxxxxxxxx> wrote:
   > 
   > I wanted to estimate the strength of my program as it plays on KGS,
   > but KGS doesn't do rated games on 9x9 boards.  However I can get a
   > rough estimate by processing the sgf files that KGS provides for my
   > player.
   > 
   > So I threw out all the game results from unrated players and I get the
   > following table:
   > 
   >   Rank   Win ratio      num games
   > -------  ---------     ----------
   >    3k      0.00000       1 games
   >    4k      0.00000       1 games
   >   10k      0.00000       1 games
   >   11k      0.00000       2 games
   >   12k      0.25000       4 games
   >   13k      1.00000       2 games
   >   14k      0.00000       1 games
   >   15k      0.50000       2 games
   >   17k      0.50000       2 games
   >   18k      0.66667       3 games
   >   19k      1.00000       1 games
   >   21k      1.00000       1 games
   >   22k      1.00000       1 games
   >   24k      1.00000       1 games
   >   25k      1.00000       2 games
   >   26k      1.00000       3 games
   >   28k      0.50000       2 games
   > 
   > Not much data, but the chart would indicate that my program is
   > competing well with players over 16k, and not so well with players
   > under 16k.  That would imply something near 16k.
   > 
   > The sgf files KGS produces have ratings even if the player just
   > estimated the ratings themselves, so I take this with a grain of salt.
   > Also, the games are not played seriously by players when they are
   > unrated.  (I watched one player easily beat my program but lose on
   > purpose, probably just to study my program and see what it would do.)
   > 
   > So all things considered, I am revising my estimate to be around 18k.
   > I will know more when a lot more games have been played.
   > 
   > - Don
   > 
   > _______________________________________________
   > computer-go mailing list
   > computer-go@xxxxxxxxxxxxxxxxx
   > http://www.computer-go.org/mailman/listinfo/computer-go/
   >
