Re: [computer-go] Just for fun
It would be nice to think Dumbbot is somewhere near your estimated 27k,
but I don't believe it is.
I think you may have included those games where the bot apparently wins
but, in reality, the human opponent got bored and resigned to escape.
This has happened to Dumbbot quite a bit and accounts for its 27k? rating
on KGS.
I'd suggest only including those games where the human plays it out fully.
Dumbbot resigns at the moment if it gets too far behind in captures, so
that should still count as a loss.
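
A rough sketch of that filter, in case it helps. The record fields
("ended_by", "resigned") are made up for illustration and are not any
real KGS export format:

```python
def usable_games(games, bot="Dumbbot"):
    """Keep games that were played out, plus games the bot itself resigned.

    Each game is a dict with hypothetical keys:
      "ended_by": "score" or "resignation"
      "resigned": name of the player who resigned, or None
    """
    kept = []
    for game in games:
        human_resigned = (
            game["ended_by"] == "resignation" and game["resigned"] != bot
        )
        if human_resigned:
            # The human may simply have got bored, so this is not a real bot win.
            continue
        # Scored games and bot resignations (which count as bot losses) stay in.
        kept.append(game)
    return kept
```

Games dropped here are just left out of the rating estimate; they are not
counted as wins for either side.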
HTH - interesting results - thanks for crunching those numbers!
John
>
>> Hi!
>>
>> Go81 bot is the development version, so I have had a buggy version
>> running, but not against Botnoid (I think). That could have
>> explained things.
>
>> Instead, I believe it is more about programs playing essentially the
>> same way every time. Go81 vs. Botnoid or tlsbot very often leads to
>> Go81 making an invasion.
>
> I'm not sure this explains why Go81 loses to botnoid.
>
> I looked at tls vs Go81 and found that the score is about even with
> tlsbot actually having a slight edge.
>
> So Go81 loses to BOTH programs, but gets a much higher estimated
> rating!
>
>
>> I would expect a lot of things like: program A is better than B, B is
>> better than C, and C is better than A.
>
> Yes, this is called "intransitivity."
>
> Of course we all seem to agree the estimated rating methodology is
> rather flawed, but the intransitivity seems to indicate that botnoid is
> very good against tlsbot and Go81 and that Go81 is very good (relative
> to tlsbot and botnoid) against humans.
>
> I noticed that some players will play a long series of games against
> one of the bots. This can also skew the ratings very significantly if
> that player has significantly overestimated or underestimated his
> rating.
>
> A way to deal with this is to limit the sample of any single player. If
> one player plays 100 games and wins 60% of them, it could be
> treated as a single game that was 60% won for rating purposes. The
> idea being that it's better to sample over many opponents instead of
> many games.
>
>
> - Don
>
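
Don's idea of limiting the sample from any single player could be
sketched roughly like this. The function name and the (player, won)
input format are assumptions for illustration, not anything from the
actual rating scripts:

```python
from collections import defaultdict

def collapse_by_player(games):
    """Collapse each player's games against a bot into one fractional result.

    games: iterable of (player, player_won) pairs, player_won being a bool.
    Returns {player: fraction of games that player won}, so a player with
    100 games and 60 wins contributes a single "game" worth 0.6.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for player, player_won in games:
        totals[player] += 1
        if player_won:
            wins[player] += 1
    return {player: wins[player] / totals[player] for player in totals}
```

Each entry in the result would then be fed to the rating estimate as one
game, so a single player with a long series carries the same weight as a
player who only played once.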
> _______________________________________________
> computer-go mailing list
> computer-go@xxxxxxxxxxxxxxxxx
> http://www.computer-go.org/mailman/listinfo/computer-go/