Re: [computer-go] SlugGo vs Many Faces, newest data
Don, at 22:16 08/09/2004, you wrote:
> Don, I think your observations about new ideas giving odd results at
> first must be due to some psychological factor. Perhaps you tend to
> remember the surprising results, or perhaps you test a lot of ideas that
> don't lead to improved performance, but quickly dismiss those that don't
> get off to a good start in the tests.
You are of course implying that I have a gambler's mentality. I think
you may not be a very good psychologist :-)
I don't mean to imply that at all, but you have made a very accurate
assessment of my psychological abilities ;). A lot of statistical results
are counterintuitive, and most untrained people don't run well-designed
experiments. I was speculating that you might be one of those; however, the
following description does suggest that your procedure is good.
Here is how I do my
testing. Before I start a test I determine in advance what my
stopping rule is and what result I must achieve to keep a change. The
stopping rule is usually a fixed number of round-robin matches between
2 or more versions of the program. I never stop early unless I find a
bug while testing, in which case I restart after fixing.
> ... or perhaps you test a lot of ideas that don't lead to improved
> performance, ...
Why should I be exempt? This is what engineering is all about.
I think your editing of my sentence is a bit unfortunate here: you chopped
out the important part. That, and your comments, make it look as though I
was belittling your coding, which I wasn't. In general, the better a
programme performs, the harder it is to improve.
You have ruled out most of my ideas for why you could say
'So if I do something to my program and it tests 9-1, for instance, I just
laugh.' If you haven't
succeeded in improving your code, then each game is at best a coin flip,
and the probability of it testing 9-1 or 10-0 is at most 11/1024, which is
about 1/93. Of course this does not say that, given these results, the
probability that you have improved the programme is 92/93, but to laugh at
such a result, I would think you would have to put your prior chance of
improvement at maybe only 1/200. Could this
be the explanation? Is my psychological ability letting me down when I
interpret your laughter? Maybe I am misinterpreting something here.
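For anyone who wants to check the arithmetic, here is a minimal Python sketch. The 11/1024 figure and the 1/200 prior come from the paragraph above; the function names and the 60% win rate for a genuinely improved version are my own assumptions, chosen purely for illustration.

```python
from fractions import Fraction
from math import comb


def tail_probability(wins, games, p):
    """P(at least `wins` wins in `games` independent games, each won with probability p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


def posterior_improved(prior, p_improved, wins=9, games=10):
    """Bayes' rule: P(improved | at least `wins` wins in `games` games).

    Assumes an unimproved version wins exactly half its games and an
    improved one wins with probability `p_improved` (a hypothetical value).
    """
    like_null = tail_probability(wins, games, Fraction(1, 2))
    like_improved = tail_probability(wins, games, p_improved)
    evidence = prior * like_improved + (1 - prior) * like_null
    return prior * like_improved / evidence


# Chance of 9-1 or 10-0 when nothing has improved.
p_null = tail_probability(9, 10, Fraction(1, 2))
print(p_null)  # 11/1024

# Posterior under the 1/200 prior and the assumed 60% win rate.
print(float(posterior_improved(Fraction(1, 200), Fraction(3, 5))))
```

Under those assumptions the posterior stays small, which is the only way I can see that laughing at a 9-1 result is reasonable.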
Re-reading your post, I think perhaps that you are aware that the rules of
probability say that a 9-1 result for an unimproved programme is very
unlikely, but that your observations show that probability is wrong in this
case. If this is what you mean, then I absolutely disagree with you. The
rules of probability are not wrong.
Doug, it is the one-sided test that is relevant to these discussions. We
are interested in testing things like 'Is SlugGo at least 2 stones
stronger than MF?' rather than 'Is one or the other stronger in a two-stone
game?', which is much less interesting and almost certainly true.
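To make the one-sided versus two-sided distinction concrete, here is a small Python sketch of an exact binomial test on a 10-game match. The function names are mine, and the null hypothesis of evenly matched programs is an assumption for illustration, not a description of the actual SlugGo data.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent games with win probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p(wins, games, p=Fraction(1, 2)):
    """P(at least `wins` wins): tests 'is this program the stronger one?'"""
    return sum(binom_pmf(k, games, p) for k in range(wins, games + 1))


def two_sided_p(wins, games, p=Fraction(1, 2)):
    """P(any result at least as unlikely as the observed one):
    tests 'is either program stronger?'"""
    observed = binom_pmf(wins, games, p)
    return sum(
        binom_pmf(k, games, p)
        for k in range(games + 1)
        if binom_pmf(k, games, p) <= observed
    )


print(one_sided_p(9, 10))  # 11/1024
print(two_sided_p(9, 10))  # 11/512
```

With a symmetric null the two-sided value is simply double the one-sided one, so the choice of test changes the answer by a factor of two.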
Best wishes
Tom.