Re: [computer-go] Computer Olympiad photo
At 17:31 4-12-2003 -0500, Don Dailey wrote:
>
>> In the last 3 tournaments Diep played, the decisive errors came not long
>> after Diep had left book; they occurred at search depths of around 10-13
>> ply. When I re-searched those positions at 24 hours a move, reaching
>> depths of up to 20 ply, the vast majority of those moves were still
>> played. Only 1 of the moves was helped by another 2 ply (I simply got too
>> small a search depth with my slow, knowledge-oriented chess program,
>> which is not as selective as most top programs are).
>>
>> So where overwhelming brute-force power would have fixed only 1 move,
>> all of the errors could easily be fixed by some improvements
>> to the evaluation function.
>
>
>This is a very misleading use of statistics. Let me point out the problem:
It isn't misleading. It reflects reality.
> 1. Your program already has a reasonably well balanced evaluation
> function compared to most chess programs. That means it is hard
> to make improvements. Even if you hand coded some evaluation
> fixes to make the program play all the right moves where it
> failed, the actual improvement to the program would be very
> negligible.
It is true that progress is slow, but all those 'negligible improvements' add
up and add up...
> Usually, changes to fix problems create other problems. That is
This is true when you merely tweak a parameter. There has, however, been a
shift among the world's top chess programs which I predicted years ago:
you can safely add code for specific, narrowly defined cases without
influencing other parts of the game much.
So you *can* simply code new rating points without risk. This has been
proven by many different authors now.
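The "add code for certain cases without risk" idea can be sketched as follows. This is a minimal illustration under assumed names and values, not DIEP's actual evaluation; the board model, the pattern (rook on the 7th rank) and the bonus are all hypothetical:

```python
# Sketch: adding a narrowly gated evaluation term. Because the new term
# returns 0 unless its precondition fires, positions outside that case
# are scored exactly as before -- the "no risk" property described above.
# Board model, pattern and bonus value are hypothetical.

def material(board):
    # Toy material count: board maps square (0..63, a1 = 0) -> signed value.
    return sum(board.values())

def rook_on_seventh(board):
    # New knowledge term: a bonus only in the specific case it encodes.
    bonus = 0
    for square, piece in board.items():
        rank = square // 8
        if piece == 5 and rank == 6:      # white rook on the 7th rank
            bonus += 20
    return bonus

def evaluate(board):
    return material(board) + rook_on_seventh(board)

# A position without the pattern scores the same with or without the term:
quiet = {0: 5, 8: 1, 60: -5}              # no rook on the 7th rank
assert rook_on_seventh(quiet) == 0
assert evaluate(quiet) == material(quiet)

# A position with the pattern gets the bonus:
active = {48: 5, 60: -5}                  # white rook on rank index 6
assert evaluate(active) == material(active) + 20
```

The design point is the gate itself: every position in which the precondition is false is evaluated identically to the old version, so the new term cannot regress those positions.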
> because chess evaluation is hard and we code up heuristics that
> are based on general principles, not hard facts. Once you have
That's why it helps that I am a titled chess player: I can distinguish
between general principles and hard facts that are nearly always true.
It is, however, true that it is hard to write code that avoids general
principles which in some positions are completely false.
In this sense there is little difference between go and chess.
> achieved a reasonable balance you reach a point of diminishing
> returns where it's simply very hard to make even tiny
> improvements. Each heuristic interacts with other heuristics,
> and it's a big mess.
This is true when you add heuristics. If, however, you add knowledge, it is
not. Adding knowledge means coding new rating points!
> If you add new ideas to the evaluation function, you slow it down
> a little. It becomes very strong in dealing with the problems
If you are slower than the rest, just show up with more processors, which
is what I did: DIEP showed up at the last world championship with 512 processors...
> you just fixed, but you have effectively weakened the program
> just a little in every other position because you have slowed
> down the search. I once made a number of very clear positional
As I pointed out, I prefer a year of bug-fixing that slows my engine down by
a factor of 2 over 1 ply extra. The extra ply I will get anyway in the end,
because hardware gets faster, and there is not much difference between a 14-
and a 19-ply search depth, whereas a better debugged and enhanced evaluation
function brings far more.
In fact there is hard empirical proof for this. In 1999 many chess programs
reached 14-17 ply, and in the years after that half of the commercial world
top searched 2 ply less, because they had added massive amounts of knowledge
to their programs.
Of course not many people noticed this. The average scientist in this field
does a lot, but does not look at what the commercial guys are doing, nor
measure the differences in search depth!
> improvements to my chess program and discovered that it was
> playing weaker. After a lot of study, I learned that the program
> was now 1/2 the speed of the previous version and was losing
> tactically. In reality all the improvements affected only a
> fraction of all the positions it was required to search.
Your program doesn't have any form of king safety, so it is no surprise that
you seem to get tricked tactically there, when the cause is usually positional.
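For context, king safety is an evaluation feature, not a search feature. One very common (and here heavily simplified) form is a pawn-shield penalty; the squares, the penalty value and the board model below are illustrative assumptions, not any particular engine's code:

```python
# Sketch of a minimal king-safety term: penalize missing pawns in front
# of the castled king. Squares are 0..63 (a1 = 0), white pawns only.
# The per-hole penalty of 15 is an illustrative assumption.

def pawn_shield_penalty(king_sq, white_pawns):
    penalty = 0
    file, rank = king_sq % 8, king_sq // 8
    for df in (-1, 0, 1):                  # the three files around the king
        f = file + df
        if 0 <= f <= 7:
            shield_sq = (rank + 1) * 8 + f # square directly in front
            if shield_sq not in white_pawns:
                penalty += 15              # hypothetical penalty per hole
    return penalty

# King on g1 (square 6) with pawns on f2, g2, h2 (13, 14, 15): shield intact.
assert pawn_shield_penalty(6, {13, 14, 15}) == 0
# Same king after the g-pawn has moved away: one hole in the shield.
assert pawn_shield_penalty(6, {13, 15}) == 15
```

A program without any such term literally cannot tell an intact castled position from a ripped-open one until material starts falling, which is why the losses look tactical after the fact.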
> 2. It doesn't take a lot of different moves to make a program
> stronger. You mentioned 3 tournaments, but Diep didn't lose
> every game, it is well above average strength of tournament chess
> programs. So it probably won most of its games. We are
I only lost to commercial programs, of course. That means it is indeed 2 to 3
moves per tournament we are talking about, so I am only interested in where I
lost my points, if you don't mind. From a scientific viewpoint this is also
interesting: how do you advance an already strong product?
I fail to see any other computer-chess researcher writing publicly in
newsgroups about this.
Try to find much relevant data here from the numbers 1 to 6 of the world at
the last 2 world championships. You won't find much that is useful. Yet I
know they all drew the same conclusion I did; I am simply the only one who
says it out loud.
So that is the sad truth indeed.
Yet if you just look at what all those guys did over the past 3 years, you'll
see that despite CPUs getting many times faster, their engines do not get
more nodes a second.
This can have 2 reasons:
a) more forward pruning
b) more evaluation
In most cases we can quickly conclude that only b) is responsible, and in the
other cases we can conclude it is b) after some study of the different
versions.
DIEP did not slow down by a factor of 2.0 over the last 3 years, but several
commercial engines did.
That is trivially nearly 1 ply, so what do they consider more important?
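The "factor 2.0 is nearly 1 ply" arithmetic follows from the effective branching factor: if each extra ply multiplies the node count by roughly b, then a slowdown of factor s costs about log_b(s) plies. A quick check, taking b around 2 to 3 as a rough assumption for heavily pruning engines of that era:

```python
import math

def plies_lost(slowdown, branching_factor):
    # Plies of depth given up by a slowdown of the given factor, assuming
    # the node count grows by branching_factor per extra ply of depth.
    return math.log(slowdown) / math.log(branching_factor)

# With an effective branching factor of ~2, a 2x slowdown costs ~1 ply:
assert round(plies_lost(2.0, 2.0), 3) == 1.0
# With b = 3 the same slowdown costs less than a ply:
assert plies_lost(2.0, 3.0) < 1.0
```

So the trade the commercial programmers made is: spend the equivalent of roughly one ply of depth on a richer evaluation.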
> interested only in the few games it lost (or draw when a win was
> called for.) But an extra win out of a very few games will
> register as a substantial rating improvement. At any rate, this
> is a pathetically small sample to base conclusions on.
No: the level is so high, and the number of strong engines showing the same
pattern so overwhelming, that the conclusion is very clear.
Now just put in some effort and study those engines, because this is where
the real weakness lies.
You are drawing conclusions from your web browser, your own thoughts, and
experiments from the past.
I am confronted with all those engines daily, and each year I enter my
engine in 5 to 7 tournaments where I face those engines and their
programmers. I chat with them, eat with them and email with them
continuously. At home I study their engines closely.
Do you?
If you do not, how can you deny the empirical data here?
If you never take sample data, you will of course never see any evidence,
and will keep denying things on statistical grounds.
Yet if my engine in the past few tournaments (the 2003 world championship
not yet being extensively researched) made only evaluation mistakes and no
search mistakes, then that is the undeniable data you are confronted with.
Dismissing it as useless data is incorrect. These are high-level games, with
little room for random errors or random bugs.
It is well known that in bad amateur engines every new 'algorithm' has a 50%
chance of 'testing successfully', thanks to all the randomness and bugs
inside. If, however, someone at the world top draws conclusion X, it is very
likely to apply to the others as well, because there is no such randomness.
A sample taken there is therefore statistically far more significant than a
sample taken from a beginner's engine.
> I can make my program play any move I want by tinkering with the
> evaluation function. But can I do this without affecting ANY of
> the other moves it played negatively?
I'm not sure what your definition of tinkering is. Certainly you can add a
lot of knowledge to the evaluation function and get it to play better,
assuming no beginner's bugs in the search.
>My own experience in computer chess tournaments is different. I have
>won my share of tournaments including one Dutch open where your
>program nearly beat mine. One thing I always notice is how often my
Software from those days is 500 rating points, or 8 dan levels, weaker than
today's software; there is no comparison. Additionally, my own program was
pretty weak at the time.
>program NEARLY made a losing move but changes near the end of it's
>thinking cycle. This happens so often that I am always thankful I had
Your program was well known for its poor evaluation function, so I do not
doubt the scenario you described.
In those days, as a titled player, I could win hands down against software,
because the programs had *so many* holes, and nearly all of them were
evaluation related.
Programmers were still learning how to put a huge CPU speed to good use for
their programs at the time.
>a little bit faster computer. I can't help but believe that with an
>even faster computer, I would have salvaged a few losses.
I do not believe that at all. You had something like 128 CPUs versus the opponents' 1.
>On the other hand, some of our bad moves would have been fixed with
>evaluation tuning or adjustments. We played a terrible move one year
>against KING and lost miserably in a slaughter. That one move could
KING had something in its evaluation function that you didn't have, so it won.
>have been fixed with a small evaluation change, OR a small amount of
>extra thinking time. Extra thinking time is a general cure,
>evaluation is a specific cure.
Your version from those days would finish last now in a field of amateur
engines, and another ply won't cure that disease.
Note that at that world championship you searched around 17 plies or so, if
I remember well.
My DIEP chess program of today does not reach 17 ply, so you would outsearch
it pathetically. Yet I am willing to bet that if we played 10 games, it
would be at least 10-0 for DIEP.
And another ply won't help. If you want to show up with 256 processors, feel free!
2560, if you can get them to work, no problem!
Yet I'm sure you would conclude that Diep is tactically stronger than your
program, as it saw something you didn't see.
The reality, however, would be that it saw it in the evaluation function,
not tactically; you would conclude you were outsearched by 10 ply or so.
And this while I have already granted that you will outsearch me bigtime: a
very materialistic evaluation function, as we know it today, gives a cutoff
everywhere, so searching very deep with it is very easy.
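One reason a purely materialistic evaluation searches so deep is how cheaply it can be maintained: incrementally, in constant time per move, whereas a knowledge-heavy evaluation has to examine the whole board at every leaf. A toy illustration, with a hypothetical board model:

```python
# A material-only score can be kept incrementally: each capture updates it
# in O(1), so "evaluating" a leaf costs almost nothing. A knowledge-rich
# evaluation must inspect the whole position at every leaf instead.
# The board model (a list of signed piece values) is a toy.

class MaterialOnly:
    def __init__(self, piece_values):
        self.score = sum(piece_values)     # one full scan, done once

    def make_capture(self, captured_value):
        # O(1) update on capture -- no board scan needed at the leaf.
        self.score -= captured_value
        return self.score

board = [5, 3, 1, -9, -1]                  # toy position, signed values
ev = MaterialOnly(board)
assert ev.score == -1
# Capturing the enemy queen (-9) updates the score with one subtraction:
assert ev.make_capture(-9) == 8
```

That cheapness is exactly the trade-off under discussion: the material-only searcher buys depth, and the knowledge-heavy evaluator buys judgement.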
The commercial programmers, however, did not draw the wrong conclusion from
the 1999 world championship. They worked hard on their knowledge, and look
at the result.
>I think chess programs are so strong now that games are won and lost
>almost by accident. If your program couldn't see the winning plan in
This is a wrong conclusion. Please show me game positions that prove it.
Supposing is one thing; supposing based upon studied games would be a lot
better.
>20 ply, chances are the other program didn't know it was winning
>either. YOU obviously realized the move was losing and if a 20 ply
>search didn't reveal this I think it proves the point I have always
>tried to make which is that computers still suck at chess.
Knowing that I lose every blitz game against them, that would be a tough
conclusion to draw.
Only strategically speaking could I perhaps claim to be better, and if I may
remind you, that is really the last thing I have left to fall back on. All
the other things I used to be better at, I have already silently conceded
and will no longer claim superiority in!
>In another computer game we played, our program had to choose between
>2 completely different plans. The plans were radically different and
>led to radically different kinds of games. The program kept
>alternating between the 2 moves. It settled on the plan that let the
A few years ago your statement here made sense.
Nowadays chess software is very good at setting up, for example, mating
attacks, which in chess *is* the final goal. If they can set up a mating
attack against you, any player below super-GM level is doomed: the silicon
will find the plan and terminate you, as simple as that. Some programs are
better at this than others, of course; that's true.
>opponent have a big juicy center, but giving us 2 connected and passed
>pawns but which were not very well advanced. The only difference
>between these plans was a point or two in the evaluation function.
>Neither me nor the computer had any way of really knowing whether the
>plan was good because there was no immediate danger and the
Buy a commercial program from 2003 and it will tell you which of the 2
choices is best.
Let's not pretend the 90s still represent the truth in 2003 as well.
>consequences were far off into the future. This is so common in
>computer chess that I laugh when people say all you need in chess is a
>simple evaluation that counts heads and deep search. They just don't
No commercial programmer laughs at that. The vast majority of them had
already concluded it at the Aegon 1997 tournament, where there were many
chats between programmers on exactly this subject. The real eye-opener for
many was a French programmer who claimed that piece-square tables no longer
worked for him and no longer gave the program better moves when it searched
deeper; he drew some drastic conclusions from that, which are the reality
of today.
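Piece-square tables, for readers who don't know the term, are the simplest form of positional knowledge: a fixed bonus per (piece, square), summed over the board. A minimal sketch, with purely illustrative values rather than any real engine's table:

```python
# Sketch of a piece-square table: positional value depends only on which
# square a piece stands on. Below is a toy 8x8 knight table favouring the
# centre; the values are illustrative only.

KNIGHT_PST = [
    [-5, -4, -3, -3, -3, -3, -4, -5],
    [-4, -2,  0,  0,  0,  0, -2, -4],
    [-3,  0,  1,  2,  2,  1,  0, -3],
    [-3,  0,  2,  3,  3,  2,  0, -3],
    [-3,  0,  2,  3,  3,  2,  0, -3],
    [-3,  0,  1,  2,  2,  1,  0, -3],
    [-4, -2,  0,  0,  0,  0, -2, -4],
    [-5, -4, -3, -3, -3, -3, -4, -5],
]

def pst_score(knight_squares):
    # Sum the table entries for each knight square (0..63, a1 = 0).
    return sum(KNIGHT_PST[sq // 8][sq % 8] for sq in knight_squares)

# A knight on a central square scores better than one in the corner:
assert pst_score([27]) > pst_score([0])    # d4 beats a1
```

The claim attributed to the French programmer is precisely that this kind of static, context-free judgement stops improving play once searches get deep and the rest of the evaluation gets richer.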
>have a clue. As it turns out, we won that game probably because we
> significantly outsearched our opponent. Not because we made the right
>decision at that point. In fact, I suspect we made the wrong decision
>because our pieces were really tied down after losing the center.
>Don
The last time I felt outsearched was at the 1997 world championship.
I suggest you do an experiment: play the world champion, Shredder 8, on 1998
hardware at 3 minutes a move against Brutus/Nimzo from 1998.
Of course it is unfair to use different books, so do 2 experiments: one
using the 1998 book of those days for both engines, and one using the 2003
book of today's Shredder 8, so that in each match both engines play with the
same book.
I predict Nimzo98 will get slaughtered completely.
If that is the case, it means the lemmas of those days, that hardware was
not fast enough and that outsearching worked, were simply untrue, since on
the hardware of those days a today's version optimized for a dual K7
butchers Nimzo98, a program that was optimized for that very hardware.
I mention Nimzo98 deliberately here, because it won the Dutch championship
in 1998.
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://computer-go.org/mailman/listinfo/computer-go