
Re: [computer-go] Computer Olympiad photo



At 17:31 4-12-2003 -0500, Don Dailey wrote:
>
>> From the last 3 tournaments diep played, the DECISIVE errors not long after
>> diep had left book, they took place at depths of around 10-13 ply, when i
>> searched them at 24 hours a move, so getting depths of up to 20 ply, the
>> vaste majority of moves still were played. Only 1 of the moves another 2
>> ply was helping (i simply got too small of a search depth with my slow
>> knowledge oriented chessprogram which is not so selective like most top
>> programs are).
>> 
>> So where overwhelming brute force power would fix 1 move,
>> all of the errors i could easily fix by some improvements 
>> in evaluation function.
>
>
>This is a very misleading use of statistics.  Let me point out the problem:

It isn't. It reflects reality.

>  1. Your program  already has  a reasonably well  balanced evaluation
>     function compared to most chess  programs.  That means it is hard
>     to  make improvements.  Even  if you  hand coded  some evaluation
>     fixes  to make  the program  play all  the right  moves  where it
>     failed,  the actual  improvement  to the  program  would be  very
>     negligible.

It is true that progress is slow, but all those 'negligible improvements' add
up and add up...

>     Usually, changes to fix  problems create other problems.  That is

This is true when you modify just a parameter. There has, however, been a
shift in the world top chess programs, one I had already predicted years ago.

That is, you can now safely add code for certain cases without much
influencing other parts of the game.

So you *can* simply code new rating points without risk. Many different
authors have proven this by now.

>     because chess  evaluation is hard  and we code up  heurstics that
>     are based on  general principles, not hard facts.   Once you have

That's why it helps that I'm a titled chess player: I can distinguish
between general principles and hard facts that are nearly always true.

It is, however, true that it is hard to write code that avoids applying a
general principle in the positions where it is completely false.

In this sense there is little difference between go and chess.

>     achieved a  reasonable balance you  reach a point  of diminishing
>     returns  where   it's  simply  very   hard  to  make   even  tiny
>     improvements.   Each heuristic  interacts with  other heuristics,
>     and it's a big mess.

This is true when you add heuristics. If, however, you add knowledge, it is
not. Adding knowledge means coding new rating points!

>     If you add new ideas to the evaluation function, you slow it down
>     a little.   It becomes very  strong in dealing with  the problems

If you are slower than the rest, just show up with more processors, which is
what I did: DIEP showed up at the last world championship with 512 processors...

>     you  just fixed, but  you have  effectively weakened  the program
>     just a  little in  every other position  because you  have slowed
>     down the search.   I once made a number  of very clear positional

As I pointed out, I prefer a year of bugfixing that slows my engine down by a
factor of 2 over 1 extra ply. I'll get the extra ply anyway in the end,
because hardware gets faster, and there is not much difference between a 14
and a 19 ply search, whereas a better debugged and enhanced evaluation
function brings far more.
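The arithmetic behind this trade-off is simple: if search time grows roughly as b^d for an effective branching factor b, a slowdown by a factor s costs about log(s)/log(b) ply. A minimal sketch (the branching-factor values are assumptions, not measurements):

```python
import math

# With effective branching factor b, search time grows roughly as
# b**depth, so slowing the engine down by a factor s costs about
# log(s)/log(b) ply of depth.
def ply_cost(slowdown, branching_factor):
    return math.log(slowdown) / math.log(branching_factor)

# With b ~ 2, typical for a selective modern searcher, a 2x slowdown
# costs about 1 ply, which matters little at 14+ ply already.
```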

In fact there is hard empirical proof of this.

In 1999 many chess programs reached 14-17 ply; in the years after that, half
of the commercial world top searched 2 ply less, because they had added
massive amounts of knowledge to their chess programs.

Of course not many people noticed this. The average scientist in this field
does a lot, but does not look at what the commercial guys are doing, nor
measure the differences in search depth!

>     improvements  to my  chess  program and  discovered  that it  was
>     playing weaker.  After a lot of study, I learned that the program
>     was  now 1/2 the  speed of  the previous  version and  was losing
>     tactically.   In reality  all  the improvements  affected only  a
>     fraction of all the positions it was required to search.

Your program doesn't have any form of king safety, so it is no surprise that
you seemingly get tricked tactically there, when the cause is usually positional.

>  2. It  doesn't take  a  lot of  different  moves to  make a  program
>     stronger.   You mentioned  3  tournaments, but  Diep didn't  lose
>     every game, it is well above average strength of tournament chess
>     programs.   So  it probably  won  most  of  it's games.   We  are

I only lost to commercial programs, of course. That means it is indeed 2 to 3
moves per tournament we are talking about. So I am only interested in where I
lost my points, if you don't mind. From a scientific viewpoint this is also
interesting: how do you advance an already strong product?

I fail to see any other computer chess researcher writing publicly about this
in newsgroups.

Try to find much relevant data from the numbers 1 to 6 of the world at the
last 2 world championships. You won't find much that is useful. Yet I know
they all drew the same conclusion I did. I am, however, the only one who says
it out loud here.

So that is the sad truth indeed.

Yet if you just look at what all those guys did over the past 3 years, you'll
see that despite CPUs getting several times faster, their engines do not get
more nodes a second.

This can have 2 reasons:
  a) more forward pruning
  b) more evaluation

In most cases we can quickly conclude that only b) plays a role, and in the
other cases we can, after some study of the different versions, conclude it
is b) as well.

DIEP did not slow down by a factor of 2.0 over the last 3 years; several
commercial engines, however, did.

Trivially that's nearly 1 ply, so what do they consider more important?

>     interested only in the few games  it lost (or draw when a win was
>     called  for.)  But  an extra  win out  of a  very few  games will
>     register as a substantial  rating improvement.  At any rate, this
>     is a pathetically small sample to base conclusions on.

No: the level is that high, and there is an overwhelming number of strong
engines showing the same pattern, so it is very clear.

Now just put in some effort and study those engines, because this is where
the real weakness lies.

You are busy drawing conclusions from within your WWW browser, your own
thoughts, and your experiments from the past.

I am confronted with all those engines daily, and each year I enter my engine
in 5 to 7 tournaments where I get confronted with all those engines and
different programmers. I chat with them, eat with them and email continuously
with them. At home I study their engines closely.

Do you?

If you do not, how can you deny any empirical data here?

If you do not take sample data, you will of course never see any evidence
here, and will deny things on statistical grounds.

Yet if my engine, in the past few tournaments (the 2003 world championship
not being extensively researched yet), made only evaluation mistakes and no
search mistakes, then that is the only undeniable data you are confronted with.

Denying that this is useful data is incorrect. These are high-level games,
where there is not much room for random errors or random bugs.

It is well known that in bad amateur engines every new 'algorithm' has a 50%
chance of testing as a success, thanks to the many random effects and bugs
inside.

If, however, someone at the world top draws conclusion X, then it is very
likely that it applies to the others as well, as there is no such randomness.

Therefore a sample taken there is statistically far more significant than a
sample taken from a beginner's engine.

>     I can make my program play  any move I want by tinkering with the
>     evaluation function.  But can I  do this without affecting ANY of
>     the other moves it played negatively?

I'm not sure what your definition of tinkering is. You certainly can add a
lot of knowledge to the evaluation function and get it to play better,
assuming no beginner bugs in the search.

>My own experience in computer  chess tournaments is different.  I have
>won  my share  of  tournaments  including one  Dutch  open where  your
>program nearly beat  mine.  One thing I always notice  is how often my

Software from those days is 500 points weaker, or 8 dan levels weaker, than
today's software. There is no comparison. Additionally, my own program was
pretty weak at the time.

>program NEARLY  made a losing  move but changes  near the end  of it's
>thinking cycle.  This happens so often that I am always thankful I had

Your program was well known for its poor evaluation function. I do not doubt
the scenario you described.

In those days, as a titled player, I could win hands down against software
programs, because they had *so many* holes, and nearly all of them evaluation
related.

Programmers were just learning how to put huge CPU speeds to good use for
their programs at the time.

>a little bit  faster computer.  I can't help but  believe that with an
>even faster computer, I would have salvaged a few losses.

I do not believe it at all. You had something like 128 CPUs versus the opponents' 1.

>On the  other hand, some of our  bad moves would have  been fixed with
>evaluation tuning or adjustments.  We  played a terrible move one year
>against KING and  lost miserably in a slaughter.   That one move could

KING had something in its evaluation function that you didn't have, so it won.

>have been fixed  with a small evaluation change, OR  a small amount of
>extra  thinking  time.   Extra   thinking  time  is  a  general  cure,
>evaluation is a specific cure.

Your version from those days would finish last now in a field of amateur
engines, and another ply won't cure that disease.

Note that at the world championship you searched, if I remember well, around
17 plies or so.

My DIEP chess program of today does not get 17 ply.

So you'll outsearch it pathetically. Yet I am willing to bet that if we play
10 games, it will be at least 10-0 for DIEP.

And another ply won't help. If you want to show up with 256 processors, feel free!

2560, if you can get them to work, no problem!

Yet I'm sure you will conclude that DIEP is tactically stronger than your
program, as it saw something you didn't see.

The reality, however, will be that it saw it in the evaluation function, not
tactically. You'll conclude you were outsearched by 10 ply or so.

This even though I already granted you that you'll outsearch me big time. A
very materialistic evaluation function, as we know them today, gives a cutoff
everywhere; searching very deep with it is very easy.
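For concreteness, the kind of 'very materialistic' evaluation meant here is essentially just a material count. An illustrative sketch, using the traditional textbook piece values:

```python
# Sketch of a purely materialistic evaluation: count material and
# nothing else. Such a function is extremely cheap per node, so the
# search gets its cutoffs almost for free and reaches great depths.
PIECE_VALUES = {'P': 100, 'N': 300, 'B': 300, 'R': 500, 'Q': 900}

def material_eval(white_pieces, black_pieces):
    """Score from White's point of view, in centipawns."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black
```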

The commercial programmers, however, did not draw the wrong conclusion from
the 1999 world championship. They worked hard on their knowledge, and see the
result.

>I think chess  programs are so strong now that games  are won and lost
>almost by accident.  If your  program couldn't see the winning plan in

This is a wrong conclusion. Please show me game positions that prove this.

Supposing is one thing.

It would be a lot better to suppose based on studied games.

>20  ply, chances  are the  other program  didn't know  it  was winning
>either.  YOU  obviously realized the move  was losing and if  a 20 ply
>search didn't  reveal this I think  it proves the point  I have always
>tried to make which is that computers still suck at chess.

Knowing that I lose every blitz game against them, that would be a tough
conclusion to draw.

Only strategically could I perhaps claim to be better, but if I may remind
you, that's really the last thing I can fall back on. All the other things I
used to be better at I have already silently conceded, and I won't mention
them as areas where I am better!

>In another computer game we  played, our program had to choose between
>2 completely different plans.   The plans were radically different and
>led  to  radically  different   kinds  of  games.   The  program  kept
>alternating between the 2 moves.  It  settled on the plan that let the

A few years ago your statement here made sense.

Nowadays chess software is very good at setting up, for example, mating
attacks, which in chess *is* the final goal. If they can set up a mating
attack against you, non-super-GM players are doomed. The silicon will find
the plan and terminate you, as simple as that. Some programs are better at
this than others, of course. That's true.

>opponent have a big juicy center, but giving us 2 connected and passed
>pawns  but which  were not  very well  advanced.  The  only difference
>between these  plans was  a point or  two in the  evaluation function.
>Neither me nor the computer had  any way of really knowing whether the
>plan  was  good  because  there   was  no  immediate  danger  and  the

Buy any commercial program from 2003 and it will tell you which of the 2
choices is best.

Let's not praise the 90s as representing the truth in 2003 as well.

>consequences  were far  off into  the future.   This is  so  common in
>computer chess that I laugh when people say all you need in chess is a
>simple evaluation that counts heads  and deep search.  They just don't

No commercial programmer laughs at that. The vast majority of them concluded
it already at the Aegon 1997 tournament; there were many chats there between
programmers about just this subject. The real eye-opener for many was a
French programmer who claimed that piece-square tables no longer worked for
him and no longer gave the program better moves when searching deeper, and
who drew some drastic conclusions from that, which are the reality of today.
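For readers outside computer chess: a piece-square table is just a static, per-square bonus added to a piece's value, cheap to compute but blind to everything else on the board. A minimal sketch (the numbers are illustrative, not any real engine's tuning):

```python
# Sketch of a piece-square table (PST): a fixed per-square bonus.
# A knight scores best toward the centre and worst on the rim; the
# table never looks at any other piece, which is exactly why this
# kind of knowledge stops helping as searches get deeper.
CENTRE_BONUS = 10   # centipawns per king-step toward the centre

def knight_pst(square):          # square 0..63, a1 = 0, h8 = 63
    file, rank = square % 8, square // 8
    # distance from the board centre in king steps (0 centre, 3 rim)
    dist = max(abs(file - 3.5), abs(rank - 3.5)) - 0.5
    return int(-CENTRE_BONUS * dist)  # rim knights score worst
```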

>have a  clue.  As it turns out,  we won that game  probably because we
>signficantly outsearched our opponent.   Not because we made the right
>decision at that point.  In fact, I suspect we made the wrong decision
>because our pieces were really tied down after losing the center.
>Don

The last time I felt outsearched was at the 1997 world championship.

I suggest you do an experiment.

Play the world champion shredder8, on hardware from 1998, at 3 minutes a
move, against brutus/nimzo from 1998.

Of course it is unfair to use different books, so do 2 experiments: one using
the 1998 book of those days for both engines, and one using the 2003 book
from today's shredder8, so that in each match both engines have the same book.

I predict nimzo98 will get slaughtered completely.

If that's the case, it means the lemmas of those days, that hardware was not
fast enough and that outsearching worked, were simply untrue, since on the
hardware of those days a dual-K7-optimized version of today butchers Nimzo98,
which was a program optimized for that very hardware.

I am mentioning nimzo98 deliberately here, because it won the Dutch
championship in 1998.

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://computer-go.org/mailman/listinfo/computer-go