[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [computer-go] Supercomputers and search



At 22:17 5-11-2004 -0500, Don Dailey wrote:

First of all let me state here one thing for the go mailing list readers,
that Don and i can get along very fine. The thing where we both might
disagree is the efficiency of cilk for TODAYS supercomputers and the
efficiency of it at the supercomputers it ran at the last few tournaments
it searched. 

Let's first agree upon 1 thing.

  a) it's easy to implement cilk

So easy in fact that when in the chair here on the left of me, 1 meter away
from me, when Don was sitting there some years ago, he could explain it to
me within a few minutes.

Point i want to make:
 
  b) todays cilk is just TOO EFFICIENT to even *consider* using it

A note from the past

  c) the problems of the parallel programs from the past
      c1) no logfiles were posted
      c2) if there were results posted, usually 
          no objective comparisions of an optimized single cpu versus
          the parallel version has been made. If you can get 200k nps
          single cpu (Don's PC program) at your 32 bits PII 
          266Mhz laptop, and a SINGLE CPU cilkchess bitboards at 64 
          bits processor Sun only gets 5000 nodes a second, knowing that
          crafty also hands down got 150k nps on that hardware, then obviously
          we lost a factor 40 somewhere. You can argue it was a factor 20 or
          15 or 35. That's not my point.

          Rudolf Huber's point is that the only possible correct
comparision is an optimized single cpu version versus the parallel version
at all cpu's.

So in your case in world champs 1999 you ran on a 512 processor Sun or
Origin. I do not exactly remember the speed of the processor, but the same
applies to Zugzwang which joined the world champs 1999. Zugzwang ran on a
T3E with 512 processors and alpha's at if i remember well around 500Mhz. I
guess you were around that speed too.

Single cpu 500Mhz you would get about 400k nps with a simple bitboards
program at a 64 bits chip. In reality you searched a few million nodes a
second.

So effectively cilkchess had a speedup of 5 out of 512. Arguably 10 out of
512 (depending upon processor speed and the exact nps). 

Whatever you want there 5, 6, 7 or even 10.

That's not a *loss* thanks to cilk of only 1-2% but a total SCALING of 1-2%
and if you then get a speedup of 50% out of that, then in comparision to an
optimized single cpu your speedup is 0.5% or 1%. Not 50%.

And that is the REALITY, because you play optimized pc programs which
didn't get slowed down a factor 40 first in order to get a better speedup
when compared to the slowed down version.

Now let me answer in next posting all other points with respect of todays
software. 

>Yes, I know computer chess has advanced.

>I gave up lazy eval before I
>left computer chess because it  wasn't safe unless I used big margins,
>at which point it didn't help  much at all.  I predicted before anyone
>that  the quality  of  evaluation  functions would  be  the future  of
>computer chess  more than  speed.  This was  when others  were touting
>speed.  I  proved this to  myself via simulations,  better evaluations
>improved more  rapidly with deeper searches.  I  think Berliner pretty
>much demonstrated this too.
>
>But why MTD no longer used?  Is  it because MTD is so hard on the hash
>tables  and  memory latency  is  a bigger  issue?   MTD  is much  more
>dependant on  the fact that the  search tree remains  mostly stored in
>memory via the hash tables so I can believe this is possible.   
>
>
>> Nullmove nowadays gets used at R=3, usually not near the leafs.
>
>Interesting.  My  program used null move,  but near the  leaf nodes. I
>used  a  static  threat  detector  instead.  This  was  a  very  clear
>improvement over pure null move and  I considered it one of my tricks.
>I  don't  know  if others  knew  about  this  or  figured it  out  for
>themselves, but I think it was one of the little goodies in my program
>over many others and I didn't share this secret with others until much
>later.   The static  threat routine  picked up  some things  null move
>could not.  Interestingly,  it also missed some things  that null move
>would get, but the overall effect was a clearly better program.
>
>When you say, "usually not near the leafs", what do you mean?  Are you
>using R=2 near leaf or something else entirely or nothing at all?
>
>
>> I just saw it was 40 times slower than it could be.
>
>In 1996 the program was running on an outdated 32 processor SGI, I say
>"outdated" because  Pentiums were much  faster (and so were  newer SGI
>machines.)   So it may  have been  like running  on 16  Pentiums which
>still gave us a big advantage.  But if the parallel implementation was
>even 20X slower, we would have been running slower than everyone else,
>yet we won the tournament with an impressive score.  I don't know what
>you believe you saw, but our program easily outsearched the others and
>I don't believe  we won due to better evaluation, do  you?  Or are you
>giving me a compliment?
>
>In  1997  we only  had  a  4 processor  machine,  and  almost won  the
>tournament.   I don't  believe this  program  was very  good, but  the
>parallel implementation in both cases (1996 and 1997) was excellent.
>
>But there were a number of  big problems we had during this tournament
>which I think your comments are based on:
>
>   1. Extremely poor internet connectivity.
>     
>      I don't remember  Aske being given extra time  to compensate for
>      the  poor  connectivity,  but  I  can  believe  they  did.   The
>      connectivity was  so poor  that we probably  could not  have won
>      otherwise,  expecially  since the  time  control was fast.
>
>      Nevertheless, if you are saying this was unfair to the others, I
>      agree completely.  I believe  remote machines should play by the
>      same rules and they should have to deal with the extra problems,
>      not  the rest  of the  participants.  The  internet  lag doesn't
>      affect the playing strength of  the program, only it's chance of
>      losing on time.   So if we were given a few  extra minutes I see
>      this  as clearly wrong.   But the  tournament director  was very
>      inexperienced,  as  you  will  see  from point  3  below.
>
>   2. Bugs.  The  user interface was incredibly buggy  and really hurt
>      our program.  I  think the SGI machine crashed  once or twice to
>      make matters  worse.  We ran  with assertions turned on,  and we
>      tripped one  of the assertion  failures which of  course crashed
>      the program.
>
>   3. In one  of the games, Cilkchess  chose one move,  but Aske Plaat
>      mistakenly  played a  different  move on  the  board.  In  every
>      computer  chess tournament  I have  ever entered,  the operating
>      principle was  to never penalize  a program for  operator error.
>      For some  reason, the tournament director ruled  against us, and
>      we had to force Cilkchess to  play a move it never chose.  I was
>      extremely upset  over this  as I believed  it would cost  us the
>      tournament and the  move of course was a  horrible error.  As it
>      turned out,  we somehow won  this game despite  the non-standard
>      ruling and Aske's mistake.    
>
>      The tournament  director was inexperienced as he  made 2 errors,
>      he gave us extra time  unfairly, but he made this grossly unfair
>      ruling.   If I  had  a choice  of  giving my  opponent 15  extra
>      minutes, or being forced to make 1 random move in 1 of the games
>      of an 11  round tournament, I am not sure  which I would choose,
>      as the 1 random move should almost certainly be lethal.
>
>In  my  later  experiences  at   the  Dutch  open,  I  found  all  the
>participants  and the  directors to  be extremely  nice,  friendly and
>helpful.  In fact, one of the  participants let me and one of the UROP
>students who traveled with me stay  in his home for a few days.  Let's
>see, who  was that guy?  I  think his name was  Vincent Diepeveen.  Do
>you know him?   Please tell your family that  I remember this kindness
>and still think of it from time to time.
>      
>
>
>- Don
>
>
>
>
>
>   X-Original-To: computer-go@xxxxxxxxxxxxxxxxx
>   X-Sender: diep@xxxxxxxxxxxxxxxxx
>   Date: Sat, 06 Nov 2004 01:35:28 +0100
>   From: Vincent Diepeveen <diep@xxxxxxxxxxxxxxxxx>
>   X-Virus-Scanned: by XS4ALL Virus Scanner
>   Reply-To: computer-go <computer-go@xxxxxxxxxxxxxxxxx>
>   Sender: computer-go-bounces@xxxxxxxxxxxxxxxxx
>   X-Scanned-By: MIMEDefang 2.42
>
>   Don, a few things have changed since you left computerchess.
>
>   MTD no longer can get used. Lazy eval gets used by less and less
engines in
>   the top. If i read the document well Chrilly also can't use it and chrilly
>   gives a clear explanation for it. The same big scores from evaluation
>   function will let the evaluation fluctuate from move to move.
>
>   Another thing that happened at supercomputers is that in contradiction to
>   the 80s, the latency to get a remote byte from a cpu relative to the cpu
>   speed has become 100 times bigger.
>
>   So that already ends all plans to use cilk at supercomputers.
>
>   Additional in todays software, a shared hashtable is very important to
have. 
>
>   Todays software, because of the improved evaluation function suffers from
>   more effects.
>
>   Deepfritz6 at a quad xeon 500Mhz as it was in 1999 running, searched 17
ply
>   every move. 
>
>   Todays deepfritz8 in world champs 2003 and 2004 searched 14 ply at a quad
>   opteron 2.2Ghz.
>
>   Chrilly basically already answerred why.
>
>   Nullmove nowadays gets used at R=3, usually not near the leafs.
>
>   No one uses cilk in game tree search of course.
>
>   At 19:13 5-11-2004 -0500, Don Dailey wrote:
>   >
>   >A little more about Cilk, since Vincent grossly mispresented it.
>   >
>   >Cilk  is really C  with a  few simple  constructs added;  spawn, sync,
>   >inlet and abort.   An inlet can catch the result  of a computation and
>   >can abort work  inside it.  This mechanism makes  it perfect for alpha
>   >beta searching.
>   >
>   >Any function  that begins  with the keyword  "cilk" can be  spawned in
>   >parallel.  The scheduler is based on work stealing.  In cilk you don't
>   >think about  the number of  processors, this is abstracted  away.  Any
>   >time  a "sync"  keyword is  invoked,  the processor  doesn't idle,  it
>   >randomly grabs work from somewhere else.
>   >
>   >The  beauty of  cilk  is that  it  rarely invokes  overhead for  these
>   >parallel constructs.  The cilk  pre-processor constructs 2 versions of
>   >each  "cilk" function,  one that  is purely  serial and  knows nothing
>   >about the cilk keywords and  associated overheads and the other "slow"
>   >version has all  the good stuff for managing  the scheduler.  The vast
>   >majority of the time your program  is invoking pure C code without any
>   >cilk overhead.
>   >
>   >Here is an  example of the fibonacci function taken  from the cilk web
>   >page:
>   >
>   >
>   >cilk int fib (int n)
>   >{
>   >    if (n < 2) return n;
>   >    else
>   >    {
>   >        int x, y;
>   >
>   >        x = spawn fib (n-1);
>   >        y = spawn fib (n-2);
>   >
>   >        sync;
>   >
>   >        return (x+y);
>   >    }
>   >}
>   >
>   >
>   >
>   >
>   >For alpha/beta applications like chess, there is an abort keyword that
>   >can be  invoked to stop  work that is  happening, such as is  the case
>   >when you get a beta cutoff.   An "inlet" can catch this case and abort
>   >all work spawned  inside the parent stack frame.   Basically, you call
>   >the inlet (which is a function  inside a cilk function GCC style) with
>   >a spawn  statement as an argument  to the inlet  function.  Inside the
>   >inlet, you can abort work.
>   >
>   >This sounds a  lot more complicated than it actually  is.  It is quite
>   >trivial to write  an alpha/beta search routine using  cilk without the
>   >normal pain  and agony and  debugging using a threading  library (like
>   >pthreads) for instance.
>   >
>   >The scheduler is  efficient and your code is  effecient, you are still
>   >programming in  C.  You tell cilk  how many processes you  want to use
>   >and cilk will use than many (even  if you are on a serial machine, you
>   >can write, debug  and test parallel programs specifying  any number of
>   >proceses.)    
>   >
>   >If you write  a cilk program and  run it as a 1  processor program, it
>   >will  run nearly as  fast as  the same  program written  without cilk,
>   >usually within 1 or 2 percent slowdown.
>   >
>   >It's really cool for many parallel programming tasks.
>   >
>   >Check it out at  http://supertech.lcs.mit.edu/cilk/home/intro.html
>   >
>   >- Don
>   >
>   >_______________________________________________
>   >computer-go mailing list
>   >computer-go@xxxxxxxxxxxxxxxxx
>   >http://www.computer-go.org/mailman/listinfo/computer-go/
>   >
>   >
>   _______________________________________________
>   computer-go mailing list
>   computer-go@xxxxxxxxxxxxxxxxx
>   http://www.computer-go.org/mailman/listinfo/computer-go/
>
>_______________________________________________
>computer-go mailing list
>computer-go@xxxxxxxxxxxxxxxxx
>http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/