
Re: [computer-go] question regarding Hydra Chess PC computer



I'm sure Chrilly has already explained this clearly on this list,
but let me show you some proof.

Overclocking that FPGA chip a little, at big risk, is not so interesting.
Secondly, are you sure they have 'experts' in Abu Dhabi who know how to
overclock a chip?

The really interesting thing would be to overclock the PCI bus, i.e. the
Myrinet card and the communication with the FPGA, if that were possible.

Chrilly's hardware search can reach about 3 million nodes a second in
hardware if it were 100% utilized (AFAIK Chrilly doesn't measure how
effectively it is used).

That means one node costs about 330 nanoseconds.

Transferring a message from the FPGA to the CPU costs 4 us.

Transferring a message from this CPU to a remote CPU costs another 5 us
(assuming Chrilly has a fast form of Myrinet; otherwise it is slower).

I hope you see the problem.

Looking up just one remote hashtable entry costs 10,000+ nanoseconds.

A single communication or remote hashtable lookup is slower than searching
hundreds of nodes.
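The figures above can be turned into a quick back-of-the-envelope check. The Python sketch below uses only the numbers quoted in this message (about 3 million nodes/s at full utilization, 4 us FPGA-to-CPU, 5 us CPU-to-CPU, 10+ us for a remote hashtable lookup) and expresses each latency as the number of nodes a single hardware search running at that rate could have visited meanwhile. The function name nodes_per_latency is made up for illustration.

```python
def nodes_per_latency(nodes_per_second: float, latency_us: float) -> float:
    """Nodes a search running at nodes_per_second could visit during latency_us."""
    ns_per_node = 1e9 / nodes_per_second
    return latency_us * 1000.0 / ns_per_node


# Peak hardware rate quoted above, assuming 100% utilization.
NPS = 3_000_000

print(f"cost of one node: {1e9 / NPS:.0f} ns")
for label, latency_us in [("FPGA -> CPU message", 4.0),
                          ("CPU -> remote CPU message", 5.0),
                          ("remote hashtable lookup", 10.0)]:
    print(f"{label}: {latency_us} us = {nodes_per_latency(NPS, latency_us):.0f} nodes")
```

Raising NPS while keeping the latencies fixed shows how the penalty grows as the search hardware gets faster but the bus does not.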

This has a number of consequences:
  a) it is very hard to use a global hashtable on clusters
  b) it is very hard to quickly start and stop the search on other processors

The result is:
  a) it is very hard to make well-working parallel algorithms for clusters
  b) your speedup is really terrible with many CPUs
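To see why the global hashtable in the first consequence is so hard, consider a hypothetical layout where the table is split evenly over a number of machines and each position's home machine is chosen from its hash key. With uniform hashing only one probe in P hits local memory. The sketch below estimates the average probe cost; the 10,000 ns remote figure is the one quoted above, while the 100 ns local figure is an assumption standing in for a local memory access.

```python
def expected_probe_ns(machines: int,
                      local_ns: float = 100.0,
                      remote_ns: float = 10_000.0) -> float:
    """Average cost of one probe into a hashtable partitioned evenly over `machines`.

    Assumes keys hash uniformly, so a probe is local with probability 1/machines.
    local_ns is an assumed local memory latency; remote_ns is the ~10 us quoted above.
    """
    p_local = 1.0 / machines
    return p_local * local_ns + (1.0 - p_local) * remote_ns


for machines in (1, 2, 8, 32):
    print(f"{machines:>2} machines: {expected_probe_ns(machines):8.1f} ns per probe")
```

Already at two machines the average probe is about 50 times slower than a local one, and with more machines it approaches the full remote cost.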

The Deep Blue team estimated they had a 15% speedup. However, they
extrapolated that figure; later they adjusted it to an effective speedup of
5%. That again was done by guessing, not by measuring.

Diep had a speedup of 20% on the supercomputer. However, I must add a few
notes to that:
  a) that was only after a few minutes of search; on quickly played moves
     it definitely didn't have that speedup.
  b) each CPU was dead slow, just 500 MHz. On those outdated chips Diep
     searched at most 20k nps per CPU.
  c) The latency of the Origin3800 that Diep ran on is far better than
     what Chrilly runs on. Note that I'm speaking about two-way ping-pong
     times. On the Origin3800 that was 5.8 us for 8-byte messages (so that's
     more than a ping-pong), and that is the average. Chrilly has to deal
     with 10-20 us, and with FPGA chips which effectively deliver 50-100k
     software nodes a second when running at 32 MHz.
  d) Diep's parallel algorithm uses YBW. I am not sure Chrilly uses it.
     YBW simply gives a better speedup (at the cost of years of extra work
     to get it working).
  e) At 460 processors, the YBW I was using was clearly unable to play at
     blitz time controls.
  f) When running on the supercomputer, Diep did no forward pruning (other
     than null move) and did not use SE (singular extensions). That gives
     more balanced trees than the ones Chrilly possibly faces, and balanced
     trees are easier to split than unbalanced ones.
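Point c) can be put in the same units as the earlier arithmetic: how many nodes one processor could have searched while a single message is in flight. This sketch uses only the numbers quoted in the list (about 20k nps and 5.8 us for Diep on the Origin3800; 50-100k effective software nodes/s and 10-20 us for Chrilly's setup) and takes the minimum and maximum of the product over those ranges. The function name is made up for illustration.

```python
def message_cost_in_nodes(latency_us: float, nodes_per_second: float) -> float:
    """Nodes one processor could have searched while one message is in flight."""
    return latency_us * 1e-6 * nodes_per_second


# Diep on the Origin3800: ~5.8 us per message, at most ~20k nps per 500 MHz CPU.
diep = message_cost_in_nodes(5.8, 20_000)

# Chrilly's setup as described above: 10-20 us latency and 50-100k effective
# software nodes/s; these pairings give the minimum and maximum of the product.
hydra_low = message_cost_in_nodes(10.0, 50_000)
hydra_high = message_cost_in_nodes(20.0, 100_000)

print(f"Diep:  {diep:.3f} nodes per message")
print(f"Hydra: {hydra_low:.1f} to {hydra_high:.1f} nodes per message")
print(f"relative cost: {hydra_low / diep:.0f}x to {hydra_high / diep:.0f}x")
```

On those figures a message on the Origin3800 cost roughly a tenth of a Diep node, while on Chrilly's hardware it costs between half a node and two nodes, i.e. roughly 4 to 17 times more relative to search speed.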
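For readers unfamiliar with YBW (Young Brothers Wait): the rule is that a node may only be split, i.e. its remaining moves handed to other processors, after its first move (the "eldest brother") has been searched, so that the younger brothers get a useful bound. Below is a toy sketch of that rule on a hard-coded game tree, using Python threads in place of cluster processors. It is not Diep's implementation; the tree format, the split_depth parameter and the function names are made up for illustration, and in CPython the threads only illustrate the scheduling rule, not a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy game tree: an int is a leaf score from the point of view of the side to
# move at that leaf; a list holds the child positions.
TREE = [[3, 5, [6, 9]], [1, 2], [[0, -1], 7], [4, [8, -2]]]


def negamax(node, alpha, beta):
    """Plain sequential fail-soft alpha-beta in negamax form."""
    if isinstance(node, int):
        return node
    best = -math.inf
    for child in node:
        score = -negamax(child, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: remaining brothers cannot change the result
    return best


def ybw(node, alpha, beta, split_depth=2):
    """Young Brothers Wait: eldest child first, younger brothers in parallel."""
    if isinstance(node, int):
        return node
    if split_depth <= 0:
        return negamax(node, alpha, beta)  # too deep to be worth splitting
    # 1) The eldest brother is searched alone and usually sets a good bound.
    best = -ybw(node[0], -beta, -alpha, split_depth - 1)
    alpha = max(alpha, best)
    if alpha >= beta or len(node) == 1:
        return best  # cutoff: the younger brothers are never searched
    # 2) Only now may the node be split; all younger brothers share one window.
    #    (A real engine would also abort them on a fail-high; omitted here.)
    with ThreadPoolExecutor(max_workers=len(node) - 1) as pool:
        scores = list(pool.map(
            lambda child: -ybw(child, -beta, -alpha, split_depth - 1),
            node[1:]))
    return max([best] + scores)


print(ybw(TREE, -math.inf, math.inf), negamax(TREE, -math.inf, math.inf))
```

On a cluster the expensive parts are exactly what this toy hides: handing the younger brothers to remote processors, and stopping them again after a cutoff, each costs at least one message.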

At 14:36 26-10-2004 +0200, Frank de Groot wrote:
>> Generally hardware runs faster when cooled down.
>
>> But we have a practical system in mind which plays chess 24h a day.
>
>
>Well if you're working on the world's strongest Go machine that will beat a 
>pro player, you might consider putting some Peltier elements on the FPGAs 
>and use water cooling to remove the heat. Peltier elements are solid-state 
>and go to minus 20 or even -40 C. The investment for that would be 
>insignificant compared to the rest of the hardware, software and time, and 
>the payoff could be a system that runs 50% or 70% faster, *especially* under 
>the conditions you sketched. 
>
>_______________________________________________
>computer-go mailing list
>computer-go@xxxxxxxxxxxxxxxxx
>http://www.computer-go.org/mailman/listinfo/computer-go/
>
>