Re: Cluster Supercomputing
At 09:35 AM 12/8/98, Darren Cook wrote:
>>Does anyone care to make some comment about today's (and in next 10 years)
>>state of the art super computers? How do they compare with a Pentium 450
>>MHz?
>
>I've been working on using a cluster system for go. This means a number of
>normal PCs, connected over a LAN, each doing part of the calculation. This
>is much cheaper than parallel machines. The reason parallel machines are
>expensive and complex is that they are able to share memory. This is also
>where programming gets complex.
>
>But for most tasks you don't need this. My plan is to hand out one
>top-level move to each machine that is on the LAN. Also giving it a maxply
>and a max branching factor to search to. Each will return the score
>estimate for that move, and the program then plays the best score. While
>there is still time left for the current move, it will hand out each of the
>top-level moves it wants to look at, and once they have all been done, it
>will hand them out again but with higher maxply/maxbranch. [1]
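>
>The hand-out loop could be sketched roughly like this. (A toy single-machine
>sketch, not my actual code: the function names and the stand-in scorer are
>invented, and the real version would send each search() call to a different
>machine on the LAN.)

```python
import time

def search(move, maxply, maxbranch):
    # Stand-in for a remote search request; returns a score estimate.
    # A toy deterministic score so the sketch runs on its own.
    return (len(move) * maxply + maxbranch) % 100

def choose_move(candidate_moves, time_budget, start_ply=2, start_branch=8):
    """Hand out every top-level move once, then, while time remains,
    hand them all out again with a higher maxply/maxbranch."""
    deadline = time.time() + time_budget
    maxply, maxbranch = start_ply, start_branch
    scores = {}
    while time.time() < deadline:
        for move in candidate_moves:   # one request per LAN machine in the real setup
            scores[move] = search(move, maxply, maxbranch)
        maxply += 1                    # all moves answered: deepen and widen
        maxbranch += 2
    return max(scores, key=scores.get)
```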
>
>Each search is 0.1 seconds to 2 seconds [2], so this is coarse-grain
>parallelism. There is no memory sharing between machines - they each keep
>their own caches. The only communication is from the central control
>machine to send search requests, and to send the actual move played.
>
>Because communication is so small compared to the size of each calculation,
>I can be lazy. I can use cheap ethernet cards to connect machines, and I'm
>currently using Python to handle all the client/server stuff. Python is
>good because the sockets are built-in and easy to use, and because it is
>portable between linux and NT, so I can use a mix of machines on the LAN.
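>
>The client/server side could look something like this minimal Python
>sockets sketch. This is only an illustration, not the actual code: the
>JSON line protocol, the handler names, and the toy scorer are all
>assumptions. The point is just that one short request and one short reply
>per search keeps the communication tiny.

```python
import json
import socket
import socketserver
import threading

class SearchHandler(socketserver.StreamRequestHandler):
    # Worker side: read one JSON request per line, answer with a score estimate.
    def handle(self):
        req = json.loads(self.rfile.readline())
        score = len(req["move"]) * req["maxply"]   # toy stand-in for the real search
        reply = json.dumps({"move": req["move"], "score": score}) + "\n"
        self.wfile.write(reply.encode())

def request_search(host, port, move, maxply):
    # Controller side: one small request, one small reply; all search
    # state (caches etc.) stays on the worker machine.
    with socket.create_connection((host, port)) as sock:
        sock.sendall((json.dumps({"move": move, "maxply": maxply}) + "\n").encode())
        return json.loads(sock.makefile().readline())

# Demo on one machine: run a worker in a thread, send it one request.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), SearchHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = request_search("127.0.0.1", server.server_address[1], "D4", 3)
print(reply)   # {'move': 'D4', 'score': 6}
server.shutdown()
```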
>
>The plan has been to prototype with python, then replace with C++ once the
>design becomes stable. However a number of cluster supercomputers are using
>python, including the one that recently set a record [4]. So maybe I'll
>just leave it that way.
>
>I can have 8 x PII-350MHz machines for less than a million yen (8300
>dollars). A PII-2800MHz, in effect. :-) [3]
>You might ask why bother? An 8x speedup is not that significant for go.
>Partly I'm after the experience as this will be useful in other projects.
>But also, having that speed allows me to try out ideas that previously got
>dismissed as too slow. I can also develop quicker as I don't have to spend
>60% of my time working on speed optimizing before I dare run an algorithm.
An easy improvement is to use an algorithm like the dynamic tree splitting
(DTS) from Hyatt, combined with a simple principal variation search which
only divides the remaining moves among the other processors once you
already have a score returned from the first move: the YBW (Young Brothers
Wait) algorithm...
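
That split could be sketched like this. (My illustration only, not Hyatt's
DTS: in a real alpha-beta search the first move's score would be passed as
a bound to narrow the parallel search windows, whereas here it is only
compared afterwards.)

```python
from concurrent.futures import ThreadPoolExecutor

def score_move(move):
    # Toy stand-in for a full search of one subtree.
    return {"a": 30, "b": 55, "c": 40}[move]

def pv_split_search(moves, workers=4):
    """Search the first (principal variation) move serially to get a
    score back, and only then divide the remaining moves among the
    other processors."""
    best_move, best_score = moves[0], score_move(moves[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for move, score in zip(moves[1:], pool.map(score_move, moves[1:])):
            # In a real search, best_score would serve as the alpha bound
            # for these parallel searches, cutting off weak subtrees early.
            if score > best_score:
                best_move, best_score = move, score
    return best_move, best_score

print(pv_split_search(["a", "b", "c"]))   # ('b', 55)
```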
I support your idea of easy parallelism using IPC, however. I think IPC is
way more flexible and easier than using threads; you need some more memory,
of course. The main difference is that I'm using shared memory for my
hashtables whenever possible. The advantage of DTS is that you can choose
where to split the search, so splitting nearer to the root is preferred,
which you MUST do when communication is slow, whereas most algorithms
(assuming shared memory) also split a lot near the leaves.
Note that 8 PII-350s still need a lot of separate big towers, fans, AGP
cards and memory, whereas with 4 dual PII-300s running at 450 MHz you get
way more speed for the money, and it is even cheaper!
>Darren
>[1]: This iterative deepening/widening algorithm was the one I used at the
>recent FOST cup, on a single machine, and I was quite pleased with it.
>
>[2]: From Fost'98 version timings on a P-200. This could change an order of
>magnitude in either direction as I work on the code :-).
>
>[3]: Actually I'm thinking of grabbing old Pentiums (166s and 200s) as
>people upgrade.
>
>[4]: http://cnls.lanl.gov/avalon/avalon_bell98/
> (Sustained 10Gflops for $150,000)