Re: Computer speed
At 10:15 AM 12/7/98 -0600, John Bate wrote:
>>Does anyone care to make some comment about today's (and in the next 10
>>years) state of the art supercomputers? How do they compare with a Pentium
>>450 MHz?
>>For computer Go, I guess integer operations are more important. Thanks.
>
>Serial-instruction-stream processors (like Pentium, PPC, etc.) won't get
>significantly better than they are now. The curve is flattening out. (There
>might be room for, say, another factor of 10 improvement, but that's not
>significant in comparison to what has happened over the last 20 years.)
Considering speed improvements within the next few years: Katmai next year
might be 3x faster for integer programs that rely on the level-2 cache (so
game-playing programs, especially Go programs and the few knowledge-heavy
chess programs), compared to my current dual PII-450.
New processors built on smaller process technology will get considerably
quicker, which, measured against the speed of a single supercomputer
processor, is quite promising. The new Alpha 21264 is definitely gonna kick
some butts, and the Merced announced by Intel for 2000 will give enormous
speed improvements for software (which must be recompiled to run on it,
though).
So for the next few years we already expect another factor of 10, which
kicks butt. A processor that only a few years ago was made in 0.6 micron
technology (Pentium 60) will get near 0.18 micron within a few years (right
now we are around 0.25), so you can already calculate that you can place
way more transistors within the same area.
To compare some 'old' supercomputer technology: a single Cray processor
(from some years ago) got 800k nodes a second (the chess program Cray
Blitz), where the equivalent program optimized for the 'microcomputer' gets
around 125k on a PII.
A quad Xeon nowadays does better, getting over 1M raw nodes a second, but
its effective speed is still less than the 800k of one Cray processor (the
loss due to parallelism is huge), and such a quad Xeon is still quite
unaffordable.
>From now on, parallelism of various kinds is where the speed improvements
>will come from. To some extent, that has been done already, of course, with
>superscalar pipelines, MMX, AltiVec, etc. but look for much more
>parallelism in the not too distant future. Today's "supercomputers" already
>use parallelism greatly, but it will move down and become widespread. (The
I completely agree. Instead of grouping a lot of processors together, I'd
rather think, for the future, about a single chip that already carries a
lot of processors in it, so that a single computer behaves like a parallel
machine too.
I recently bought my own dual machine, and my chess program is now nearly
the first commercial program to run in parallel (the first and only
parallel chess program playing on a PC is Crafty, which uses a far from
optimal form of parallelism), using the currently best known form of
parallelism (DTS, described by Bob Hyatt; see the ICCA journal volumes).
From a commercial viewpoint (not scientific) I have to warn, however:
don't buy a dual machine too quickly. Although Windows NT and Linux support
parallelism, there is no easy way to incorporate parallelism into your
program.
So it might take many tens of years before parallelism in personal
computers really becomes a big issue.
Right now dual machines are rather cheap: I paid 900 Dutch guilders for my
dual Asus BX motherboard, which supports processors up to 800MHz (133x6).
That is around 900/1.88 = 478 dollars, which, compared to good PII
motherboards at around 200 guilders (about 110 dollars), is still a big
price difference. Furthermore you need to buy a second processor, which is
also quite expensive.
However, the really interesting parallel systems (4 processors and more)
are rather out of reach. A mainboard alone for those bastards can be like
10,000 dollars, not to mention the many thousands of dollars for a single
Xeon CPU, of which you need a whole bunch.
So where a dual PII-450 system (two processors running at 450MHz) with
128MB SDRAM cost me all together 1500 dollars (yes, I bought every part
very cheap), which I still call 'affordable', a system that is like 2.5
times faster (quad Xeon) is more than 10 times more expensive, not to
mention a faster Cray processor, which gives another x times speedup but is
more than 100x more expensive.
Long story about something simple: parallelism is interesting for
scientists to think about, but practically: forget it for the next x years.
The only language I know of that is in development and might get a serious
Windows port is Cilk (developed at MIT). Cilkchess recently joined the
Dutch open computer chess championship on a 64-processor machine (195MHz
each), and the Cilk language really looks easy.
It is, however, far from practical to say that parallelism is the way to
go. A better statement would be: first wait and see; perhaps before I
retire (I'm 25 years old now) big parallel systems will get cheaper.
>term "supercomputer" is already starting to disappear - as did
>"microcomputers" which is what desktop computers used to be called. Anybody
>hear your Mac or PC called a "microcomputer" recently?) Networks of smaller
>computers are now being used for many big jobs - Pixar used a room full of
>desktop computers, not a "supercomputer", to do the rendering for "A Bug's
>Life", and networks of iMacs are being used to make a relatively
>inexpensive "supercomputer" on some University campuses.
Right, but considering the huge power draw (I'd bet that in the end the
power bill is more expensive than the system itself) and the huge space
needed by a 10,000-CPU Pentium Pro system, as used for military/nuclear
calculations at Los Alamos, it is far from practical to claim that their
huge advantage in memory speed, I/O speed, and number of processors is
going to disappear soon and become available to us all...
Also note that a lot of supercomputer systems aren't that fast at all, but
sometimes have only the one single advantage that is needed.
For example: meteorologists don't need a fast computer, they need a
horrible amount of RAM (terabytes, preferably). A friend of mine is buying
a 'mainframe' for his company: only 4 slow SGI processors (around 200MHz),
which together are about the same speed as a single PII-450. Yet he calls
it a mainframe because there are some tens of gigabytes of RAM in it.
Unlike learning, parallel systems have grown up, but right now they are on
a different planet, so rather hard to get; and considering the time it took
before computers became available to us all, it'll take some time, I fear.
We still haven't discussed the biggest problem: how do we get all that
software properly working in parallel? I mean, solving the parallel
drinking problem is rather easy (we all simply get a beer and drink it at
the same time), but all the other problems are still to be solved; for the
last couple of tens of years we were only allowed to think about solutions.
I was just emailed permission to run my program on a quad system, to see
how it performs on 4 processors, but how do I figure out how my program
works on 256 or 1024 processors? I can't even figure out how it works on
128 processors...
I mean, as long as we have shared memory everything goes fine, but the main
problem of the really big parallel systems is that there is no shared
memory either, and the more processors, the more problems when using
shared memory.
I see a large area of problems, each of which can fill up a researcher's
life, which is exactly something we don't want to do.
We just wanna write our smart program (Go, chess, draughts, checkers,
whatever), and let the compiler figure out the rest.
So for the next xx years I think that the speed and memory advantage of
supercomputers will remain, and that PCs will not even get close to that
speed.
Another interesting area is designing your own processor. Suppose you have
a cool Go program that does masses of local searches and parallelizes well;
then for a few thousand dollars (around 40,000 dollars for software, a few
years of work, around 50 dollars per processor, and around 125 dollars for
a card) you can press your own Go processor, which isn't a single Go
processor but a large bunch of Go-processor logic on one chip.
The new technique allows you, on a board of around 3000 dollars, to press
your own processor for 50 dollars apiece.
This can't be compared to the speed of the Deep Blue processors, as the
technology used for them is nowadays rather outdated (only 0.60 micron),
and at that time it wasn't easy to press several processors onto one chip.
It is now.
Some calculations: a Go program getting 50 nodes a second on a quick PC can
easily be sped up around 10,000 times on such a processor.
So talking about what is objectively the quickest thing: that is still
pressing your own processor. Only a few months ago did this technology
become available for this little money.
The big reason why no one is actually doing it is, of course, that after it
is pressed you can't bugfix it. Also you won't find much RAM on those
processors.
The logic is high quality, but making RAM, and therefore hash tables, is
still rather difficult.
The speed you search at is, however, impressive (assuming you can run your
program in parallel). It's easy to press 361 small processors onto a single
Go CPU, each in-logic Go unit already a hundred times faster than the same
program running on a PII CPU.
A big disadvantage of this is of course that the guy designing the chip
must be very good, as rewriting C/C++ code into this chip language (there
are some very cool compilers for it, which translate a kind of C code into
chip logic) is still a very specialistic thing to do.
Furthermore, in the first 100 Go CPUs you build, the parallelism will
probably not work properly, but that's just my guess.
The good news for this Go CPU is that when it's finished you might search
deeper in Go with it than Deep Blue used to do in chess (Deep Blue searched
a minimum of 11 or 12 plies per move, where nowadays programs search at
least the same depth on the new PII-450), which gives the conditions for an
interesting human-machine contest.
For Go the branching factor will be around 10 (and not the square root of
361, which is 19), and a 10,000-fold speedup simply gives 4 plies more to
even the most stupid searching program, which of course will not hurt :)
>-J Bate
>-----------------------------------------------------------------
>Dr. John A. Bate | email: bate@xxxxxxxxxxxxxxxxx
>Dept. of Computer Science | phone: (204) 474-6791
>University of Manitoba | FAX: (204) 474-7609
>Winnipeg, Manitoba, Canada R3T 2N2 |
>-----------------------------------------------------------------
Vincent Diepeveen
- The whole parallel question began
when humans started drinking a parallel beer.