
Re: [computer-go] SlugGo approach: GNU vs. Goliath



At 15:48 6-1-2005 +0100, A van Kessel wrote:
>>>>  massive roundoff errors.
>I don't think massive roundoff errors are a major cause
>for evaluation functions to fail. [Eg mis-evaluating a dead stone
>should _at_least_ cause you to be off by more than one (or two) points)
>
>> It is relevant. It's 2 times slower at least.
>Depends on the platform, IMHO. for platforms with separate
>floating point boxes ('units') such as the alpha (maybe newer pentiums ?)
>the float ops *could* be done in parallel.
>It's just an engineering decision, I guess.

And I don't doubt that somewhere in the galaxy the shortest way to reach
your neighbour is to first fly around the planet.

No normal user has an Alpha at home. Those are a bit outdated anyway: they
run at at most 1.3GHz, and while they can issue a maximum of 4 instructions
per cycle, only about 2 of those can be integer instructions, which is
pretty weak. And that for, say, $20000 a box.

At home I have a by now very old K7; it runs at 2.127GHz and can execute
3 instructions a cycle, all 3 of which can be integer instructions.

But don't ask about its floating point performance.

We haven't discussed the P4 yet: the best-selling PC processor on the planet,
and by far the weakest in floating point of all processors sold right now.

Certain floating point instructions eat hundreds of cycles, while on paper
it can execute integer code at 4 instructions a cycle (in reality there are
limitations that cap it at 3 or even fewer).

Note it now runs at 3.8GHz; to see how good or bad it is in integer code
such as 'diep', see for example:

http://www.sudhian.com/showdocs.cfm?aid=635&pid=2403

Also, at aceshardware a year ago several tests were done with diep on
different processors.

So where even the cheapest low-end processor can easily do up to 3
instructions a cycle, the most expensive floating point platforms currently
sold do about 2 to 3 flops per cycle. Take the IBM supercube, sold for an
unknown price, with roughly 2048 processors inside one cube: it clocks at
about 700MHz and can do about 2 flops per cycle. The latest machine
installed with it has roughly 32768 of those processors, and they just
unveiled a version which should have well over 200000 of those processors,
delivering 380 teraflop in total.

In the high end, only the number of flops counts.

It is, however, not clear whether a 'flop' means 32 bits or 64 bits. I would
argue 64 bits is more accurate, but compilers optimize for 32 bits by
default, and that's what gets measured in those benchmarks.

A 32-bit float is *not* a 64-bit double.

In precision it's not even close.

That's why SSE2 is there. It can do 2 floating point multiplies in 1
instruction, and it can execute 1 such instruction every 2 cycles or so.

So basically that's 0.5 instructions per cycle.

That's still about a factor of 6 slower than integer code can run.

Yet there is a processor, clocked at a very happy 700MHz, which can do 2
flops per cycle. Hip hip hooray.

So there is no need to convert your pathetic code to integer code, as in
floating point it might run no slower on a billion-dollar supercomputer in
Los Alamos.

Note it is pretty busy calculating nuclear explosions, so even your odds of
getting system time there are zero.

Vincent

>AvK
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/