Re: [computer-go] Pattern Matcher
Vincent Diepeveen wrote:
> This connect4 has the most idiotic and slow hashtable implementation i have
> ever seen in my life.
It was optimized for space, so that it can benefit from the L2 cache.
It uses only 5 bytes per entry, including perfect collision detection.
How many bytes per entry did yours use?
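For illustration, here is a minimal sketch of one way a table entry can fit in 5 bytes with perfect collision detection. It assumes, hypothetically, that positions are already encoded as integer keys below size * 2**32. Each key is split into index = key % size and lock = key // size, and the 32-bit lock plus a 1-byte payload are kept in parallel arrays so no padding is added. The class name TransTable, the "payload 0 means empty" encoding and the always-replace policy are assumptions, not the actual program's code. Because key == lock * size + index, a matching lock at a given index identifies the key exactly.

```python
from array import array


class TransTable:
    """Hypothetical transposition table: 4-byte lock + 1-byte payload per entry."""

    def __init__(self, size):
        self.size = size                          # ideally a prime
        self.locks = array('I', [0]) * size       # 'I' is 4 bytes on common platforms
        self.data = bytearray(size)               # payload 0 marks an empty slot

    def bytes_per_entry(self):
        return self.locks.itemsize + 1

    def _split(self, key):
        # Perfect collision detection needs 0 <= key < size * 2**32,
        # so that the lock (key // size) fits in 32 bits.
        if not 0 <= key < self.size << 32:
            raise ValueError("key out of range for this table size")
        lock, index = divmod(key, self.size)      # key // size and key % size
        return index, lock

    def store(self, key, payload):
        if not 1 <= payload <= 255:
            raise ValueError("payload must be 1..255 (0 means empty)")
        index, lock = self._split(key)
        self.locks[index] = lock                  # always replace (assumed policy)
        self.data[index] = payload

    def probe(self, key):
        index, lock = self._split(key)
        if self.data[index] and self.locks[index] == lock:
            return self.data[index]
        return None                               # empty slot or a different key
```

Here divmod performs the key // size and key % size steps together; in C these would be a division and a modulo by the table size.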
> He's using the SLOWEST possible integer instructions on the processor for
> what is it 5 times or so just for 1 hashtable lookup or so?
It uses 2 modulo operations per lookup.
> modulo and divide are like a 46+ cycles at opteron, and like 200 cycles or
> so at a P4?
What can I say? P4s suck :-(
Alpha CPUs could implement % by a constant with a 64x64->high64 multiplication
taking only a few cycles.
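The idea of replacing % by a constant with a high-half multiplication can be sketched as follows. This swaps in the direct-remainder method of Lemire, Kaser and Kurz for 32-bit operands; it is not necessarily what Alpha compilers emitted. Python integers are masked to 64 bits to imitate machine registers, and the hypothetical helper mulhi64 stands in for a 64x64->high64 instruction such as Alpha's UMULH. Compilers such as gcc apply a related transformation automatically when the divisor is a compile-time constant.

```python
MASK64 = (1 << 64) - 1


def mulhi64(a, b):
    """High 64 bits of the 128-bit product of two unsigned 64-bit values."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def fastmod_magic(d):
    """Precompute the multiplier for a fixed 32-bit divisor d >= 1."""
    if not 1 <= d < 1 << 32:
        raise ValueError("divisor must be a nonzero 32-bit value")
    return (MASK64 // d + 1) & MASK64    # wraps to 0 when d == 1, which still works


def fastmod_u32(a, d, magic):
    """a % d for a 32-bit a, using two multiplications and no division."""
    lowbits = (magic * a) & MASK64       # 64-bit wrap-around multiply
    return mulhi64(lowbits, d)
```

Only fastmod_magic divides, and it runs once per divisor (for a hash table, once at startup); each remainder after that costs two multiplications.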
> Additionally why not use a more clever hashtable probing system?
> My connect4 had a more clever one back in 1995 already or so.
Are you all talk and no show?
Show me your code and I'll be happy to compare the two.
> If you can't optimize for such SIMPLISTIC details, you sure must code the
> rest of your life JAVA.
I'd rather leave the optimizations to the compiler than rewrite my code
for every new CPU that comes out.
> Note that this c4 program proofs really nothing. It's just cache trashing
> and main memory trashing.
> 99% of the system time goes to idiotic slow idiv instructions and memory
> lookups. Usually modulo is in hardware casted to idiv (well it is at P4
> where it is casted to floating point unit if i remember well).
Note that Vincent proves really nothing. It's just trash talking.
I can provide not only full source, but profiling data as well:
rank    self   accum    count  trace  method
   1  56.11%  56.11%  7321056    140  SearchGame.ab
   2  16.33%  72.45%  3779991    166  TransGame.transpose
   3   8.66%  81.11%  3779998    126  TransGame.hash
   4   6.52%  87.63%  2855890    173  TransGame.hash
   5   6.00%  93.62%  2608860     99  Game.makemove
   6   5.79%  99.41%  2608860     79  Game.backmove
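As a quick check on the "99%" claim, the self-time column can be summed for the TransGame methods, which judging by their names hold the hashing and table code. This only attributes samples to methods; it does not say how much of that time is spent in the modulo operations themselves.

```python
# Self-time percentages copied from the profile above.
profile = [
    ("SearchGame.ab", 56.11),
    ("TransGame.transpose", 16.33),
    ("TransGame.hash", 8.66),
    ("TransGame.hash", 6.52),
    ("Game.makemove", 6.00),
    ("Game.backmove", 5.79),
]

table_share = sum(pct for method, pct in profile if method.startswith("TransGame."))
print(f"TransGame methods: {table_share:.2f}% of samples")
# TransGame methods: 31.51% of samples
```

That is roughly 31.5%, while the search itself (SearchGame.ab) is the largest single entry at 56.11%.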
regards,
-John
PS: it does seem C compilers have improved lately:
gcc -O gets
95994066 pos / 46619 msec = 2059.1 Kpos/sec
while IBMJava2-142 gets
95994066 pos / 72105 msec = 1331.3 Kpos/sec
on solving the position 4443333 on my AMD Athlon(tm) XP 2700+ machine.
So C is almost 55% faster. That is more than the 25% gap I noticed a few
years back, but nothing like the "more than 200% gap" Vincent would have you
believe.
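The figures in the PS can be rechecked directly: positions divided by milliseconds gives thousands of positions per second, and since both runs search the same 95994066 positions, the speed ratio is simply the ratio of the two run times. The helper name kpos_per_sec is illustrative.

```python
POSITIONS = 95994066  # positions searched when solving 4443333


def kpos_per_sec(positions, msec):
    # positions per millisecond equals thousands of positions per second
    return positions / msec


c_rate = kpos_per_sec(POSITIONS, 46619)     # gcc -O
java_rate = kpos_per_sec(POSITIONS, 72105)  # IBMJava2-142
speedup = c_rate / java_rate - 1            # same as 72105 / 46619 - 1

print(f"C {c_rate:.1f} Kpos/sec, Java {java_rate:.1f} Kpos/sec, C faster by {speedup:.1%}")
# C 2059.1 Kpos/sec, Java 1331.3 Kpos/sec, C faster by 54.7%
```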
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/