On Thu, Oct 21, 2004 at 02:17:41PM +0200, John Tromp wrote:
Most programmers conveniently ignore the issue. And they might as well.
Even if they store 2^24 positions in a hashtable, and do a search of 2^24
positions, the collision probability is at most 2^(24+24-64) = 1/65536,
which can be considered negligable.
They will start thinking about the issue in 10 years when they store 2^30
positions and search 2^30, bumping the collision probability to 1/16.
At that time, moving to 96-bit hashes seems the easy solution...
Just a little mathematical warning -- hash clashes are a lot more
likely than you might naively expect, even if your hashes are perfectly
randomly distributed. The classic example of this is the 'Birthday
Problem' that comes up in intro combinatorics classes.
Now this is just a hashing problem wrapped up in everyday terms --
we have a hash space with 365 values, and we're assigning a hash value
(their birthday) to each of the 50 students. The same phenomena occurs
when using a hash space with 2**64 values; inserting 2**24 positions
would have a very low chance of clash, but not nearly as low as you
might first guess (according to a little program I threw together,
the chance of a clash on 2**24 positions is around 7.6 in a million)
-- and adding in another 2**24 positions from a search makes a clash
virtually a certainty (though I haven't set my computer to calculate
the actual percentage, it should be much higher that 1 in 65536).
No Jeff, it really is less than 2^{-16}, under the mere assumption that