[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [computer-go] Re: Sharing Secrets



On Thu, Oct 21, 2004 at 02:17:41PM +0200, John Tromp wrote:
> Most programmers conveniently ignore the issue. And they might as well.
> Even if they store 2^24 positions in a hashtable, and do a search of 2^24
> positions, the collision probability is at most 2^(24+24-64) = 1/65536,
> which can be considered negligable.
> 
> They will start thinking about the issue in 10 years when they store 2^30
> positions and search 2^30, bumping the collision probability to 1/16.
> At that time, moving to 96-bit hashes seems the easy solution...
> 

Just a little mathematical warning -- hash clashes are a lot more
likely than you might naively expect, even if your hashes are perfectly
randomly distributed. The classic example of this is the 'Birthday
Problem' that comes up in intro combinatorics classes.

The problem : Suppose you have a class of 50 students. Assuming their
birthdays are randomly distributed throughout the year, what is the
probability that two people share a birthday?

You might figure that since there are 365 days of the year and only
50 students that the answer would be rather low, but in fact the correct
answer is somewhere around 97%.

Now this is just a hashing problem wrapped up in everyday terms --
we have a hash space with 365 values, and we're assigning a hash value
(their birthday) to each of the 50 students. The same phenomena occurs
when using a hash space with 2**64 values; inserting 2**24 positions
would have a very low chance of clash, but not nearly as low as you
might first guess (according to a little program I threw together,
the chance of a clash on 2**24 positions is around 7.6 in a million)
-- and adding in another 2**24 positions from a search makes a clash
virtually a certainty (though I haven't set my computer to calculate
the actual percentage, it should be much higher that 1 in 65536).

Of course, all of this is assuming the hash function randomly
distributes positions; if it doesn't, the chance of a hash clash would
be higher. In particular, a randomly generated Zobrist hash might or
might not randomly distribute positions.

Jeff Raven
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/