
Re: computer-go: Sharing Go modules - ready for download



Patrick Stiegeler <stiegele@xxxxxxxxxxxxxxxxx> wrote:
> A propos the pattern matcher: I intend to write one, too, but I'm
> still thinking about which data structures to use to speed up the
> search. Would you tell me which ones you use, and how many comparisons
> you need to identify all patterns (out of a set of interesting ones)
> that are on the board?
> The question is also open to the rest of the group: does anyone know
> of 2-dimensional pattern matching algorithms that need close to the
> minimum possible number of comparisons?
The tree approach others have mentioned seems like a good place to start. It uses close to the minimum possible number of comparisons, but it may require an impossible amount of space if you are unlucky.

If you have few rules, or the rules all have the same shape and each rule has at most a couple of don't-care conditions, then a tree will work.
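For the easy case, such a tree can be sketched in a few lines. This is my own minimal illustration, not code from any existing program; it assumes every pattern covers the same fixed list of offsets and has no don't-cares, with colors 'b', 'w', and '.' for black, white, and vacant:

```python
# A nested-dict trie over fixed-shape patterns: one lookup per board
# point, shared prefixes stored once. All names here are invented.

OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # a 2x2 shape, for example

def add_pattern(trie, colors, name):
    """Insert one pattern (one color per offset) into the trie."""
    node = trie
    for c in colors[:-1]:
        node = node.setdefault(c, {})
    node.setdefault(colors[-1], []).append(name)

def match(trie, board, x, y):
    """Walk the trie one board point at a time; one probe per point."""
    node = trie
    for dx, dy in OFFSETS:
        node = node.get(board.get((x + dx, y + dy), '.'))
        if node is None:
            return []
    return node  # names of all patterns matching at (x, y)

trie = {}
add_pattern(trie, ['b', 'w', '.', '.'], "P1")
add_pattern(trie, ['b', 'b', '.', '.'], "P2")
board = {(3, 3): 'b', (3, 4): 'w'}
print(match(trie, board, 3, 3))  # → ['P1']
```

Because common prefixes are shared, matching costs one comparison per point of the shape, regardless of how many patterns are stored.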

At the other extreme, here is a bad case that cannot be handled with a tree, because don't-care conditions are heavily and uniformly spread throughout the rules.

Suppose you have 10,000 rules, R1 through R10000. Each rule involves 15 random predicates or their negations, drawn from a set of 52 predicates labeled A-Z and a-z. For example:

If A,C,G,g,p,r, and u are false, and D,F,H,f,h,m,s, and y are true, then rule R5332 is satisfied.

Since the predicates in the rules are randomly distributed, it hardly matters which predicate you start your decision tree with. You might as well use alphabetical order:

#1 Is A true? If yes, go to #2. If no, go to #55223423321.
#2 Is B true? If yes, go to #3. If no, go to #25234234323.
...
#99442343132 Rule R5332 is satisfied.

After answering at most 52 questions, and usually fewer, you will know which rules are satisfied. The problem is that even if you merge common subtrees, this only trims a few nodes near the bottom of the tree. Farther from the leaves -- after 40 questions, say -- there are essentially no common subtrees, so the tree will have spread out to 2**40 distinct nodes, which is impractical.
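When the tree blows up like this, the fallback for this example is simply to test every rule in sequence, which is cheap per rule: with 52 predicates, a rule fits in two machine words, and a match test is two operations. A sketch under those assumptions (all identifiers are mine, invented for illustration):

```python
# Flat bitmask matching for the 52-predicate example: each rule is a
# (care, want) pair of 52-bit masks, and a rule is satisfied exactly
# when (state & care) == want.

import random

random.seed(1)
N_PRED, N_RULES, N_CARE = 52, 10_000, 15

def random_rule():
    """Pick 15 random predicates; require each to be true or false."""
    care_bits = random.sample(range(N_PRED), N_CARE)
    care = sum(1 << b for b in care_bits)
    want = sum(1 << b for b in care_bits if random.random() < 0.5)
    return care, want

rules = [random_rule() for _ in range(N_RULES)]

def satisfied(state):
    """Return the indices of all rules matched by a predicate state."""
    return [i for i, (care, want) in enumerate(rules)
            if state & care == want]

# Sanity check: a rule always matches the state built from its own
# 'want' bits, since those bits are a subset of its 'care' bits.
care0, want0 = rules[0]
assert 0 in satisfied(want0)
```

This is linear in the number of rules rather than logarithmic, but it needs only two words per rule, which is the trade the rest of this message is trying to beat.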

If it were possible, I would like to be very greedy in my mostly imaginary go program, dynamically storing as many patterns as RAM permits, which ought to be millions. Unfortunately, efficient rule insertion and rule search may not be possible at that scale if the rule shapes can vary endlessly.

I am not so familiar with the "Match a single document/go board against many rules/queries" case discussed here, but I am familiar with its inverse, the "Match a single rule/query against many documents" case that boolean text search involves. In the latter case, full-generality logarithmic performance is a pipe dream. My suspicion is that the same is true in the former case.

A straightforward compromise that gives good insertion times and compact storage, and may give decent lookup times, is to dynamically maintain a list (in practice, perhaps a hash table) of all distinct rule shapes, and to use binary search (or ternary search for black-white-vacant) within each shape. This assumes that rules have few or no don't-care squares within their shapes. This can be much faster than explicitly matching every rule in sequence, since the number of comparisons grows with the number of distinct rule shapes rather than the number of rules.
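A sketch of that compromise, with a hash lookup standing in for the per-shape search (every name below is invented for illustration):

```python
# Group rules by shape; at each board point, do one exact probe per
# distinct shape, no matter how many rules share that shape.

from collections import defaultdict

# shape (tuple of offsets) -> { tuple of colors -> [rule names] }
shape_tables = defaultdict(dict)

def add_rule(offsets, colors, name):
    """offsets: (dx, dy) pairs; colors: 'b'/'w'/'.' per offset."""
    table = shape_tables[tuple(offsets)]
    table.setdefault(tuple(colors), []).append(name)

def rules_at(board, x, y):
    """One hash probe per distinct shape, regardless of rule count."""
    found = []
    for offsets, table in shape_tables.items():
        key = tuple(board.get((x + dx, y + dy), '.')
                    for dx, dy in offsets)
        found.extend(table.get(key, []))
    return found

add_rule([(0, 0), (0, 1)], ['b', 'w'], "R1")
add_rule([(0, 0), (0, 1)], ['b', 'b'], "R2")
add_rule([(0, 0), (1, 0), (1, 1)], ['b', '.', 'w'], "R3")
board = {(0, 0): 'b', (0, 1): 'w', (1, 1): 'w'}
print(rules_at(board, 0, 0))  # → ['R1', 'R3']
```

Insertion is one hash probe and a list append, and lookup cost scales with the number of shapes, so adding a millionth rule of an existing shape costs essentially nothing at match time.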

How does GNU Go handle this?
