Again I say: "I'm not following all this, it's way too complicated."
The discussion started about a pattern-matcher for Go. You are talking about
a predicate-matcher, which is an interesting topic as well but not very
relevant to the initial discussion.
In the case of Go, there are only four possibilities at each point: empty,
black, white, or no-care.
When implementing a pattern matcher as some kind of
decision tree, it's the no-care points that cause the problems in
terms of tree size and search time. If the set of patterns to be put in the
tree is treated properly, however, the size of the tree doesn't have to
become big. What you need to do is analyze all the patterns and adjust
them in such a way that the first common no-care point is as deep in the
tree as possible. You then continue to do this recursively at each step, as
the set of patterns gets split into four sub-sets.
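A minimal sketch of this construction (the names and the dict-based pattern
representation are illustrative assumptions, not code from any actual Go
program): patterns map point indices to required states, a missing point
meaning no-care, and each node branches on the point where the fewest
remaining patterns say no-care, so common no-care points end up as deep in
the tree as possible.

```python
STATES = ("empty", "black", "white")

def build(patterns, points):
    """patterns: list of (name, dict) pairs, where the dict maps a point
    index to a required state and an absent point means no-care.
    points: point indices not yet branched on."""
    if not patterns or not points:
        return {"leaves": [name for name, _ in patterns]}
    # Branch on the point where the fewest patterns say no-care.
    best = min(points, key=lambda p: sum(1 for _, pat in patterns if p not in pat))
    rest = [p for p in points if p != best]
    node = {"point": best, "children": {}}
    for state in STATES:
        # A pattern goes into a state's subset if it requires that state,
        # or if it doesn't care about this point at all (the duplication
        # that makes no-care points expensive when they sit high up).
        subset = [(n, pat) for n, pat in patterns
                  if pat.get(best, state) == state]
        if subset:
            node["children"][state] = build(subset, rest)
    return node

def match(node, board):
    """Return the names of all patterns the board satisfies."""
    if "leaves" in node:
        return node["leaves"]
    child = node["children"].get(board[node["point"]])
    return match(child, board) if child else []

patterns = [("hane", {0: "black", 1: "white"}),
            ("cut",  {0: "black", 2: "white"})]
tree = build(patterns, [0, 1, 2])
match(tree, {0: "black", 1: "white", 2: "empty"})  # -> ["hane"]
```

Because both example patterns require black at point 0, that point has no
no-cares and is chosen first; the no-care points (1 for "cut", 2 for "hane")
are pushed further down.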
When adding patterns later, it becomes more difficult to maintain an
efficient tree. This can be solved, however, by completely recomputing the
tree from the whole set of patterns. That is a relatively expensive thing to
do of course, but it is only needed occasionally, when the tree grows
significantly in size. Theoretically it's still possible to come up with a
set of patterns that will give a very inefficient tree, but in practice the
tree remains small and efficient even for quite a large number of patterns.
> -----Original Message-----
> From: owner-computer-go@xxxxxxxxxxxxxxxxx
> [mailto:owner-computer-go@xxxxxxxxxxxxxxxxx]On Behalf Of Eric Boesch
> Sent: Friday, June 22, 2001 8:04 AM
> To: computer-go@xxxxxxxxxxxxxxxxx
> Subject: RE: computer-go: Sharing Go modules - ready for download
>
>
> "Mark Boon" <tesuji@xxxxxxxxxxxxxxxxx> wrote:
> >I'm not following all this, it's way too complicated. You are seeing a
> >theoretical problem that is easily solved in practice. If there are no
> >common sub-trees from a certain point, there's no reason at all
> >to continue maintaining a tree structure.
>
> It appears that you are thinking of "common subtree" as the case of at
> least two rules involving the same predicate. Since the example involved
> only 52 predicates total, and 10,000 rules with 15 predicates each, each
> of those 52 predicates is essentially guaranteed to appear in many
> different rules, until you have nearly reached the bottom of the decision
> tree.
>
> What I was referring to was that when there is a common subtree within
> the decision tree -- a case where two different nodes can safely be given
> the same child -- you can save some space by doing so. My point was that,
> for the example I gave, you will not be able to save enough space this
> way to reduce the space requirements of the decision tree to a reasonable
> level.
>
> I think it is hard, maybe impossible, to solve my example using less than
> a couple hundred comparisons with a typical 100 MB PC, even though it is
> theoretically possible to use fewer than 52 comparisons if you have
> enough storage. You may use any algorithm or tree structure you like to
> disprove this. The example, again, was 10,000 rules, each using a random
> subset of 15 out of 52 possible predicates, either as-is or negated. Do
> you disagree with this evaluation, or do you feel it is not practically
> relevant? I am not trying to be argumentative; I am trying to understand
> your objection.
>
> >Instead you can make a leaf node that contains a list of the remaining
> >questions. If you think about it you'll find that this way the tree
> >will contain as many leaves as there are patterns.
>
> This is true as long as you reuse decision subtrees (making it
> technically a DAG instead of a tree), and as long as each board matches
> only one pattern, but it does not solve the space problem.
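The leaf idea quoted above can be sketched very simply (illustrative names
only, not code from either poster): a leaf stores the questions that remain
for one pattern and answers them with a linear scan, so the tree needs only
one leaf per pattern.

```python
def make_leaf(name, remaining):
    """remaining: list of (point_index, expected_state) pairs that were
    not branched on before this leaf was reached."""
    return {"name": name, "remaining": remaining}

def leaf_matches(leaf, board):
    # Answer the remaining questions one by one instead of branching.
    return all(board[i] == v for i, v in leaf["remaining"])

leaf = make_leaf("hane", [(3, "white"), (7, "empty")])
leaf_matches(leaf, {3: "white", 7: "empty"})  # -> True
leaf_matches(leaf, {3: "white", 7: "black"})  # -> False
```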
>
> >And in practice the number of internal nodes remains very limited as
> >well, provided you take good care in which order you insert the
> >questions in the tree.
>
> If you wrote "usually remains", I would agree. But a million patterns in
> a thousand different shapes, as I was daydreaming about, is a
> particularly ornery kind of practice. While I do not expect that these
> rules would be one-tenth as pathological as the theoretical example, I
> would expect them to be problematic enough to require something other
> than a single straightforward decision tree using the theoretical minimum
> number of comparisons.
>
> For that reason, I will return to the question of alternatives that
> guarantee reasonable space requirements. One simple approach, which at
> first blush performs decently at searches, is the following.
>
> Create a tree (or a forest, depending on how you look at it), where each
> node is associated with a set of predicates. Each non-leaf node has 2 or
> more children. Each node also inherits the predicates associated with its
> ancestors, and each node may have one or more rules associated with it.
> Then, incrementally maintain a tree of shared predicates, as in these
> examples (capital letters represent predicates):
>
> 1. Add FOUR. The tree looks like
>
> FOUR (satisfies the rule named "FOUR")
>
> 2. Add SCORE.
>
> /--FU("FOUR")
> OR()
> \__SCE("SCORE")
>
> Excuse the sloppy figure. FU and SCE are children of the node "OR".
>
> 3. Add AND.
>
> /--FU("FOUR")
> OR()
> \__SCE("SCORE")
>
>
> AND("AND")
>
> 4. Add SEVN. SEVN has no letters in common with OR, so it does not go in
> the OR subtree. It has one letter in common with AND, though, so it is
> placed there.
>
>
> /--FU("FOUR")
> OR()
> \__SCE("SCORE")
>
>
> /--AD("AND")
> N()
> \__SEV("SEVN")
>
> 5. Add YEARS, AGO, OR, and ON.
>
> /--FU("FOUR")
> __R("OR")
> / \__SCE("SCORE")
> O()__
> \ \AG("AGO")
> \
> \
> \N("ON")
>
> /--AD("AND")
> N()
> \__SEV("SEVN")
>
> YEARS("YEARS")
>
> Why does "ON" go under "O" instead of "N" at the root level? Just because
> "FOUR" was added before "SEVN" was.
>
> Searching against this tree involves depth-first search, stopping
> whenever you reach a node whose predicates you do not satisfy.
>
> For the nasty 10000/52/15 example, a match attempt using a tree generated
> with this algorithm required 2500 comparisons. But if every rule involves
> the same 15 predicates, in different truth/falsity combinations
> (10000/15/15, which is the kind of situation you might have if every
> pattern had the same shape), only 32 comparisons were needed. In general,
> I believe this approach is better and simpler than my previous suggestion
> to use separate trees for each pattern shape.
>
> Non-production-quality code (250 lines of Perl) is available. (But I
> would guess that any algorithm this obvious would have been implemented,
> and very possibly improved upon, decades ago.)
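For concreteness, here is one way the quoted shared-predicate forest could be
sketched in Python. This is not the 250-line Perl code; `insert` and `search`
are illustrative names, predicates are single letters, and a rule is the set
of letters that must all hold. As noted for "ON", the resulting shape depends
on insertion order.

```python
def new_node(preds, rules=(), children=()):
    # preds: predicates this node adds to those inherited from its ancestors.
    return {"preds": set(preds), "rules": list(rules), "children": list(children)}

def insert(node, preds, name):
    """Add a rule below `node`; `preds` excludes predicates already
    accounted for on the path from the root."""
    if not preds:
        node["rules"].append(name)
        return
    for i, child in enumerate(node["children"]):
        common = child["preds"] & preds
        if not common:
            continue
        if common == child["preds"]:
            insert(child, preds - common, name)
        else:
            # Split: a new interior node holds the shared predicates,
            # e.g. FOUR and SCORE sharing O and R under an "OR" node.
            child["preds"] -= common
            split = new_node(common, children=[child])
            node["children"][i] = split
            insert(split, preds - common, name)
        return
    node["children"].append(new_node(preds, rules=[name]))

def search(node, truths):
    """Depth-first search, stopping at any node whose predicates
    are not all satisfied."""
    if not node["preds"] <= truths:
        return []
    found = list(node["rules"])
    for child in node["children"]:
        found += search(child, truths)
    return found

root = new_node(set())
for word in ["FOUR", "SCORE", "AND", "SEVN"]:
    insert(root, set(word), word)
search(root, set("SCOREX"))  # -> ["SCORE"]
```

Inserting FOUR then SCORE reproduces the split in figure 2 (an "OR" node with
FU and SCE below it), and AND then SEVN reproduces the "N" split in figure 4.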
>