
RE: computer-go: Sharing Go modules - ready for download



From: "Mark Boon" <tesuji@xxxxxxxxxxxxxxxxx> wrote:
> I'm not following all this, it's way too complicated. You are seeing a
> theoretical problem that is easily solved in practice. If there are no
> common sub-trees from a certain point, there's no reason at all
> to continue maintaining a tree structure.
It appears that you are thinking of "common subtree" as the case of at least two rules involving the same predicate. Since the example involved only 52 predicates in total, and 10,000 rules with 15 predicates each, each of those 52 predicates is essentially guaranteed to appear in many different rules until you have nearly reached the bottom of the decision tree.

What I was referring to was that when there is a common subtree within the decision tree -- a case where two different nodes can safely be given the same child -- you can save some space by doing so. My point was that, for the example I gave, you will not be able to save enough space this way to reduce the space requirements of the decision tree to a reasonable level.

I think it is hard, maybe impossible, to solve my example using fewer than a couple hundred comparisons on a typical PC with 100 MB of memory, even though it is theoretically possible to use fewer than 52 comparisons if you have enough storage. You may use any algorithm or tree structure you like to disprove this. The example, again, was 10,000 rules, each using a random subset of 15 out of 52 possible predicates, either as-is or negated. Do you disagree with this evaluation, or do you feel it is not practically relevant? I am not trying to be argumentative; I am trying to understand your objection.
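
For concreteness, here is one way such a rule set could be generated. This is only an illustrative sketch in Python (names are my own, not the code mentioned at the end of this message):

import random

NUM_PREDICATES = 52      # predicates are just the indices 0..51 here
NUM_RULES = 10000
PREDICATES_PER_RULE = 15

def random_rule(rng):
    # One rule: 15 distinct predicates, each required to be either
    # true (as-is) or false (negated).
    chosen = rng.sample(range(NUM_PREDICATES), PREDICATES_PER_RULE)
    return {p: rng.choice([True, False]) for p in chosen}

rng = random.Random(0)   # fixed seed only so the example is repeatable
rules = [random_rule(rng) for _ in range(NUM_RULES)]

def rule_matches(rule, position):
    # A position assigns a truth value to all 52 predicates; a rule matches
    # when every one of its 15 predicates has the required value.
    return all(position[p] == wanted for p, wanted in rule.items())

With 15 of 52 predicates per rule, any particular predicate shows up in about 10000*15/52, i.e. roughly 2,900, of the rules, which is why shareable children are so scarce anywhere near the top of a decision tree.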

> Instead you can make a leaf node that contains
> a list of the remaining questions. If you think about it you'll find that
> this way the tree will contain as many leafs as there are patterns.
This is true as long as you reuse decision subtrees (making it technically a DAG, instead of a tree), and as long as each board matches only one pattern, but it does not solve the space problem.
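
As an aside on that reuse: one standard way to share identical decision subtrees is hash-consing, i.e. building every decision node through a cache so that identical (predicate, children) combinations yield the very same object. A minimal sketch, with names of my own choosing:

class DecisionNode:
    # predicate: index tested at this node (None at a leaf)
    # if_true / if_false: children followed on the two outcomes
    # rules: rule names reported at a leaf
    __slots__ = ("predicate", "if_true", "if_false", "rules")
    def __init__(self, predicate, if_true, if_false, rules):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false
        self.rules = rules

_unique = {}   # (predicate, if_true, if_false, rules) -> canonical node

def make_node(predicate, if_true, if_false, rules=()):
    # Build a node, reusing an existing identical one when possible.
    # Children are themselves built through make_node, so structurally
    # identical subtrees are already the same object, and object identity
    # is therefore a sound equality test inside the cache key.
    key = (predicate, if_true, if_false, tuple(rules))
    node = _unique.get(key)
    if node is None:
        node = DecisionNode(predicate, if_true, if_false, tuple(rules))
        _unique[key] = node
    return node

Sharing of this kind only pays off once the remaining decision problem at two nodes is identical, which, as argued above, does not happen in the 10000/52/15 example until far down the tree.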

> And in practise the number of internal nodes remains very limited
> as well, provided you take good care in which order you insert
> the questions in the tree.
If you wrote "usually remains", I would agree. But a million patterns in a thousand different shapes, as I was daydreaming about, is a particularly ornery kind of practice. While I do not expect those rules to be one-tenth as pathological as the theoretical example, I would expect them to be problematic enough to require something other than a single straightforward decision tree using the theoretical minimum number of comparisons.

For that reason, I will return to the question of alternatives that guarantee reasonable space requirements. One simple approach, which at first blush performs decently on searches, is the following.

Create a tree (or a forest, depending on how you look at it), where each node is associated with a set of predicates. Each non-leaf node has 2 or more children. Each node also inherits the predicates associated with its ancestors, and each node may have one or more rules associated with it. Then, incrementally maintain a tree of shared predicates, as in these examples (capital letters represent predicates):

1. Add FOUR. The tree looks like

FOUR (satisfies the rule named "FOUR")

2. Add SCORE.

OR()
+-- FU("FOUR")
+-- SCE("SCORE")

Excuse the sloppy figure. FU and SCE are children of the node "OR".

3. Add AND.

OR()
+-- FU("FOUR")
+-- SCE("SCORE")

AND("AND")

4. Add SEVN. SEVN has no letters in common with the node OR (neither O nor R), so it does not go in that subtree. It has one letter in common with AND (the N), though, so it is placed there.


OR()
+-- FU("FOUR")
+-- SCE("SCORE")

N()
+-- AD("AND")
+-- SEV("SEVN")

5. Add YEARS, AGO, OR, and ON.

O()
+-- R("OR")
|   +-- FU("FOUR")
|   +-- SCE("SCORE")
+-- AG("AGO")
+-- N("ON")

N()
+-- AD("AND")
+-- SEV("SEVN")

YEARS("YEARS")

Why does "ON" go under "O" instead of "N" at the root level? Just because "FOUR" was added before "SEVN" was.

Searching against this tree involves a depth-first search, stopping (that is, not descending further) whenever you reach a node whose predicates you do not satisfy; the rules attached to the nodes you do reach are exactly the rules that match.
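
The Perl code mentioned at the end of this message is not included here, so the following is a minimal Python sketch of the general idea, covering both the incremental insertion and the depth-first matching. The splitting rule below (merge with the first root that shares any predicate, splitting on the common subset) is my own simplification, so it does not reproduce every placement in the walkthrough above; YEARS, for instance, ends up merged into the first tree instead of standing alone. All names are mine.

class Node:
    def __init__(self, predicates, rules=None):
        self.predicates = set(predicates)   # predicates tested at this node
        self.rules = list(rules or [])      # rules satisfied by exactly this
                                            # node's predicates plus its ancestors'
        self.children = []

def insert(roots, rule_name, predicates):
    # Insert a rule into the forest `roots`.  `predicates` is the set of
    # literals the rule requires (a negated predicate is simply treated as
    # its own literal).
    remaining = set(predicates)
    for node in roots:
        common = node.predicates & remaining
        if not common:
            continue
        if common == node.predicates:
            # The rule covers everything tested at this node: descend.
            leftover = remaining - common
            if leftover:
                insert(node.children, rule_name, leftover)
            else:
                node.rules.append(rule_name)
        else:
            # Partial overlap: split this node on the common subset.
            old = Node(node.predicates - common, node.rules)
            old.children = node.children
            node.predicates = common
            node.rules = []
            node.children = [old]
            leftover = remaining - common
            if leftover:
                node.children.append(Node(leftover, [rule_name]))
            else:
                node.rules.append(rule_name)
        return
    # Nothing in common with any existing root: start a new tree.
    roots.append(Node(remaining, [rule_name]))

def matches(roots, satisfied):
    # Depth-first search: stop at any node whose predicates are not all
    # satisfied; collect the rules attached to every node that is reached.
    found = []
    for node in roots:
        if node.predicates <= satisfied:
            found.extend(node.rules)
            found.extend(matches(node.children, satisfied))
    return found

forest = []
for word in ["FOUR", "SCORE", "AND", "SEVN", "YEARS", "AGO", "OR", "ON"]:
    insert(forest, word, set(word))
print(matches(forest, set("FOURSCE")))   # the rules whose letters all appear:
                                         # FOUR, SCORE and OR, in some order

Keeping each node's rules separate from its children is what lets the depth-first search report matches at internal nodes (OR in the example) as well as at the leaves.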

For the nasty 10000/52/15 example, a match attempt using a tree generated with this algorithm required 2500 comparisons. If instead every rule involved the same 15 predicates, just in different truth/falsity combinations (10000/15/15, which is the kind of situation you might have if every pattern had the same shape), only 32 comparisons were needed. In general, I believe this approach is better and simpler than my previous suggestion to use separate trees for each pattern shape.

Non-production-quality code (250 lines of Perl) is available. (But I would guess that any algorithm this obvious would have been implemented, and very possibly improved upon, decades ago.)
