[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [computer-go] Pattern Matcher



From: Peter Drake <drake@xxxxxxxxxxxxxxxxx>
On Saturday, November 6, 2004, at 03:45  PM, Eric Boesch wrote:

Unsurprisingly, many people indicated that they use decision trees for
pattern matching.  Each decision can be based on the value of a single
...
Which intersection should the decision tree test first?  A reasonable
Has anyone tried the information-theoretic choice, as used in automated decision-tree growers like ID3 and C4.5?
My pattern matcher used the information-theoretic value as a secondary
sort criterion for deciding which test to do next, after avoiding
don't-care conditions, which was its primary requirement.  Its
computation probably wasn't as sophisticated as those products (I'm
not familiar with them) -- for example, it assumed that the cost of
all tests was equal, which, now that I think about it, was just lazy.

There are many ways to estimate information-theoretic efficiency of a
test.  If you know the distribution of positions that will be matched
against the patterns, then you can calculate it exactly, but usually
you don't.  I calculated by assuming the distribution of values in the
patterns (excluding don't-care cases) was identical to the
distribution of values in the positions to be matched against those
patterns, but I now prefer worst-case efficiency.  Worst-case
efficiency is crude, but it can be easily applied to partial
don't-care cases (e.g. white or empty but not black), where other
methods I considered sometimes gave counterintuitive results.  So if a
test that takes 1 microsecond is guaranteed to eliminate at least 3/10
of all patterns, then its efficiency is just assumed to be (ln
(7/10)/ln (1/2)) bits of information per microsecond.

Whether test efficiency (bits per microsecond or whatever) or
minimizing don't-care conditions is more important, depends on
circumstances.

If you have plenty of space, then maximizing test efficiency is more
important.  Why not eliminate as many rules as possible, as quickly as
possible, if there is no real penalty to doing so?  Also, some tests
may be so slow that postponing them is almost mandatory.

It may be less obvious that avoiding don't-care is sometimes more
important than maximizing test efficiency, but I think this is also
true.  Moderate inefficiency is pretty harmless -- even if slightly
less efficient tests are consistently done before slightly more
efficient ones thoughout the entire tree, this will only increase
average matching times by a fraction.  In contrast, as I talked about
last time, just a small proportion of patterns that don't care about a
test's result can sometimes ruin pattern-matching decision tree
performance.  If such tests are postponed until later, then the whole
don't-care problem may just go away if, just by luck, other tests
happen to sort this tests's don't-care and do-care cases into separate
subtrees.


_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/