
Re: computer-go: Discarding the rubbish moves



William Harold Newman wrote:

> On Wed, Jan 15, 2003 at 11:27:21AM -0600, Richard Brown wrote:

[...]

> > Yes, this is extremely important.  Strong programs of the future will
> > use such a razor in an early phase. I believe that the actual percentage
> > will be higher.  I'll go out on a limb and predict 97%.

[...]

> I guess if you wanted to, maybe you could claim that the move ordering
> which results from heuristic evaluations and iterative deepening in
> alpha-beta search is a reflection of this principle at work in
> computer Chess. But even then, I think it'd be hard to come up with
> anything like a 97% figure for top computer Chess programs. So if you
> want to show that 97% is the right figure for computer Go, you should
> probably have some argument why computer Go follows different
> principles from computer Chess.

Hmmm, I don't really know that much about computer chess, but I know
that some of the results summarized in _A_Probabilistic_Theory_of_
_Pattern_Recognition_ <http://www-cgrl.cs.mcgill.ca/~luc/pattrec.html>
were not obtained until the '90s, so I wonder whether such techniques
have had time to make their way into chess programs, which were
already strong enough without such results.  (I have come to regard
this book as scripture.)

There is a more approachable summary of the same general subject at
<http://www.ph.tn.tudelft.nl/People/bob/papers/pami_00_review.pdf.gz> .
(WinZip can uncompress this file, if you don't have the "real" gzip.)

In any case, my version of computer go follows such principles as
may be found in that book (and others), so rather than argue that it's
different from computer chess, I would argue instead that computer go
is _like_ the many disciplines where such techniques are already being
used successfully.

See for example: "Neural Network Trigger Algorithms for Heavy Quark
Event Selection in a Fixed Target High Energy Physics Experiment"
at  <http://www-lib.fnal.gov/archive/1991/pub/Pub-91-117.pdf> ,
or "Bayesian Analysis of Linear Phased-Array Radar" at
<http://www.inference.phy.cam.ac.uk/mackay/radar3.pdf>.

The connection of these sorts of things to computer go may not be
immediately obvious, but I assert that there is a connection, and
that it is this:  Data is data.  Er, I mean data are data.  Er...

It may be helpful to think of the pro moves as heavy quark events
(signal), and the moves that pros never make, as light quark events
(noise).  Or maybe not; it might cloud the issue to think of it that
way:  There is not an exact correspondence between quark physics and
go.  But there is _some_ similarity.  And data, whatever its source,
may be mined for such gems as it contains.

There are plenty of examples where similar techniques have been
successful, from modelling the territorial behaviors of badgers and
dung beetles (two entirely separate projects), to a robot that grades
lumber, to...  Well, lots of stuff.

For more examples of how Statistical Pattern Recognition techniques
are successfully used in a _very_ wide variety of fields, I encourage
you to see at least Table 1 from the tudelft.nl URL above.

I personally feel that go _should_ be a simpler thing than particle
physics, or biochemistry.  I mean, come on, it's not rocket science.

But perhaps it's harder!

Anyway, my 97% Wagner number (Wild-Assed Guess, Not Easily Refuted)
simply comes from the relative proportions of good and garbage behaviors
I'm finding in the go data I've collected.

So far.

One program in my toolkit does "Offline training from expert games",
as Markus Enzenberger says he does for NeuroGo, at the URL he posted:

> Mostly the increase in playing strength come from a sophisticated
> network architecture. There is a diagram shown in these slides:
> http://www.cs.ualberta.ca/~emarkus/nngo/nngo-2up.ps

which is very encouraging.  Congratulations, Markus!
 
Without going into the details of exactly how these "behaviors" are
represented -- Shh!  Seekrit!  --  I can say that each such behavior
comprises certain, select features extracted from each move, at each
position in each game.

Well, really, there is a distinct behavior associated with each
_possible_ move at each position in every game.

An "observation" consists of examining not only the behavior that is
indulged in by the pro, but also all the behaviors in which he didn't
indulge.  (Mostly every empty point.)

A pair of integers associated with each observed behavior "b" (whether
indulged in or not) yields an overall "frequency of indulgence" for
that behavior:

                 (number of times behavior was selected by pro)
     F(b)  =  -----------------------------------------------------
              (number of times behavior was available to be played)

Thus, at each observation we simply tally one or both of these
numbers, for the data points that correspond to each possible
behavior at that stage of the game.  Lather; rinse; repeat for a
hundred-thousand games or so.
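
To make that bookkeeping concrete, here is a rough sketch, in Python,
of the tallying just described.  Don't read too much into it: the
feature extraction and the game iterator are hypothetical stand-ins,
and a "behavior" is just an opaque key, since the real representation
is, as noted, Seekrit.

    # Rough tally of F(b).  extract_behavior() and the games iterator are
    # hypothetical stand-ins; a behavior is any hashable key.
    from collections import defaultdict

    selected  = defaultdict(int)   # numerator:   times the pro chose behavior b
    available = defaultdict(int)   # denominator: times behavior b was on offer

    def tally(games, extract_behavior):
        for game in games:                      # a hundred-thousand games or so
            for position, pro_move in game:     # each position, plus the move played
                for move in position.legal_moves():   # mostly every empty point
                    b = extract_behavior(position, move)
                    available[b] += 1
                    if move == pro_move:
                        selected[b] += 1

    def frequency(b):
        """F(b), or None if behavior b has never been observed at all."""
        return selected[b] / available[b] if available[b] else None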

This frequency is not a probability, per se, but it gives a first
guess that's not entirely unreasonable:  A large percentage of the
data points I've collected have a numerator of zero.  That doesn't
mean a pro would never play such moves, but a certain subset of them
correspond to behaviors that are available in _every_ game, yet have
_never_ been known to be played by a pro.  Those, at least, we can
reasonably certify as having a zero probability of being played by an
expert (or even, in many cases, by a beginner with only fifty games
under his belt).

Those data points with a non-zero numerator represent behaviors that
have actually been indulged in by a pro.  That's not evidence that
they would be good moves in an actual game, of course, but at least
we don't veto them.

The behaviors that _are_ vetoed are the ones with large denominators,
and zero for a numerator:  They are the moves that pros _never_ play.
(I just now read Mr Newman's most recent contribution; yes, "pruning".)
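
Continuing that sketch: the veto itself is a one-liner.  The cutoff
on the denominator below is arbitrary, purely for illustration; the
point is only that a behavior must have been available plenty of
times before we dare certify it as garbage.

    # Veto (prune) a behavior only if the pros have had ample opportunity
    # to play it and never have.  MIN_AVAILABLE is an illustrative cutoff.
    MIN_AVAILABLE = 1000

    def vetoed(b):
        return selected[b] == 0 and available[b] >= MIN_AVAILABLE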

Of course, we must _estimate_ the "frequency of indulgence" when a
"new" behavior appears for the first time.  That's done, for now, by
means of a modified "k nearest neighbor" algorithm, although I hope
to see whether neural nets or some sort of non-linear classifier will
serve as well, because then I can throw away most of the data, and
keep only the weights.
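
For what it's worth, a plain (unmodified) k-nearest-neighbor estimate
looks something like the sketch below.  The feature vectors and the
Euclidean distance are stand-ins for illustration only; the "modified"
part of what I actually do is omitted.

    # Vanilla k-NN estimate of F for a behavior not yet in the database.
    # database is a list of (feature_vector, F_value) pairs for known behaviors.
    import heapq
    import math

    def knn_frequency(features, database, k=5):
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        nearest = heapq.nsmallest(k, database, key=lambda item: dist(item[0], features))
        if not nearest:
            return None
        return sum(f for _, f in nearest) / len(nearest)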

As it turns out, as the process of collecting data goes on, the
percentage of moves in the garbage heap is _converging_ to about 97%.
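
(Watching that convergence is just more bookkeeping on top of the
earlier sketch: at each position, count how many of the legal moves
map to a vetoed behavior.)

    # Rough measure of the "garbage percentage": the fraction of legal moves,
    # over a batch of games, whose behavior the current veto rule would discard.
    def garbage_percentage(games, extract_behavior):
        pruned = total = 0
        for game in games:
            for position, _ in game:
                for move in position.legal_moves():
                    total += 1
                    pruned += vetoed(extract_behavior(position, move))
        return 100.0 * pruned / total if total else 0.0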

Wagner number it may be, but I did have some basis for saying just
that, and not 96% or 98%.  (At least it's not a Pooma number.)

I also took heart from what Jan Ramon wrote:

> When learning from existing games, type 2 is easier to learn in some
> sense. One only has to take a bunch of pro games (collections of "good"
> moves) and a set (of approximately the same size) of random moves (which
> are probably bad moves).  Using these (position,move)->value examples, a
> learning program can learn a theory which eliminates most "bad" moves
> without reading ahead.

Yay!  I was hoping that would be the case; glad to hear it.  Thanks,
Jan!
Congratulations also on your recent publication; I expect to struggle
with it for quite some time to come.  (Say, do you remember playing a
game of go with me on the Terrace at the Memorial Union in Madison?
Three, maybe four years ago.)

William Harold Newman again:
> This is one of those places where my net-trained reflex is to admit
> that YMMV. But your mileage would have to vary a *lot* to get to 3%.
> As an AGA 3 dan looking at pro games, I don't feel sure I can quickly
> predict that the next move will be in one of <10 places, which is what
> I'd expect if I were automatically classifying all but 3% (of moves on
> a 19x19 board) as not worth considering.

All but 3% overall, on average.  It can be more than that, or less,
at specific moments in different games.

My playing program seems to be good at picking out the ten or so
candidates, but its lack of go knowledge keeps it from picking the
right ones.  It has a tendency to play "big" moves, rather than urgent
ones.  Even if it does know that the urgent one is not garbage, it
doesn't know that it is urgent.  :-(

So, move ordering on the 3% is something I have yet to figure out.

Don Dailey's recent comments are apropos of that:

> This is just another example  of the law of diminishing returns.  It's
> probably  very easy  to veto  a fairly  high percentage  of  the moves
> correctly, but  getting to those last few  percent becomes enormously,
> and increasingly difficult.

As Mr Newman said:

> Playing one's Go program against any opponent, or just feeding it
> tsume-go problems for that matter, may not *prove* anything, but it
> can be terribly eye-opening.:-|

Amen, Brother!

-- 
Richard L. Brown       Office of Information Services
Unix Guy               University of Wisconsin System Administration