
Re: computer-go: Discarding the rubbish moves



Hi William,

Your comments make enormous sense and I agree with everything.

This is just another example of the law of diminishing returns.  It's
probably very easy to veto a fairly high percentage of the moves
correctly, but getting those last few percent right becomes enormously
and increasingly difficult.

Another thing not considered is scalability.  Typically, if we are
talking about a global, highly selective tree search, it's not
reasonable to veto moves forever (unless we know with complete
certainty they are bad).  What we want to do is veto (or select out)
moves based on their distance from the leaf nodes of the tree.  We
should be increasingly paranoid about throwing out moves as we get
further from the leaf nodes.  This guarantees that the program will
improve with increasing depth: there will always be some depth at
which even the silliest of moves gets considered as the program
searches deeper and deeper.
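
A minimal sketch of what that tapering might look like, in Python, for
a generic selective searcher.  All the names here are made up for the
illustration; it's not any particular program's scheme:

    def candidate_moves(moves, score, remaining_depth, max_depth,
                        min_keep_fraction=0.2):
        """Keep a depth-dependent fraction of the candidate moves.

        moves           -- the legal moves in this position
        score           -- a cheap heuristic scorer (higher = better)
        remaining_depth -- plies left before the leaf nodes
        max_depth       -- nominal depth of the current iteration
        """
        ordered = sorted(moves, key=score, reverse=True)
        # Far from the leaves keep nearly everything; near them prune hard.
        frac = (min_keep_fraction
                + (1.0 - min_keep_fraction) * (remaining_depth / max_depth))
        return ordered[:max(1, int(len(ordered) * frac))]

    # Toy usage: ten "moves" ranked by a dummy heuristic.
    print(candidate_moves(range(10), score=lambda m: -m,
                          remaining_depth=1, max_depth=6))   # keeps 3
    print(candidate_moves(range(10), score=lambda m: -m,
                          remaining_depth=6, max_depth=6))   # keeps all 10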

If you didn't have this  behavior, your program could not improve with
faster  hardware/deeper searches  because once  you throw  out  a move
critical to understanding the position,  you can never recover even if
you look all the way to the end of the game.

I am struck by the difficulty of the task.  I believe it's easier in
chess to throw out bad moves, but even in chess it has been enormously
difficult to do this well.  So difficult, in fact, that years ago the
most successful programs didn't even try.  Current programs obey the
tapering rule I talked about: they are increasingly selective as they
approach the end nodes of the search.  Most programs actually do a
search on the candidate move to determine whether it should be thrown
out!  This is how null move pruning works in chess, and it's
automatically tapered.  The things selectivity does miss get noticed
on later iterations of the search (deeper searches).
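
For anyone who hasn't seen it, the null-move idea looks roughly like
this in a negamax search.  This is an illustrative Python sketch only:
the `game` interface (legal_moves, play, play_pass, evaluate, in_check)
is assumed, and real programs add refinements (zugzwang safeguards,
mate/stalemate handling, and so on) that are omitted here:

    NULL_MOVE_REDUCTION = 2   # "R": how much shallower the null test is

    def negamax(game, pos, depth, alpha, beta):
        if depth <= 0:
            return game.evaluate(pos)      # static evaluation at the leaves

        # Null-move pruning: hand the opponent a free move and search the
        # result to reduced depth with a zero-width window around beta.
        # If we still stand at or above beta, a real move would almost
        # certainly fail high too, so cut off without searching any of
        # them.  Because the test is itself a (shallower) search, the
        # selectivity tapers automatically: what it misses at this depth
        # tends to be caught on a deeper iteration.
        if depth > NULL_MOVE_REDUCTION and not game.in_check(pos):
            null_score = -negamax(game, game.play_pass(pos),
                                  depth - 1 - NULL_MOVE_REDUCTION,
                                  -beta, -beta + 1)
            if null_score >= beta:
                return beta

        best = float("-inf")
        for move in game.legal_moves(pos):
            score = -negamax(game, game.play(pos, move),
                             depth - 1, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break                      # ordinary alpha-beta cutoff
        return best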

My chess program uses null move pruning, but near the leaf nodes it
does pruning statically, via a special kind of analysis of my own.
The idea is a nice improvement over pure null move, but since it's
actually more selective than null move it can't (and doesn't) work
well except on the last couple of levels of the tree, nearest the
leaves.  I have verified this with hundreds of thousands of test games
against various opponents, and it's just another example of proper
tapering.
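
My exact test is beside the point here, but to give a picture of what
static pruning near the leaves can look like in general, here is a
sketch of the common "futility" form of it.  The margins are made up,
and this is the generic textbook idea, not my analysis:

    # Generic futility-style static prune near the leaves (illustration
    # only).  At one or two plies from the leaves, a quiet move whose
    # optimistic gain still can't reach alpha is skipped without being
    # searched.

    FUTILITY_MARGIN = {1: 150, 2: 400}     # centipawns, per remaining ply

    def futile(static_eval, depth, alpha, is_tactical):
        """True if a quiet move at this shallow remaining depth can be skipped."""
        if is_tactical or depth not in FUTILITY_MARGIN:
            return False
        return static_eval + FUTILITY_MARGIN[depth] <= alpha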

The key (and obvious) fact about selectivity is that it's always bad
in absolute terms; the only thing that saves it is that you can search
deeper.  Surprisingly, you don't need much extra depth to make up for
the occasional error, but you must find the right balance or the
program will stink.
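
A back-of-the-envelope illustration of where the extra depth comes
from (the numbers are made up for the example, not measurements):

    import math

    # If a full-width Go search sees ~250 legal moves and a razor keeps
    # only 10% of them, the same node budget N ~ b**d reaches roughly
    # log(250)/log(25) times the depth.
    full_branching = 250
    kept_fraction = 0.10
    selective_branching = full_branching * kept_fraction        # 25

    depth_multiplier = math.log(full_branching) / math.log(selective_branching)
    print(f"depth multiplier for the same node count: {depth_multiplier:.2f}")
    # -> about 1.72; that extra depth is what has to pay for the
    #    occasional wrongly-discarded move.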


Don





   Date: Thu, 16 Jan 2003 07:48:24 -0600
   From: William Harold Newman <william.newman@xxxxxxxxxxxxxxxxx>

   On Wed, Jan 15, 2003 at 11:27:21AM -0600, Richard Brown wrote:
   > Darren Cook wrote:
   > > Heikki Levanto wrote:
   > > > Anyway, in any given go position, there are a small number of
   > > > reasonable moves that can be considered, and a great number of
   > > > totally stupid moves that no human would even consider.
   > 
   > > > Any go
   > > > program must be able to discard the 90% of rubbish moves quickly.
   > 
   > Yes, this is extremely important.  Strong programs of the future will
   > use such a razor in an early phase. I believe that the actual percentage
   > will be higher.  I'll go out on a limb and predict 97%.

   This would be extremely handy, but it may not be achievable on the
   timescale over which successful programs based on other approaches
   get built. In computer Chess, the usual shining example of a
   successful software technology for game playing, my understanding is
   that many people have tried really hard over the years to do such
   classification, but the successful programs have largely given up on it.

   I guess if you wanted to, maybe you could claim that the move ordering
   which results from heuristic evaluations and iterative deepening in
   alpha-beta search is a reflection of this principle at work in
   computer Chess. But even then, I think it'd be hard to come up with
   anything like a 97% figure for top computer Chess programs. So if you
   want to show that 97% is the right figure for computer Go, you should
   probably have some argument why computer Go follows different
   principles from computer Chess.
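
   (To make that picture concrete, here is a toy Python sketch of the
   kind of ordering I mean; `game.search` stands for any fixed-depth
   alpha-beta search, and all the names are made up.  Nothing is vetoed
   outright: poorly ranked moves are simply searched last, where the
   narrow window usually refutes them very cheaply, and they get
   re-ranked on the next, deeper iteration.)

       def iterative_deepening(game, pos, max_depth):
           scores = {}        # move -> score from the previous iteration
           for depth in range(1, max_depth + 1):
               ordered = sorted(game.legal_moves(pos),
                                key=lambda m: scores.get(m, float("-inf")),
                                reverse=True)
               alpha = float("-inf")
               for move in ordered:
                   # Negamax convention: the child is searched with the
                   # window (-infinity, -alpha) and its score negated.
                   score = -game.search(game.play(pos, move), depth - 1,
                                        float("-inf"), -alpha)
                   scores[move] = score
                   alpha = max(alpha, score)
           return max(scores, key=scores.get)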

   (I think it does follow different principles in several ways, but 
   that doesn't mean we can't learn from the history of computer Chess.)

   > > > Here a simple move evaluation could make sense. (especially since
   > > > the same move may be totally irrelevant or the best possible,
   > > > depending if there is anything more valuable around...)
   > 
   > There is a clear benefit from a razor which, for any potential move,
   > correctly places it into one of two classes:
   > 
   >         o-  Moves that no reasonable player would ever even consider.
   > 
   >         o-  Other.
   > 
   > For some suitable definition of "reasonable".
   > 
   > The "Other" category comprises ~3%, says I.

   In my (admittedly largely unsuccessful so far) work in computer Go,
   the only large class of moves I've been able to automatically and
   reliably classify as "never even consider" is opening moves far
   from any other stones and not on the third or fourth line.
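
   (Concretely, that filter amounts to something like the following toy
   Python sketch.  The board representation, the Chebyshev distance, and
   the "far" threshold of four lines are arbitrary choices for the
   illustration, not anything from a real program.)

       def obviously_ignorable(move, stones, board_size=19, far=4):
           """True for an opening move far from every stone and off lines 3-4.

           move   -- (col, row), zero-based, on a board_size x board_size board
           stones -- set of (col, row) pairs already occupied
           """
           col, row = move
           line = min(col, row, board_size - 1 - col, board_size - 1 - row) + 1
           if line in (3, 4):
               return False                 # third or fourth line: keep it
           near_a_stone = any(max(abs(col - c), abs(row - r)) < far
                              for (c, r) in stones)
           return not near_a_stone          # isolated and off lines 3-4: discard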

   This is one of those places where my net-trained reflex is to admit
   that YMMV. But your mileage would have to vary a *lot* to get to 3%.
   As an AGA 3 dan looking at pro games, I don't feel sure I can quickly
   predict that the next move will be in one of <10 places, which is what
   I'd expect if I were automatically classifying all but 3% (of moves on
   a 19x19 board) as not worth considering. I doubt computers are going
   to do better than human 3 dans on this until computers play, overall,
   much much better than 3 dans. (I appeal to the computer chess history,
   and general engineering good sense, to justify this claim. We can
   teach computers to do sophisticated classification, but usually what
   they really shine at is exploring many simpler classifications very
   fast.)

   > Assuming that one has a suitable metric, then various techniques can
   > be used (their success depends on the nature of the data).  Neural nets,
   > "linear classifiers", "non-linear classifiers" [just Googling for these
   > last two can be enlightening] and my current favorite, a little thing
   > called the "k nearest neighbor" rule.  "Hopfield networks", "Boltzmann
   > machines", "Voronoi diagrams", "convex hulls", and "Delaunay
   > triangulations"
   > are left as an exercise for the reader who hasn't fallen asleep yet.

   As a human 3 dan, I can very quickly discard many moves even in the
   most complicated middlegame situations. (Maybe 50% of moves?) A few of
   them are locally pointless, like various moves on the first line, or
   moves inside enclosed spaces which simply reduce one's own eyespace.
   But most of them aren't locally pointless; it's just that whatever
   they might accomplish locally is done better by another move, or what
   they accomplish is irrelevant on a larger scale (like making another
   eye for a group which has three eyes already). AFAIK that kind of
   dominated-by-something-else criterion is rather difficult for the
   neural-net-like classification systems you discuss, and the
   recognition that a move's benefits are globally irrelevant is
   extremely difficult.

   After that, I often discard all moves in entire regions, because
   relatively shallow analysis shows that those regions are more settled
   than the important battles elsewhere. By the time I do this, maybe I
   have discarded 90% or more of the moves on the board. But note that
   the process involves feeding a lot of high level global knowledge back
   into the local classification. So if you're looking for algorithms
   which might be as effective as this, I wouldn't look at just local
   classification, but instead try algorithms like those used for speech
   recognition, where there are various techniques to resolve ambiguities
   in local pattern recognition by using higher level information, like
   the way that the sentence structure is parsed.

   > I'm still trying to get my head around the fact that it is possible to
   > extract an _infinite_ amount of information from a finite amount of
   > data.
   > 
   > Ah, well, the nice thing about being an amateur mathematician is that
   > I have nothing to prove.

   Playing one's Go program against any opponent, or just feeding it
   tsume-go problems for that matter, may not *prove* anything, but it
   can be terribly eye-opening. :-|

   -- 
   William Harold Newman <william.newman@xxxxxxxxxxxxxxxxx>
   "That's our advantage at Microsoft; we set the standards and we can
   change them."
    -- Karen Hargrove, Microsoft (quoted in Feb 1993 Unix Review editorial)
   PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C