[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [computer-go] SlugGo approach: GNU vs.Goliath
On 29, Dec 2004, at 10:28 AM, Mark Boon wrote:
David Doshay has reported some significant gains of strength in SlugGo
compared to the original it was based on, GNU Go. I think we haven't
seen
this kind of improvement in many, many years, so I think it's an
exciting
project he has. What suprised me most is the simplicity of the
approach, and
I'm probably not alone in this. It chooses 8 candidate moves, and then
reads
those 16 ply ahead with no branches.
Just to be a little more clear, we have tried a number of widths and
depths, but
we had great preliminary success at 8 wide and 16 deep, so we then chose
8 as the maximum width for everything we reported. We do not always
check
as many as 8 paths, but we always do the same depth.
We need to analyze the parameters that choose the actual final width
and take
a close look at how often we choose other than GNU Go's top choice.
This is
very high on our "to do" list.
Also, the "no branching" was always a first pass simple thing to do
that we
never expected to keep. But it worked so much better than expected we
kept
it that way for further inspection. It will soon be replaced by a
higher branching
factor.
At some point David wrote:
It is clear that this non-optimal once-branched and then linear
deep lookahead (that we implemented just because it is an easy
way to start) in fact plays better, even if not optimally, for the
right
reasons, or even with complete or correct information.
Now, this made me think. Does it really play better for the right
reasons?
There seems no doubt it plays much better. Pragmatic as I usually am,
it's
maybe less important whether it plays better for the right reasons as
long
as it plays better, but I still couldn't help wondering.
We are in agreement, and you have expressed what I was trying to say.
If you had
explained this approach to me before showing the results, I probably
would
have given a list of reasons why I would think it wouldn't work.
I can indeed parse my writing either way, but as per my other email, I
intended
the "not" to propagate through to all clauses. Sorry for not being as
clear as I
might have been ... but at the time I was just way too amused by all
the email
telling me why it should not work when I was reporting that it did
indeed work.
Some concerns I would have are:
- What about horizon effect?
- What about 'accidents'?
- What about playing urgent moves over big moves? Can you rely purely
on
evaluation to determine the best sequence?
- How to maintain consistency of play?
I will briefly explain each of these.
The horizon effect appears when the last move in the sequence is a
threat
that can easily be parried. The evaluation will swing in favour of the
player of the last move. Since the parry is never examined, the result
of
the first move in the sequence is aversely affected, but for the wrong
reason.
SlugGo clearly has a horizon effect, but I think it is less sensitive
than you
imagine. More on this below.
An accident is what I call when a program plays (against itself or
other),
and suddenly disaster strikes. Some group gets killed or cut off, some
territory destroyed by an unexpected sequence. 16 moves is long enough
to
shift play to an unrelated part of the board and have an accident
happen.
Again, this will aversely affect the valuation of the first move in the
sequence for the wrong reason.
I think "accidents" like this could be happening as well, primarily
because of
the lack of branching in the lookahead paths. A better move along the
path
can eliminate the accident, but that is not seen by SlugGo. But this
point is
virtually the same for any Go program, so I do not see it as a SlugGo
design
issue, but rather it strikes at the heart of evaluation functions in
all cases.
'Urgent moves over big moves' is an important concept in human play.
And I
think all Go software have heuristics to ensure urgent moves are
chosen over
big moves. They don't rely simply on the evaluated score to decide
which
move is better. What I understand from how SlugGo works, is that it
does a
simple evaluation of the score to decide which sequence gives the best
result, thereby destroying a large part of the Go knowledge
represented by
the heuristics used.
The evaluation used by SlugGo is to combine the GNU Go value of the
original
candidate move with 2 other evaluations, one based upon influence and
the
other based upon territory, and there is another bonus added when the
candidate
move is on both the US and the THEIR list. The influence and territory
evaluations
are done only at the end of the 16 move lookahead depth. All 3
evaluations are
normalized prior to being summed, and both the influence and territory
evaluations
are also weighted according to move number so influence is important in
the
beginning and territory is important at the end (the weighting
functions cross
at move 83 right now). So we are combining the GNU Go evaluation of THIS
move with the influence and territory evaluations of the FUTURE board.
By consistent play I mean finishing a sequence, fight, or joseki before
moving elsewhere. Again I think most, if not all, Go software has
extensive
code to prevent the program to abandon a fight or joseki half-way and
start
playing elsewhere.
SlugGo has this "fault" in spades. It jumps all over the place. But it
seems
to me that so do the dan level players that give me a moderate to large
handicap.
SlugGo itself makes no attempt to think about finishing a fight.
These are all reasons (and I could probably think of more) why David's
approach might not give very good results. Yet it obviously does. So I
have
to admit there's something here I don't understand.
I feel exactly the same way. I do not understand why SlugGo has worked
as well as it has.
I fully expected to pick up lots of apparent strength against GNU Go
because
of the "evil twin" aspect of move generation: I knew exactly what moves
my
opponent considered best and I could look ahead in a way I knew it
could be
depended upon to play, and just choose the one with the best outcome
for me
on a horizon the plain GNU Go could not see. I knew that I would easily
beat
the base code.
What I did not expect was that it would work as well as it does against
a
different program. I hope that this quarter I will get in a few hundred
games
against Go++ to add to those I have against Many Faces of Go.
Can David maybe explain
on what grounds he thinks it plays better for the right reasons?
"Right" or "wrong" I cannot comment upon. I can speculate about why I
think it works and why I think that all of your good reasons above are
not
quite as serious as one might first guess.
disclaimer warning: ALL OF IT IS PURE SPECULATION!
I believe that the reason it works better than expected is in the
hierarchical
nature of the evaluation. GNU Go is doing plenty of things that attempt
to
minimize the effects you mention above and it wraps all of that up into
its
single floating point number for the move values. By using many of the
top
GNU Go moves I am assuming that there are plenty of great reasons that
these are acceptable moves. But because the specific floating point
value
results from an attempt to turn a multidimensional analysis into a
single
value, I do not believe that this step cannot be done in a way that is
correct for all situations ... that the algorithm for combining values
is
inherently inaccurate. This is not a knock against GNU Go, just a simple
result of all the competing things one tries to do while playing Go and
virtually any algorithmic attempt to reduce it to a single number. So,
given
that I trust GNU Go in general but not the exact number, I do whole
board
lookahead to further refine the results. But I still carry the original
GNU Go
values along for the final evaluation. Also, all the effects you are
worrying
about are considered by GNU Go in each of the lookahead moves. It
seems that in most, but surely not all, situations, the influence and
territory
values are not that sensitive to the placement of single stones, but
reflect the 16 move sequence rather than overweighting just the last of
the moves.
I do think that we are seeing an implicit evaluation of urgency. The 16
move lookahead sequences are long enough that moves are being played
in several parts of the board. There is no question that many good moves
not played now remain near the top of the list for the rest of the
sequence.
SlugGo is looking for the ordered sequence it prefers, and thus it is
investigating urgency implicitly in the lookahead paths. But SlugGo is
also
checking for urgency explicitly in every generation of the candidate
move
list by first having GNU Go generate moves from each player's
perspective.
The primary thing this gives us at every move is an evaluation of the
top
threats against us from the opponents perspective. I have trouble
parsing
some of Vincent Diepeveen's contributions, but I think this is very
close to
what he calls null-move.
Since it's such a simple approach I decided to try a simple experiment
myself. I set up Goliath such that, similarly to SlugGo, it would
generate 8
candidate moves. Each of these are then extended 16 ply deep
(Basically by
having an instance of Goliath play against itself for 16 moves). It
would
pick the move where the score that Goliath gave at the end of the 16
move
sequence was the highest.
This evaluation function is quite different than what SlugGo does. I
think
your choice here is surely amplifying the horizon effect.
This only took me a couple of hours to do, the
main work coming from the fact that Goliath doesn't give a list of
moves
with their values. It often gives only one move. So I had to have it
generate a move 8 times and add code to prevent the same move to be
generated repeatedly. Obviously this gives a program that is 128 times
slower than the original. Fortunately Goliath stems from a time when
computers were much slower than today, so it still plays at an
acceptable
speed even in Java.
So far I have only had time to try it out a few full games. My first
impression is though that it plays very well in the beginning of the
game,
but plays much worse later on. I don't think the overall result is an
improvement. I had to spend some time making tools that would make
visible
what's going on
This is something we also need to do, but having the computation spread
over a cluster complicates this task. Fortunately, the GNU Go folks have
developed some nice tools, so again, we do not have to start from zero.
so that I could study the generated sequences. This took
considerably more time than the few hours I spent on putting together
the
experiment. When examining the generated sequences more closely I could
conclude that:
- The horizon effect happens frequently and is the main cause for the
bad
moves.
- The 'accidents' happen much less frequently, but still occasionally
result
in a terrible choice of move.
- The program is generally still fairly good at choosing urgent moves
over
big ones.
- There definitely is a lack in consistency of play, but so far has
not had
as bad an effect as I would have thought.
I could easily say the same things about SlugGo.
On the positive side, there are quite a few cases where the program
plays a
much better move than the original program. I have included a game
played
against Go++ as an example. Goliath plays Black. Until about move 60,
Black's play is excellent, much better than Goliath would play
normally and
it has built a large lead already. But after that it starts to make
occasional big mistakes. Mistakes the original Goliath would never have
made, and these were all caused either by the horizon effect or by an
'accident' elsewhere in the sequences. The end-result is disastrous.
Other
games I tried so far show a similar development.
It's still early to draw hard conclusions about this experiment. And
after
all, the few hours I've spent to put this experiment together in no way
compares to the effort David has put into his SluGo project. But I'm
still
inclined to thinking a couple of things:
- The horizon effect does seem to be a big problem. How come this is
not a
problem in SluGo?
Fixing the bad moves SlugGo will make but GNU Go will not make has taken
a remarkable amount of time. Still an ongoing problem. Our horizon
problems
have always been the worst in the end game.
- Some of the problems I saw concerning urgent moves appear to be
largely
unfounded.
As I explained above, I think urgency is something SlugGo does better
than
GNU Go. Clipping from Gunnar's comments:
* Some moves are very solid but quite slow. It's debatable whether
those really make the program stronger but it's probably effective
against many computer programs and possibly in handicap games
against humans. In terms of style those moves feel similar to Go++.
* It has a stronger tendency to take care of weak dragons. The use
of moves generated for the opponent seems to be part of the reason.
* It also has a stronger tendency to keep pressure on weak opponent
dragons. I wouldn't say that it does this very systematically but
it gets in useful moves often enough to make a difference.
I think these all come down to a better evaluation of urgency, or at
least
a greater respect for the opponent's threats.
- It often plays much better moves than the original, so there does
seem to
be potential to the approach.
Another thing I was wondering is if GNU Go is somehow much more
suitable to
an approach like this than Goliath. This doesn't take anything away
from
SlugGo of course, if it works, it works. Academically I could see this
as a
problem though. Did the success of SlugGo depend on the choice of Go
engine?
Would the result have been different if another Go engine had been
available
and it had been chosen instead? Or does SlugGo have some solution(s)
implemented that address the horizon effect? Or maybe Goliath is just
too
weak or peculiar and unreliable for this kind of approach? After all
it's a
very old program, badly written, and I often think myself it's a
miracle it
plays as well as it does.
Well, we originally chose GNU Go simply because it was open source. What
works for us so well is that a floating point value is generated for a
number of
possible moves, and that we can ask for a set of moves for each side
every
time it is our turn. It would not be that hard to integrate any move
generation
engine into SlugGo if it had those two useful attributes. SlugGo does
use GNU
Go's influence and territory functions, and suitable substitutes would
have to
be found.
But back to speculation, a while ago one of the GNU Go developers wrote
a
very nice description of how GNU Go works, and it made it sound so very
hierarchical, involving a series of evaluations at different length
scales across
the board. From that perspective, all SlugGo does is add the one length
that
GNU Go leaves off: whole board lookahead and evaluation.
As mentioned previously, my interest is in multi-scale physics problems
that
might be solved with a hierarchical approach. Go is a great play pen for
developing and debugging the cluster infrastructure I want to apply to
those
problems, and because Go is an acceptable Computer Science project, I
can use CS students to develop the infrastructure, and have physics
students
focus on physics problems.
And it sure is fun!
Cheers and happy new year,
David
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/