
Re: [computer-go] SlugGo approach: GNU vs. Goliath



On 29 Dec 2004, at 10:28 AM, Mark Boon wrote:

David Doshay has reported some significant gains of strength in SlugGo
compared to the original it was based on, GNU Go. I think we haven't seen
this kind of improvement in many, many years, so I think it's an exciting
project he has. What surprised me most is the simplicity of the approach, and
I'm probably not alone in this. It chooses 8 candidate moves, and then reads
those 16 ply ahead with no branches.
Just to be a little more clear, we have tried a number of widths and depths, but
we had great preliminary success at 8 wide and 16 deep, so we then chose
8 as the maximum width for everything we reported. We do not always check
as many as 8 paths, but we always do the same depth.

We need to analyze the parameters that choose the actual final width and take
a close look at how often we choose other than GNU Go's top choice. This is
very high on our "to do" list.

Also, the "no branching" was always a first pass simple thing to do that we
never expected to keep. But it worked so much better than expected we kept
it that way for further inspection. It will soon be replaced by a higher branching
factor.
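
In rough Python-flavoured pseudocode, the scheme looks something like the sketch
below. The Engine interface used here (gen_moves, play, undo, evaluate) is purely
hypothetical, not SlugGo's or GNU Go's actual API, and the single evaluate() call
stands in for the combined scoring described further down:

def choose_move(engine, width=8, depth=16):
    # Up to `width` candidate moves, each extended `depth` plies with no
    # branching, then scored once at the horizon.
    best_move, best_score = None, float("-inf")
    for candidate in engine.gen_moves(top_n=width):
        engine.play(candidate)
        plies_played = 1
        for _ in range(depth):
            # Extend the line with the engine's single preferred reply at
            # each ply: a straight, unbranched sequence.
            reply = engine.gen_moves(top_n=1)[0]
            engine.play(reply)
            plies_played += 1
        score = engine.evaluate()          # whole-board score at the horizon
        for _ in range(plies_played):
            engine.undo()
        if score > best_score:
            best_move, best_score = candidate, score
    return best_move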

At some point David wrote:

It is clear that this non-optimal once-branched and then linear
deep lookahead (that we implemented just because it is an easy
way to start) in fact plays better, even if not optimally, for the right
reasons, or even with complete or correct information.
Now, this made me think. Does it really play better for the right reasons?
There seems no doubt it plays much better. Pragmatic as I usually am, it's
maybe less important whether it plays better for the right reasons as long
as it plays better, but I still couldn't help wondering.
We are in agreement, and you have expressed what I was trying to say.

If you had
explained this approach to me before showing the results, I probably would
have given a list of reasons why I would think it wouldn't work.
I can indeed parse my writing either way, but as per my other email, I intended
the "not" to propagate through to all clauses. Sorry for not being as clear as I
might have been ... but at the time I was just way too amused by all the email
telling me why it should not work when I was reporting that it did indeed work.

Some concerns I would have are:

- What about horizon effect?
- What about 'accidents'?
- What about playing urgent moves over big moves? Can you rely purely on
evaluation to determine the best sequence?
- How to maintain consistency of play?

I will briefly explain each of these.
The horizon effect appears when the last move in the sequence is a threat
that can easily be parried. The evaluation will swing in favour of the
player of the last move. Since the parry is never examined, the result of
the first move in the sequence is adversely affected, but for the wrong
reason.
SlugGo clearly has a horizon effect, but I think it is less sensitive than you
imagine. More on this below.

An accident is what I call it when a program plays (against itself or another),
and suddenly disaster strikes. Some group gets killed or cut off, some
territory destroyed by an unexpected sequence. 16 moves is long enough to
shift play to an unrelated part of the board and have an accident happen.
Again, this will adversely affect the valuation of the first move in the
sequence for the wrong reason.
I think "accidents" like this could be happening as well, primarily because of
the lack of branching in the lookahead paths. A better move along the path
can eliminate the accident, but that is not seen by SlugGo. But this point is
virtually the same for any Go program, so I do not see it as a SlugGo design
issue, but rather it strikes at the heart of evaluation functions in all cases.

'Urgent moves over big moves' is an important concept in human play. And I
think all Go software has heuristics to ensure urgent moves are chosen over
big moves. They don't rely simply on the evaluated score to decide which
move is better. What I understand from how SlugGo works is that it does a
simple evaluation of the score to decide which sequence gives the best
result, thereby destroying a large part of the Go knowledge represented by
the heuristics used.
The evaluation used by SlugGo combines the GNU Go value of the original
candidate move with two other evaluations, one based upon influence and the
other based upon territory, and there is an additional bonus added when the candidate
move is on both the US and the THEIR lists. The influence and territory evaluations
are done only at the end of the 16-move lookahead. All three evaluations are
normalized prior to being summed, and both the influence and territory evaluations
are also weighted according to move number, so influence is important in the
beginning and territory is important at the end (the weighting functions cross
at move 83 right now). So we are combining the GNU Go evaluation of THIS
move with the influence and territory evaluations of the FUTURE board.
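
As an illustration of that combination, a minimal sketch follows. The inputs are
assumed to be already normalized, and the linear weighting (crossing at move 83)
and the size of the both-lists bonus are placeholders, not the actual functions
we use:

def combined_score(gnu_value, influence_at_horizon, territory_at_horizon,
                   move_number, on_both_lists,
                   crossover=83, both_lists_bonus=0.1):
    # Influence matters early, territory late; the two weights are equal
    # at move `crossover`.
    w_influence = max(0.0, 1.0 - move_number / (2.0 * crossover))
    w_territory = 1.0 - w_influence
    score = (gnu_value
             + w_influence * influence_at_horizon
             + w_territory * territory_at_horizon)
    if on_both_lists:        # candidate appears on both the US and THEIR lists
        score += both_lists_bonus
    return score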

By consistent play I mean finishing a sequence, fight, or joseki before
moving elsewhere. Again I think most, if not all, Go software has extensive
code to prevent the program from abandoning a fight or joseki halfway and
starting to play elsewhere.
SlugGo has this "fault" in spades. It jumps all over the place. But it seems
to me that so do the dan-level players who give me a moderate to large
handicap.

SlugGo itself makes no attempt to think about finishing a fight.

These are all reasons (and I could probably think of more) why David's
approach might not give very good results. Yet it obviously does. So I have
to admit there's something here I don't understand.
I feel exactly the same way. I do not understand why SlugGo has worked
as well as it has.

I fully expected to pick up lots of apparent strength against GNU Go because
of the "evil twin" aspect of move generation: I knew exactly what moves my
opponent considered best, and I could look ahead along lines I knew it could be
depended upon to play, then just choose the one with the best outcome for me
at a horizon that plain GNU Go could not see. I knew that I would easily beat
the base code.

What I did not expect was that it would work as well as it does against a
different program. I hope that this quarter I will get in a few hundred games
against Go++ to add to those I have against Many Faces of Go.

Can David maybe explain
on what grounds he thinks it plays better for the right reasons?
"Right" or "wrong" I cannot comment upon. I can speculate about why I
think it works and why I think that all of your good reasons above are not
quite as serious as one might first guess.

disclaimer warning: ALL OF IT IS PURE SPECULATION!

I believe that the reason it works better than expected is in the hierarchical
nature of the evaluation. GNU Go is doing plenty of things that attempt to
minimize the effects you mention above and it wraps all of that up into its
single floating point number for the move values. By using many of the top
GNU Go moves I am assuming that there are plenty of great reasons that
these are acceptable moves. But because the specific floating point value
results from an attempt to turn a multidimensional analysis into a single
value, I believe that this step cannot be done in a way that is
correct for all situations ... that the algorithm for combining values is
inherently inaccurate. This is not a knock against GNU Go, just a simple
result of all the competing things one tries to do while playing Go and
virtually any algorithmic attempt to reduce it to a single number. So, given
that I trust GNU Go in general but not the exact number, I do whole board
lookahead to further refine the results. But I still carry the original GNU Go
values along for the final evaluation. Also, all the effects you are worrying
about are considered by GNU Go in each of the lookahead moves. It
seems that in most, but surely not all, situations, the influence and territory
values are not that sensitive to the placement of single stones, but
reflect the 16 move sequence rather than overweighting just the last of
the moves.

I do think that we are seeing an implicit evaluation of urgency. The 16
move lookahead sequences are long enough that moves are being played
in several parts of the board. There is no question that many good moves
not played now remain near the top of the list for the rest of the sequence.
SlugGo is looking for the ordered sequence it prefers, and thus it is
investigating urgency implicitly in the lookahead paths. But SlugGo is also
checking for urgency explicitly in every generation of the candidate move
list by first having GNU Go generate moves from each player's perspective.
The primary thing this gives us at every move is an evaluation of the top
threats against us from the opponent's perspective. I have trouble parsing
some of Vincent Diepeveen's contributions, but I think this is very close to
what he calls null-move.
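
A sketch of that double-perspective generation, with a hypothetical interface
and a merging step that is only illustrative:

def opponent(color):
    return "white" if color == "black" else "black"

def candidate_list(engine, color, width=8):
    # Ask for top moves from our perspective *and* from the opponent's.
    # The opponent's list is effectively the set of top threats against us;
    # moves appearing on both lists earn the bonus mentioned earlier.
    us = engine.gen_moves(color, top_n=width)
    them = engine.gen_moves(opponent(color), top_n=width)
    on_both = set(us) & set(them)
    merged = list(dict.fromkeys(us + them))[:width]
    return merged, on_both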

Since it's such a simple approach I decided to try a simple experiment
myself. I set up Goliath such that, similarly to SlugGo, it would generate 8
candidate moves. Each of these is then extended 16 ply deep (basically by
having an instance of Goliath play against itself for 16 moves). It would
pick the move where the score that Goliath gave at the end of the 16-move
sequence was the highest.
This evaluation function is quite different from what SlugGo does. I think
your choice here is surely amplifying the horizon effect.

This only took me a couple of hours to do, the
main work coming from the fact that Goliath doesn't give a list of moves
with their values. It often gives only one move. So I had to have it
generate a move 8 times and add code to prevent the same move from being
generated repeatedly. Obviously this gives a program that is 128 times
slower than the original. Fortunately Goliath stems from a time when
computers were much slower than today, so it still plays at an acceptable
speed even in Java.
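
The workaround described here might look roughly like the following sketch; the
exclude parameter is a hypothetical stand-in for whatever mechanism forbids
already-returned moves:

def top_n_moves(engine, n=8):
    # Work around an engine that only ever returns its single best move:
    # ask repeatedly, forbidding the moves it has already returned.
    seen, moves = set(), []
    for _ in range(n):
        move = engine.gen_move(exclude=seen)   # hypothetical interface
        if move is None:
            break
        moves.append(move)
        seen.add(move)
    return moves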

So far I have only had time to try it out in a few full games. My first
impression, though, is that it plays very well in the beginning of the game,
but plays much worse later on. I don't think the overall result is an
improvement. I had to spend some time making tools that would make visible
what's going on
This is something we also need to do, but having the computation spread
over a cluster complicates this task. Fortunately, the GNU Go folks have
developed some nice tools, so again, we do not have to start from zero.

so that I could study the generated sequences. This took
considerably more time than the few hours I spent on putting together the
experiment. When examining the generated sequences more closely I could
conclude that:

- The horizon effect happens frequently and is the main cause for the bad
moves.
- The 'accidents' happen much less frequently, but still occasionally result
in a terrible choice of move.
- The program is generally still fairly good at choosing urgent moves over
big ones.
- There definitely is a lack of consistency of play, but so far it has not had
as bad an effect as I would have thought.
I could easily say the same things about SlugGo.

On the positive side, there are quite a few cases where the program plays a
much better move than the original program. I have included a game played
against Go++ as an example. Goliath plays Black. Until about move 60,
Black's play is excellent, much better than Goliath would play normally and
it has built a large lead already. But after that it starts to make
occasional big mistakes. Mistakes the original Goliath would never have
made, and these were all caused either by the horizon effect or by an
'accident' elsewhere in the sequences. The end result is disastrous. Other
games I tried so far show a similar development.

It's still early to draw hard conclusions about this experiment. And after
all, the few hours I've spent to put this experiment together in no way
compares to the effort David has put into his SlugGo project. But I'm still
inclined to think a couple of things:

- The horizon effect does seem to be a big problem. How come this is not a
problem in SlugGo?
Fixing the bad moves SlugGo will make but GNU Go will not has taken
a remarkable amount of time, and it is still an ongoing problem. Our horizon
problems have always been worst in the endgame.

- Some of the problems I foresaw concerning urgent moves appear to be largely
unfounded.
As I explained above, I think urgency is something SlugGo does better than
GNU Go. Clipping from Gunnar's comments:

* Some moves are very solid but quite slow. It's debatable whether
those really make the program stronger but it's probably effective
against many computer programs and possibly in handicap games
against humans. In terms of style those moves feel similar to Go++.

* It has a stronger tendency to take care of weak dragons. The use
of moves generated for the opponent seems to be part of the reason.

* It also has a stronger tendency to keep pressure on weak opponent
dragons. I wouldn't say that it does this very systematically but
it gets in useful moves often enough to make a difference.

I think these all come down to a better evaluation of urgency, or at least
a greater respect for the opponent's threats.

- It often plays much better moves than the original, so there does seem to
be potential in the approach.

Another thing I was wondering is whether GNU Go is somehow much better suited
to an approach like this than Goliath. This doesn't take anything away from
SlugGo of course, if it works, it works. Academically I could see this as a
problem though. Did the success of SlugGo depend on the choice of Go engine?
Would the result have been different if another Go engine had been available
and it had been chosen instead? Or does SlugGo have some solution(s)
implemented that address the horizon effect? Or maybe Goliath is just too
weak or peculiar and unreliable for this kind of approach? After all it's a
very old program, badly written, and I often think myself it's a miracle it
plays as well as it does.
Well, we originally chose GNU Go simply because it was open source. What
works for us so well is that a floating point value is generated for a number of
possible moves, and that we can ask for a set of moves for each side every
time it is our turn. It would not be that hard to integrate any move generation
engine into SlugGo if it had those two useful attributes. SlugGo does use GNU
Go's influence and territory functions, and suitable substitutes would have to
be found.
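
A minimal sketch of the interface such an engine would need to expose, with
names that are only illustrative and not SlugGo's actual plumbing:

from abc import ABC, abstractmethod

class MoveEngine(ABC):
    """The two attributes mentioned above, plus the influence and territory
    evaluations we currently borrow from GNU Go."""

    @abstractmethod
    def scored_moves(self, color, top_n):
        """Up to top_n (move, value) pairs for `color` on the current board."""

    @abstractmethod
    def play(self, color, move):
        """Place a stone and update the engine's internal state."""

    @abstractmethod
    def undo(self):
        """Take back the last move."""

    @abstractmethod
    def influence(self):
        """Whole-board influence estimate for the current position."""

    @abstractmethod
    def territory(self):
        """Whole-board territory estimate for the current position."""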

But back to speculation, a while ago one of the GNU Go developers wrote a
very nice description of how GNU Go works, and it made it sound so very
hierarchical, involving a series of evaluations at different length scales across
the board. From that perspective, all SlugGo does is add the one length scale
that GNU Go leaves off: whole-board lookahead and evaluation.

As mentioned previously, my interest is in multi-scale physics problems that
might be solved with a hierarchical approach. Go is a great playpen for
developing and debugging the cluster infrastructure I want to apply to those
problems, and because Go is an acceptable Computer Science project, I
can use CS students to develop the infrastructure, and have physics students
focus on physics problems.

And it sure is fun!


Cheers and happy new year,
David


_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/