[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[computer-go] SlugGo approach: GNU vs.Goliath



David Doshay has reported some significant gains of strength in SlugGo
compared to the original it was based on, GNU Go. I think we haven't seen
this kind of improvement in many, many years, so I think it's an exciting
project he has. What suprised me most is the simplicity of the approach, and
I'm probably not alone in this. It chooses 8 candidate moves, and then reads
those 16 ply ahead with no branches.

The fact that he needs tremendous computing power doesn't bother me at all.
Despite enormous increase of CPU performance the last decades, Go software
has hardly made any use of this power that is available. Gain in playing
level justifies almost any extra cost in performance you need. It will most
likely be compensated by increases in CPU power in the near future.

At some point David wrote:

> It is clear that this non-optimal once-branched and then linear
> deep lookahead (that we implemented just because it is an easy
> way to start) in fact plays better, even if not optimally, for the right
> reasons, or even with complete or correct information.

Now, this made me think. Does it really play better for the right reasons?
There seems no doubt it plays much better. Pragmatic as I usually am, it's
maybe less important whether it plays better for the right reasons as long
as it plays better, but I still couldn't help wondering. If you had
explained this approach to me before showing the results, I probably would
have given a list of reasons why I would think it wouldn't work. Some
concerns I would have are:

- What about horizon effect?
- What about 'accidents'?
- What about playing urgent moves over big moves? Can you rely purely on
evaluation to determine the best sequence?
- How to maintain consistency of play?

I will briefly explain each of these.
The horizon effect appears when the last move in the sequence is a threat
that can easily be parried. The evaluation will swing in favour of the
player of the last move. Since the parry is never examined, the result of
the first move in the sequence is aversely affected, but for the wrong
reason.

An accident is what I call when a program plays (against itself or other),
and suddenly disaster strikes. Some group gets killed or cut off, some
territory destroyed by an unexpected sequence. 16 moves is long enough to
shift play to an unrelated part of the board and have an accident happen.
Again, this will aversely affect the valuation of the first move in the
sequence for the wrong reason.

'Urgent moves over big moves' is an important concept in human play. And I
think all Go software have heuristics to ensure urgent moves are chosen over
big moves. They don't rely simply on the evaluated score to decide which
move is better. What I understand from how SlugGo works, is that it does a
simple evaluation of the score to decide which sequence gives the best
result, thereby destroying a large part of the Go knowledge represented by
the heuristics used.

By consistent play I mean finishing a sequence, fight, or joseki before
moving elsewhere. Again I think most, if not all, Go software has extensive
code to prevent the program to abandon a fight or joseki half-way and start
playing elsewhere.

These are all reasons (and I could probably think of more) why David's
approach might not give very good results. Yet it obviously does. So I have
to admit there's something here I don't understand. Can David maybe explain
on what grounds he thinks it plays better for the right reasons?

Since it's such a simple approach I decided to try a simple experiment
myself. I set up Goliath such that, similarly to SlugGo, it would generate 8
candidate moves. Each of these are then extended 16 ply deep (Basically by
having an instance of Goliath play against itself for 16 moves). It would
pick the move where the score that Goliath gave at the end of the 16 move
sequence was the highest. This only took me a couple of hours to do, the
main work coming from the fact that Goliath doesn't give a list of moves
with their values. It often gives only one move. So I had to have it
generate a move 8 times and add code to prevent the same move to be
generated repeatedly. Obviously this gives a program that is 128 times
slower than the original. Fortunately Goliath stems from a time when
computers were much slower than today, so it still plays at an acceptable
speed even in Java.

So far I have only had time to try it out a few full games. My first
impression is though that it plays very well in the beginning of the game,
but plays much worse later on. I don't think the overall result is an
improvement. I had to spend some time making tools that would make visible
what's going on so that I could study the generated sequences. This took
considerably more time than the few hours I spent on putting together the
experiment. When examining the generated sequences more closely I could
conclude that:

- The horizon effect happens frequently and is the main cause for the bad
moves.
- The 'accidents' happen much less frequently, but still occasionally result
in a terrible choice of move.
- The program is generally still fairly good at choosing urgent moves over
big ones.
- There definitely is a lack in consistency of play, but so far has not had
as bad an effect as I would have thought.

On the positive side, there are quite a few cases where the program plays a
much better move than the original program. I have included a game played
against Go++ as an example. Goliath plays Black. Until about move 60,
Black's play is excellent, much better than Goliath would play normally and
it has built a large lead already. But after that it starts to make
occasional big mistakes. Mistakes the original Goliath would never have
made, and these were all caused either by the horizon effect or by an
'accident' elsewhere in the sequences. The end-result is disastrous. Other
games I tried so far show a similar development.

It's still early to draw hard conclusions about this experiment. And after
all, the few hours I've spent to put this experiment together in no way
compares to the effort David has put into his SluGo project. But I'm still
inclined to thinking a couple of things:

- The horizon effect does seem to be a big problem. How come this is not a
problem in SluGo?
- Some of the problems I saw concerning urgent moves appear to be largely
unfounded.
- It often plays much better moves than the original, so there does seem to
be potential to the approach.

Another thing I was wondering is if GNU Go is somehow much more suitable to
an approach like this than Goliath. This doesn't take anything away from
SlugGo of course, if it works, it works. Academically I could see this as a
problem though. Did the success of SlugGo depend on the choice of Go engine?
Would the result have been different if another Go engine had been available
and it had been chosen instead? Or does SlugGo have some solution(s)
implemented that address the horizon effect? Or maybe Goliath is just too
weak or peculiar and unreliable for this kind of approach? After all it's a
very old program, badly written, and I often think myself it's a miracle it
plays as well as it does.

It seems my little experiment raises at least as many questions as it
answers. I can see now that David Doshay's approach has considerable
promise, but at least for it to work for Goliath I would have to address the
problem of horizon effects. That would probably take more time than I have
currently spare. But when I do get more spare time I may delve into it a bit
more.

    Mark

Attachment: Goliath-GoPlusPlus.go
Description: Binary data

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/