Re: [computer-go] Cellular automata
> An evaluator must therefore be able to cope
> with these positions. E.g. does the evaluator adequately deal with bad shapes
> and/or overconcentration of stones? From the human games it has not learned
> to deal with these configurations.
This is true and could be part of the reason for the fairly poor
performance of the models at predicting expert moves. The simple models
in the paper are not trained to evaluate positions. They are trained to
predict final territory outcomes from expert games. The evaluation is
generated by scoring the resulting predictions.
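
As a rough illustration of "scoring the resulting predictions": assuming
the model outputs, for each vertex, a probability that the vertex ends up
as Black's territory, one plausible evaluation is the expected area margin.
The function name and the grid-of-probabilities representation are
assumptions made for this sketch, not details taken from the paper.

```python
def evaluate(black_prob):
    """Expected final area margin for Black, ignoring komi.

    black_prob is a grid (list of rows) of hypothetical model outputs:
    P(vertex ends the game as Black's territory).
    Each vertex contributes +1 if Black's and -1 if White's, so its
    expected contribution is p * (+1) + (1 - p) * (-1) = 2p - 1.
    """
    return sum(2.0 * p - 1.0 for row in black_prob for p in row)


if __name__ == "__main__":
    # Three vertices certain for Black, one undecided.
    print(evaluate([[1.0, 1.0], [1.0, 0.5]]))
```

A positive value favours Black and a negative one favours White; a vertex
with probability 0.5 contributes nothing.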
These initial models are very simple and it seems unlikely that they have
any 'understanding' of shape etc.
However, a simple way in which the models could fall down by being
trained on expert games is as follows: let's assume that experts tend to
place stones in positions where they make territory. This means that
throughout an expert game there will be fewer large swings of fortune
than we might observe in a computer game. When a stone is placed down
it is quite likely it represents some territory (in the Chinese sense).
This means that as the models are trained from expert positions they
learn that there is a large bias in favour of the board vertex under a
stone being part of the territory of the owner of this stone. This is
fine if we are only concerned with evaluating expert positions.
However, when we turn to move selection we perform a 1-ply search over
possible moves. Most of these generated positions will be very
un-expert. Therefore, the 'expert' models become overconfident about
strategic moves into the opponent's territory, which results in far too
aggressive play (exactly what we observed).
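
The effect can be sketched with a toy 1-ply search. Everything here is
hypothetical: stone_biased_predict is a caricature of a model that has
learned only the bias described above (a vertex under a stone almost always
belongs to that stone's owner), and the search ignores captures, ko and
suicide for brevity. It is not the paper's model or search code.

```python
def stone_biased_predict(board):
    """Toy stand-in for a trained model: P(vertex ends as Black's).

    Encodes only the bias discussed above: 0.9 under a Black stone,
    0.1 under a White stone, 0.5 for an empty vertex.
    """
    table = {"B": 0.9, "W": 0.1, ".": 0.5}
    return [[table[cell] for cell in row] for row in board]


def one_ply_select(board, player, predict):
    """Try every empty vertex, score the resulting position, keep the best.

    board: list of rows of 'B', 'W' or '.'; player: 'B' or 'W'.
    Captures, ko and suicide are ignored in this sketch.
    Returns ((row, col), value), with value from the player's point of view.
    """
    sign = 1.0 if player == "B" else -1.0
    best_move, best_value = None, float("-inf")
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell != ".":
                continue
            child = [list(line) for line in board]
            child[r][c] = player
            probs = predict(child)
            value = sign * sum(2.0 * p - 1.0 for line in probs for p in line)
            if value > best_value:
                best_move, best_value = (r, c), value
    return best_move, best_value


if __name__ == "__main__":
    # The single empty vertex sits deep inside White's area.
    board = [[".", "W"],
             ["W", "W"]]
    print(one_ply_select(board, "B", stone_biased_predict))
```

Before the move the toy model scores this position at about -2.4 for
Black; after the hopeless invasion it scores about -1.6, because the new
stone is credited to Black. The model treats the move as if an expert had
played it.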
To summarise: in some sense the models are trained to assume that all
moves are expert moves and therefore *must* be good...
This is something to think about in future work. It might be possible
to train the models to evaluate moves from the expert's move selection
(rather than the territory predictions) which would solve this problem
(to do the training, other possible moves from each position would
necessarily have to be considered as well as the expert's selected
move).
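
One possible reading of this idea, purely as a sketch: score every
candidate move in a position, apply a softmax over the candidates, and
train so that the expert's move gets high probability relative to the
alternatives. The linear scorer, the feature vectors and the names below
are all assumptions made for illustration; nothing like this is described
in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, candidates, expert_index, lr=0.1):
    """One gradient step on -log P(expert move | position).

    candidates: one hypothetical feature vector per legal move, so the
    moves the expert did not play act as negative examples.
    expert_index: index of the move the expert actually chose.
    Returns (updated weights, loss before the update).
    """
    scores = [sum(w * x for w, x in zip(weights, f)) for f in candidates]
    probs = softmax(scores)
    # Gradient of the loss: sum_i p_i * f_i - f_expert
    grad = [0.0] * len(weights)
    for p, f in zip(probs, candidates):
        for j, x in enumerate(f):
            grad[j] += p * x
    for j, x in enumerate(candidates[expert_index]):
        grad[j] -= x
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, -math.log(probs[expert_index])


if __name__ == "__main__":
    w = [0.0, 0.0]
    moves = [[1.0, 0.0], [0.0, 1.0]]  # two candidate moves, toy features
    for _ in range(3):
        w, loss = train_step(w, moves, expert_index=0)
        print(loss)
```

Because the non-expert candidates enter the loss directly, the model is no
longer free to assume that every move it is shown must be a good one.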
David
_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/