I think it is not easy to implement this function efficiently in hardware.
It is trivial if one accepts - as worst case - N cycles (where N is the
length of the chain). But if one wants something like log(N) or even 1 cycle
it gets difficult. In my paper designs I have not found a reasonable
implementation for log(N). The problem is that two stones of a chain can
share the same liberty. How does one avoid dubble-counting when summing up
the liberties in parallel? If one counts sequential like in software one can
mark the liberties. This is the N-cycles method. But that would be no big
improvement. If one makes a hardware Go-programm one has to solve this
problem. I think its solvable. One has just to have the right idea.
I know of some papers of finding connected components in picture-processing
in log(N) time. This is the same problem. But the algorithms are very
complicated. I can not imagine how to implement this directly in hardware.
Simply finding the connected components in Go can be done very fast
asymptotically (between O(log n) and O(1)),