For serious academic papers, it is important to compare apples to
apples. You want to understand the behavior of the parallel
algorithm and you don't want to start with completely different pieces
of code. The papers you read were quite honest in acknowledging this
fact and pointing out that a pure serial version might be written a
little bit differently. On the other hand, Vincent claims a 40X
slowdown, and you realize that he is exaggerating. You are being kind
when you say he "exaggerates" in this case because we are talking
about less than a 10% difference, not a 4000% difference.
Without supporting data or a way to reproduce the results, such claims
are useless. The paper I referred to was talking about a 60%
improvement. Now it's only 10%? Which number are we supposed to believe?