
Re: [computer-go] SGF parsers



On Dec 14, 2004, at 4:00 AM, Paul Pogonyshev wrote:

>>> Another thing you are missing is speed. My SGF parser (although I
>>> admit it is _extremely_ complicated) can crunch a 100 MB file in some
>>> 5--6 seconds. I seriously doubt any XML parser out there can come
>>> anywhere close.
>>
>> Probably not; all XML parsers I've seen are slow. I wonder if my OS can
>> *read* 100 MB in 5-6 seconds.
>
> It can, or at least fetch it from the disk cache. Actually, when I saw
> your reply, I got somewhat sceptical myself, since I had never actually
> fed my parser 100 MB (only 50 ;) So I generated a random file with a
> branching factor of 1--4 containing only moves and comments.
>
> real 0m5.656s
> user 0m4.885s
> sys 0m0.520s
> [paul@localhost sgf]$
>
> So the file is 104.8 MB. `sgf-test' basically reads the file and
> discards all data (i.e. it does nothing but read), my box has an Athlon
> XP 2600+ and 512 MB RAM, and the program is compiled without
> optimizations (easier to debug). Standard `-O2' cuts a little over 25%
> off the runtime. If the file were not in UTF-8, quite a lot of time
> would have been spent in iconv() converting characters.
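A rough sketch of the kind of benchmark described above, assuming a generator that emits only B/W moves and comments with 1 to 4 variations at each branch point, and a "read and discard" pass that merely walks the text once, skipping property values while honouring backslash escapes. The names `random_sgf`, `write_tree` and `scan` and all parameters are hypothetical; the generated tree has no root properties such as GM or SZ, and pure Python is far slower than the C parsers discussed in this thread, so this says nothing about the timings quoted.

```python
import random
import time

COORDS = "abcdefghijklmnopqrs"  # SGF point letters for a 19x19 board


def write_tree(out, rng, color, depth, max_depth):
    """Append one GameTree: a run of move nodes, then 1-4 child variations."""
    out.append("(")
    for _ in range(rng.randint(1, 10)):
        out.append(f";{color}[{rng.choice(COORDS)}{rng.choice(COORDS)}]")
        if rng.random() < 0.3:
            # A ']' inside a text value must be escaped as '\]'.
            out.append("C[random comment with an escaped \\] bracket]")
        color = "W" if color == "B" else "B"
    if depth < max_depth:
        for _ in range(rng.randint(1, 4)):  # branching factor 1..4
            write_tree(out, rng, color, depth + 1, max_depth)
    out.append(")")


def random_sgf(seed=0, max_depth=6):
    """Hypothetical generator: moves and comments only, no root properties."""
    rng = random.Random(seed)
    out = []
    write_tree(out, rng, "B", 0, max_depth)
    return "".join(out)


def scan(data):
    """Read-and-discard pass: walk the text once, skipping property values
    (honouring backslash escapes), and count nodes and values."""
    nodes = values = 0
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == ";":
            nodes += 1
        elif c == "[":
            values += 1
            i += 1
            while data[i] != "]":
                i += 2 if data[i] == "\\" else 1
        i += 1  # step past ';', ']' or any other character
    return nodes, values


if __name__ == "__main__":
    text = random_sgf(seed=1, max_depth=9)  # raise max_depth for a bigger file
    start = time.perf_counter()
    nodes, values = scan(text)
    elapsed = time.perf_counter() - start
    print(f"{len(text) / 1e6:.2f} MB, {nodes} nodes, {values} values, {elapsed:.3f} s")
```

Each extra level of `max_depth` multiplies the file size by roughly 2.5 on average, which is a cheap way to reach a given size.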
Sen:te Goban's parser is based on an optimized version of sgf.c by Antti Huima. On a 1 GHz PowerPC, it takes 36 seconds to read the 20,300+ files (~40 MB) of a GoGoD distribution, build the trees in memory, and create game record references that include the number of moves and the game signature.
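The work described above (build each game tree in memory, then derive a move count and a signature) could be sketched as follows. This is not sgf.c or Sen:te Goban's code: `Parser`, `main_line_moves` and `game_record` are hypothetical names, the move count follows only the first variation, only the first game of a collection is read, and the signature is assumed to be Dyer-style (the coordinates of moves 20, 40, 60, 31, 51 and 71, a convention commonly used to spot duplicate game records). The post does not say which signature Sen:te Goban actually computes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    props: dict = field(default_factory=dict)     # property id -> list of values
    children: list = field(default_factory=list)  # children[0] is the main line


class Parser:
    """Tiny recursive-descent SGF reader: escapes handled, but no charset
    conversion, no error recovery, and soft line breaks are not stripped."""

    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self):
        while self.i < len(self.s) and self.s[self.i].isspace():
            self.i += 1
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.i}")
        self.i += 1

    def game_tree(self):
        """GameTree = '(' Sequence { GameTree } ')'; returns its first node."""
        self.expect("(")
        first = prev = None
        while self.peek() == ";":
            self.i += 1
            node = self.node()
            if prev is None:
                first = node
            else:
                prev.children.append(node)
            prev = node
        if prev is None:
            raise ValueError("empty sequence")
        while self.peek() == "(":
            prev.children.append(self.game_tree())
        self.expect(")")
        return first

    def node(self):
        node = Node()
        while self.peek().isupper():
            start = self.i
            while self.s[self.i].isupper():
                self.i += 1
            values = node.props.setdefault(self.s[start:self.i], [])
            while self.peek() == "[":
                self.i += 1
                buf = []
                while self.s[self.i] != "]":
                    if self.s[self.i] == "\\":  # '\x' stands for a literal x
                        self.i += 1
                    buf.append(self.s[self.i])
                    self.i += 1
                self.i += 1
                values.append("".join(buf))
        return node


# Dyer-style signature positions (an assumption, see above).
SIGNATURE_MOVES = (20, 40, 60, 31, 51, 71)


def main_line_moves(root):
    """Coordinates of the B/W moves along the first-variation path."""
    moves, node = [], root
    while node is not None:
        for color in ("B", "W"):
            if color in node.props:
                moves.append(node.props[color][0])
        node = node.children[0] if node.children else None
    return moves


def game_record(text):
    """Return (number of main-line moves, signature); '--' pads missing moves."""
    moves = main_line_moves(Parser(text).game_tree())
    sig = "".join(moves[k - 1] if k <= len(moves) else "--" for k in SIGNATURE_MOVES)
    return len(moves), sig


if __name__ == "__main__":
    print(game_record("(;GM[1];B[pd];W[dp])"))  # (2, '------------')
```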

I agree with you that using XML or building an intermediate DOM tree seems like a waste of time and memory. Of course, the parser I used came with its own data structure and tree representation, which was not necessarily what I wanted; still, wrapping my tree structure around it was not a big deal (but then, I wasn't using Java).
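One way to avoid an intermediate DOM altogether, sketched under assumptions: the parser reports structure through callbacks (`begin_tree`, `end_tree`, `begin_node`, `prop`, all hypothetical names), and the application's own handler builds exactly the tree it wants, dropping every other property. This is roughly the SAX-versus-DOM distinction from the XML world applied to SGF; it is not the interface of sgf.c or of any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def parse(text, handler):
    """Stream SGF text to handler callbacks; no intermediate tree is built.
    Simplified: escapes handled, no error recovery, no charset conversion."""
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == "(":
            handler.begin_tree()
            i += 1
        elif c == ")":
            handler.end_tree()
            i += 1
        elif c == ";":
            handler.begin_node()
            i += 1
        elif c.isupper():
            j = i
            while text[j].isupper():
                j += 1
            ident, values = text[i:j], []
            i = j
            while True:  # one or more [value] blocks, whitespace allowed between
                while i < n and text[i].isspace():
                    i += 1
                if i >= n or text[i] != "[":
                    break
                i += 1
                buf = []
                while text[i] != "]":
                    if text[i] == "\\":  # '\x' stands for a literal x
                        i += 1
                    buf.append(text[i])
                    i += 1
                i += 1
                values.append("".join(buf))
            handler.prop(ident, values)
        else:
            i += 1  # whitespace between tokens
    return handler


@dataclass
class MoveNode:
    """Application-specific node: only what this hypothetical app cares about."""
    color: Optional[str] = None
    point: Optional[str] = None
    comment: str = ""
    children: List["MoveNode"] = field(default_factory=list)


class MoveTreeBuilder:
    """Handler that builds MoveNode objects directly while parsing."""

    def __init__(self):
        self.root = MoveNode()  # synthetic root; each game becomes a child
        self.current = self.root
        self.stack = []         # node to return to at each ')'

    def begin_tree(self):
        self.stack.append(self.current)

    def end_tree(self):
        self.current = self.stack.pop()

    def begin_node(self):
        node = MoveNode()
        self.current.children.append(node)
        self.current = node

    def prop(self, ident, values):
        if ident in ("B", "W"):
            self.current.color, self.current.point = ident, values[0]
        elif ident == "C":
            self.current.comment = values[0]
        # every other property is simply dropped
```

A different handler with the same four methods could just as well count moves or collect comments without allocating any nodes.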

Marco Scheurer
Sen:te, Lausanne, Switzerland http://www.sente.ch

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/