
[computer-go] Re: computer-go Digest, Vol 10, Issue 4



From: "Marco Scheurer" <marco@xxxxxxxxxxxxxxxxx>
To: "Frank de Groot" <frank@xxxxxxxxxxxxxxxxx>; "computer-go"
<computer-go@xxxxxxxxxxxxxxxxx>
Sent: Wednesday, June 23, 2004 12:38 PM
Subject: Re: [computer-go] Need for more robust SGF Readers


> The file seems to begin with 3 illegal bytes, which does not look good
> for a text file.

No, those 3 bytes are perfectly legal. They are the BOM (byte order mark),
which Unicode files use to tell an application in which byte order the code
units are stored, since not all hardware has the same endianness. A 3-byte
BOM is the UTF-8 one (EF BB BF); UTF-8 has no byte-order ambiguity, so there
it simply marks the file as UTF-8. When those bytes are present, MS Notepad
automatically switches to a Unicode font, and applications should test for
these bytes to see what the encoding is. Each Unicode encoding (UTF-8,
UTF-16, UTF-32) has its own BOM.

http://lists.w3.org/Archives/Public/public-i18n-geo/2003Nov/0014.html
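
A reader can test for those bytes before parsing. What follows is only a
minimal sketch, not code from any existing SGF reader: the function name
strip_bom and the Latin-1 fallback are my own assumptions (FF[4] names
ISO-8859-1 as the default when no CA property is given, as far as I know).

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM,
# so checking UTF-16 first would misdetect UTF-32 files.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def strip_bom(data: bytes, default: str = "latin-1"):
    """Return (encoding, payload) with any leading BOM removed.

    `default` is a hypothetical fallback used when no BOM is present.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data


if __name__ == "__main__":
    encoding, payload = strip_bom(b"\xef\xbb\xbf(;GM[1])")
    print(encoding, payload.decode(encoding))
```

The payload can then be decoded with the detected encoding and handed to
the parser, which never sees the 3 bytes at all.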


> Line:2 Col:3 - Error 23: property <AP> expects compose type value
> (value deleted): [ZenHacker Go Software Suite v. 0.0.0.3]


Someone has commented on that already; it is indeed a bug! It should not,
and does not, crash any SGF reader, however.


>Why CH since CA seems to do the same thing and is defined in your file?

It's a leftover from a file I read in to produce this file.
As a property that is not defined in the SGF standard, it should simply be
ignored.


> FYI, SGFC is the reference implementation for the SGF FF[4] standard.
> It can be found at http://www.red-bean.com/sgf/sgfc/index.html . If
> your file does not pass the SGFC test it can be considered crappy SGF.


That is nonsense.
The only reference standard for a file format is the actual specification,
not some other program.
Especially with extreme files like these, which are 100% legal SGF except
for a tiny glitch in the AP property, it is highly likely that *any*
application chokes or hiccups on them, including SGFC. As long as you see no
errors in the formatting of the RT tags, simply stating that "when SGFC
doesn't parse it, it must be crap" is ridiculous.


> removes much more than what you would expect from your file, it results
> in just 5 moves and one comment.

There are 5 moves and 3 comments, when the RT tags are removed. glGo
displays one of those comments, so it behaves like SGFC in that respect.

To sum it up, the authors of the applications I mentioned should really have
a look at their parsers. As it is now, they are not robust enough to handle
very long strings inside properties they should simply ignore, even when all
necessary chars are properly escaped.

That it is possible to not choke on my file is shown by Strempel's parser
and SGFC.
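
To show what I mean by robust, here is a rough sketch of reading the
properties of a single node. It is not taken from any of the programs
mentioned; the names read_value and read_properties are made up for
illustration, and it assumes the node body has already been cut out of the
file. The scan is a plain loop, so the length of a value does not matter,
and a backslash always protects the next char, so an escaped "]" never ends
a value early.

```python
def read_value(text: str, pos: int):
    """Read one '[...]' value starting at text[pos].

    Returns (raw_value, next_pos). Escapes are left in the raw value.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at offset {pos}")
    i = pos + 1
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the char it escapes
        elif ch == "]":
            return text[pos + 1:i], i + 1
        else:
            i += 1
    raise ValueError("unterminated property value")


def read_properties(node: str, known: set):
    """Parse 'ID[val][val]...' pairs from one node body.

    Properties whose ID is not in `known` are scanned and then dropped.
    """
    props = {}
    i, n = 0, len(node)
    while i < n:
        if node[i].isspace():
            i += 1
            continue
        start = i
        while i < n and "A" <= node[i] <= "Z":
            i += 1
        ident = node[start:i]
        if not ident:
            raise ValueError(f"expected property ID at offset {start}")
        values = []
        while True:
            while i < n and node[i].isspace():
                i += 1
            if i < n and node[i] == "[":
                value, i = read_value(node, i)
                values.append(value)
            else:
                break
        if not values:
            raise ValueError(f"property {ident} has no value")
        if ident in known:
            props[ident] = values  # unknown IDs are silently ignored
    return props


if __name__ == "__main__":
    long_value = "x\\]y" * 100000  # long string full of escaped ']'
    node = "B[dd]RT[" + long_value + "]C[hi]"
    print(read_properties(node, {"B", "C"}))
```

With B and C as the only known IDs, the long RT value is stepped over and
the comment after it is still found.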




_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/