
Re: [computer-go] Re: computer-go Digest, Vol 10, Issue 4



On Jun 24, 2004, at 07:11, Frank de Groot wrote:

>> The file seems to begin with 3 illegal bytes, which does not look good
>> for a text file.
> No, those 3 bytes are perfectly legal and they are called the BOM, byte
> order mark, used in Unicode files to tell the application on a machine in
> which byte order the code points are, as not all hardware has the same
> endianness of machine words. When those bytes are present, MS Notepad
> automatically switches to a Unicode font, and applications should test for
> these bytes to see what the encoding is. Every flavor of Unicode has its own
> BOM.

http://lists.w3.org/Archives/Public/public-i18n-geo/2003Nov/0014.html
http://www.unicode.org/unicode/faq/utf_bom.html says:

    3.  Some byte oriented protocols expect ASCII characters at
        the beginning of a file. If UTF-8 is used with these protocols,
        use of the BOM as encoding form signature should be avoided.
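A three-byte BOM can only be the UTF-8 one (EF BB BF); in UTF-8 it carries no byte-order information and acts purely as an encoding signature. As a minimal Python sketch of the test Frank describes, assuming a reader only needs to recognise the common Unicode BOMs (the helper name `detect_bom` is hypothetical):

```python
import codecs

# Known BOMs, checked longest-first so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE) it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding name, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Example: a UTF-8 file with a BOM in front of the SGF data.
raw = b"\xef\xbb\xbf(;FF[4]CA[UTF-8])"
encoding, skip = detect_bom(raw)
print(encoding, skip)               # utf-8 3
print(raw[skip:].decode(encoding))  # (;FF[4]CA[UTF-8])
```

A reader that skips the detected BOM before parsing would accept the file; one that expects ASCII at offset 0 sees three non-ASCII bytes instead.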

>> Line:2 Col:3 - Error 23: property <AP> expects compose type value
>> (value deleted): [ZenHacker Go Software Suite v. 0.0.0.3]
> Someone has commented on that already, indeed, a bug! It should not, and
> does not, crash any SGF readers however.
Of course not, but before accusing everybody else of writing crappy parsers and crappy SGF, some humility could help.

>> FYI, SGFC is the reference implementation for the SGF FF[4] standard.
>> It can be found at http://www.red-bean.com/sgf/sgfc/index.html . If
>> your file does not pass the SGFC test it can be considered crappy SGF.
> That is nonsense.
> The only reference standard for a file format is the actual specification,
> not some other program.
> Especially with extreme files like these, which are 100% legal SGF except
> for a tiny glitch in the AP property, it is highly likely that *any*
> application chokes or hiccups on it, including SGFC. As long as you see no
> errors in the formatting of the RT tags, simply stating that "when SGFC
> doesn't parse it, it must be crap" is ridiculous.
SGFC is available from the official SGF site and says (http://www.red-bean.com/sgf/sgfc/index.html):

SGFC is the reference implementation for the SGF FF[4] standard.
It's a command line tool for checking SGF files for correctness
and correcting any errors. It also converts FF[1]-FF[3] files to FF[4].

In my book a reference implementation means that when in doubt, the implementation is right. That's because specs sometimes lack precision. It can happen that the reference needs to be corrected, but until it's done, it is still the reference.

From http://www.red-bean.com/sgf/ :

This is the official specification of the SGF FF[4] standard. [...]
It's a text only, tree based format.

From http://www.red-bean.com/sgf/sgf4.html :

SGF uses the US ASCII char-set for all its property identifiers and
property values, except SimpleText & Text

Also, a valid file is supposed to start with "(;", even though most parsers know how to find the start of the SGF information.

I think all this and the quote about BOM usage above introduce some ambiguity with respect to the inclusion of a BOM at the beginning of the file, even if it is not a big deal. A BOM may make your file look like a binary file to some applications.
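To make the two requirements concrete (ASCII-only start, file begins with "(;"), here is a small Python sketch of a pre-parse check. The function name and warning texts are hypothetical, and tolerating leading whitespace before "(;" is an assumption of this sketch, not a quote from the specification.

```python
import codecs


def check_sgf_start(data: bytes) -> list:
    """Return warnings about how an SGF byte stream begins (empty list = OK)."""
    warnings = []
    if data.startswith(codecs.BOM_UTF8):
        # EF BB BF is outside US-ASCII, so tools that expect an ASCII
        # start may treat the file as binary.
        warnings.append("starts with a UTF-8 BOM")
        data = data[len(codecs.BOM_UTF8):]
    # Assumption: leading ASCII whitespace before "(;" is tolerated here.
    if not data.lstrip().startswith(b"(;"):
        warnings.append("does not start with '(;'")
    return warnings


print(check_sgf_start(b"(;FF[4]GM[1])"))             # []
print(check_sgf_start(b"\xef\xbb\xbf(;FF[4]GM[1])"))  # ['starts with a UTF-8 BOM']
```

Reporting the BOM as a warning rather than rejecting the file reflects the point above: it is ambiguous rather than clearly illegal.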

>> Why CH, since CA seems to do the same thing and is defined in your file?
> It's a leftover from a file I read in to produce this file.
> As a property that is not defined in the SGF standard, it should simply be
> ignored.
According to SGFC it is an error, not a warning like RT, because CH was defined in FF3, removed from FF4, and your file advertises itself as FF4. One could argue, perhaps as you do, that it should be allowed in FF4 since there is a mechanism to handle unknown properties, but it looks like the authors decided against it on purpose.
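The distinction being drawn (unknown property: warning and ignore; property removed from FF[4] in a file declaring FF[4]: error) can be sketched in Python as below. This only mirrors the behaviour described in this message, not SGFC's actual code, and both property sets are illustrative and deliberately partial; the helper `classify_property` is hypothetical.

```python
# Illustrative, deliberately partial sets: not the full FF[4] property list.
KNOWN_FF4 = {"FF", "GM", "AP", "CA", "SZ", "B", "W", "C"}
REMOVED_IN_FF4 = {"CH"}  # defined in FF[3], dropped from FF[4]


def classify_property(ident: str, ff: int) -> str:
    """Return 'ok', 'warning' or 'error' for a property identifier.

    Mirrors the behaviour described in this thread, not SGFC's source:
    a legacy property in a file claiming FF[4] is an error, while an
    unknown (private) property only earns a warning and is then ignored.
    """
    if ident in KNOWN_FF4:
        return "ok"
    if ident in REMOVED_IN_FF4:
        return "error" if ff >= 4 else "ok"
    return "warning"


for ident in ("CA", "CH", "RT"):
    print(ident, classify_property(ident, ff=4))
# CA ok
# CH error
# RT warning
```

The key input is the declared FF value: the same CH property is acceptable in an FF[3] file.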

removes much more than what you would expect from your file, it results
in just 5 moves and one comment.
There are 5 moves and 3 comments, when the RT tags are removed. glGo
displays one of those comments, so it behaves like SGFC in that respect.

> To sum it up, the authors of the applications I mentioned should really have
> a look at their parsers; as it is now, they are not robust enough to handle
> very long strings inside properties they should simply ignore, even when all
> necessary chars are properly escaped.
Agreed, if that's the problem.
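As an illustration of what "robust" could mean here, a Python sketch of reading one property value that grows its result dynamically instead of using a fixed-size buffer. It assumes the FF[4] Text escaping rules in simplified form: a backslash makes the next character literal, and backslash followed by a line break is a soft line break that is removed. Whitespace normalisation and the "\n\r" line-break variant are left out, and `read_value` is a hypothetical helper.

```python
def read_value(text: str, i: int):
    """Read one SGF property value starting just after its '['.

    Returns (value, index just past the closing ']'). Characters are
    collected in a list, so arbitrarily long values are handled.
    """
    out = []
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 1
            if i >= n:
                break  # dangling backslash: value never terminated
            nxt = text[i]
            if nxt in "\r\n":
                # Soft line break: backslash + line break is dropped.
                if nxt == "\r" and text[i + 1:i + 2] == "\n":
                    i += 1
            else:
                out.append(nxt)  # escaped char such as ']' or '\' kept literally
            i += 1
        elif ch == "]":
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated property value")


value, end = read_value("a\\]b\\\\c]rest", 0)
print(value, end)  # a]b\c 8
print(len(read_value("x" * 100000 + "]", 0)[0]))  # 100000
```

A parser that ignores an unknown property still has to scan its value this way to find the real closing bracket; skipping to the first ']' would break on escaped brackets.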

> That it is possible to not choke on my file is shown by Strempel's parser
> and SGFC.
Good. So you may be happy to know that on Mac OS X, Goban is also able to open your file. And in a way, you agree that SGFC can be used to validate your files.

marco

Marco Scheurer
Sen:te, Lausanne, Switzerland http://www.sente.ch

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/