Tuesday, September 27, 2005

Standards, which standards?

The computer Go world suffers from a lack of standards, or, to be more exact, enforced standards.

Take players' names, for example. Jan van der Steen has invented a marvellous method of uniquely identifying players: he assigns each of them a number, the so-called PID or "player ID".

I would love to know what his mechanism for assigning them is. I would love to have a list of those PIDs. Why? Because there are so many alternative ways to transliterate Chinese, Japanese and Korean names into English. A hypothetical case: when someone is called "Fu Chien" in Gobase but "Fu Kien" in GoGoD, Moyo Go will only be able to find "Fu Kien" and not "Fu Chien".

In such cases the solution would be to know the PID and to be able to search on it. Unfortunately, Jan van der Steen sees me as a competitor and he declined to share his PID system with me.
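
To make the idea concrete, here is a minimal sketch of how a search could go through a numeric ID instead of a spelling. The alias table, the ID value 1001 and the function names are all made up for illustration; this is not Jan's PID scheme, which I don't know.

```python
# Hypothetical alias table: one numeric ID per player, many spellings.
# The ID 1001 is invented for this example and is not a real PID.
ALIASES = {
    1001: {"Fu Chien", "Fu Kien"},
}


def build_index(aliases):
    """Map every known spelling (case-insensitively) to its player ID."""
    index = {}
    for pid, names in aliases.items():
        for name in names:
            index[name.casefold()] = pid
    return index


def same_player(index, a, b):
    """True only if both spellings are known and share one ID."""
    pid_a = index.get(a.casefold())
    pid_b = index.get(b.casefold())
    return pid_a is not None and pid_a == pid_b


index = build_index(ALIASES)
print(same_player(index, "Fu Chien", "Fu Kien"))
```

With a shared table like this, a database would only have to store the ID with each game, and every program could keep its own preferred spelling.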

There are more of these problems. Just as we are currently unable to uniquely identify players because we somehow don't seem to get along, we are also unable to uniquely identify games. There is the "Dyer Signature" - Moyo Go implements it - but Dyer Signatures are neither universally supported nor suitable for distinguishing between games with minor endgame variations.

Then there is the issue of identifying good and bad moves. Take the marvellous Kogo Joseki Dictionary. Gary Odom has done a terrific job and permitted me to include it with Moyo Go. Yet I found something to bitch about :)

Namely, the way bad moves are indicated leaves something to be desired: they are marked simply by a "bad move" comment instead of the standard SGF property "BM". That's not really Gary's fault; we SGF editor publishers should make it easier for the user to click a "bad move" button! For computer Go scientists, it would be wonderful to have a standardized way of indicating bad moves (especially where comments are omitted), so that automatic learning modules know they are bad.

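As a rough sketch of what such a cleanup could look like: the snippet below puts BM[1] in front of every comment whose text contains "bad move". The function name and the matching rule are my own assumptions, not something the Kogo file or any editor prescribes. It is a regex pass, not a real SGF parser, so it will also tag a comment like "not a bad move" and it does not check whether the node already has a BM property.

```python
import re

# A C[...] property whose identifier is exactly "C" (not e.g. "GC"),
# allowing escaped characters such as "\]" inside the comment text.
COMMENT = re.compile(r"(?<![A-Z])C\[((?:\\.|[^\]\\])*)\]", re.DOTALL)


def tag_bad_moves(sgf):
    """Prefix BM[1] to each comment that mentions 'bad move'."""
    def repl(match):
        if "bad move" in match.group(1).lower():
            return "BM[1]" + match.group(0)
        return match.group(0)
    return COMMENT.sub(repl, sgf)


print(tag_bad_moves("(;GM[1];B[dd]C[Bad move.];W[pp]C[ok])"))
```

The comment itself is kept, so a human reader loses nothing, while a learning module only has to look for BM.
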
Another point is Unicode. Some SGF files contain Unicode without explicitly indicating this. Almost all SGF files that do declare it with the proper SGF property (CA) lack a Byte Order Mark, so ordinary editors still don't know that the file contains Unicode.
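
A reader has to guess, then. Here is a minimal sketch of one possible order of guesses: a UTF-8 Byte Order Mark first, then the CA property, then plain UTF-8, then a fallback. The function name and the choice of Latin-1 as the fallback are my assumptions, and the regex only looks for the first CA[...] in the raw bytes rather than parsing the file.

```python
import codecs
import re

# First CA[...] property in the raw bytes (identifier exactly "CA").
CA = re.compile(rb"(?<![A-Z])CA\[([^\]]*)\]")


def decode_sgf(raw, fallback="latin-1"):
    """Decode SGF bytes to text using BOM, CA property, or a guess."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    match = CA.search(raw)
    if match:
        try:
            return raw.decode(match.group(1).decode("ascii").strip())
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong charset name: keep guessing
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(decode_sgf("(;GM[1]PB[Go Seigen])".encode("utf-8")))
```

None of this would be needed if the files said what they are.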

Et cetera, et cetera. Write an SGF parser, and you'll see what I mean. There is so much malformed, downright illegal SGF out there. Not to mention the abuse of certain SGF properties, like adding a circle marker after each move (KGS) instead of leaving it to the user to decide whether the SGF reader should mark the last move or not.

In the meantime, the SGF standard has not really been brought into the 21st century. Where are the properties that cater to multimedia? How do I encode Rich Text, Sound and movies from my webcam? It's left to the individual programmers to invent new properties and encoding standards, if they want to support stuff like that.

I did that: I made up a new SGF property that supports Rich Text, including images and tables. So far so good. No rocket science. Just take some Rich Text (*.rtf), compress it and then BASE64-encode it, because some SGF reader makers complained that their code can't handle bytes that have their MSB set. Fine. So I followed all the rules for well-formed SGF to the letter and put some nice pictures of Go Seigen in my newly defined SGF property. What happened when I tried to load it into [censored] and [censored]? Not just a crash but, much worse, a complete lockup of the computer! The programmers never anticipated long SGF properties. (And of course I provided a compatible SGF property as well, with the plain-text equivalent.)
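
For the curious, a minimal sketch of the compress-then-BASE64 idea follows. It is not the actual Moyo Go spec: the property name RTFZ is hypothetical, and zlib is just one compression choice assumed for the example.

```python
import base64
import zlib

# Hypothetical private property name, not part of the SGF standard
# and not the name Moyo Go uses.
PROP = "RTFZ"


def encode_rtf_property(rtf):
    """Compress RTF bytes and wrap them as a 7-bit-clean SGF property."""
    payload = base64.b64encode(zlib.compress(rtf)).decode("ascii")
    return f"{PROP}[{payload}]"


def decode_rtf_property(prop):
    """Reverse encode_rtf_property; raises ValueError on a bad wrapper."""
    prefix = PROP + "["
    if not (prop.startswith(prefix) and prop.endswith("]")):
        raise ValueError("not an " + PROP + " property")
    return zlib.decompress(base64.b64decode(prop[len(prefix):-1]))


rtf = rb"{\rtf1\ansi Go Seigen}"
prop = encode_rtf_property(rtf)
print(decode_rtf_property(prop) == rtf)
```

The BASE64 alphabet contains neither "]" nor "\", so the value needs no SGF escaping, and no byte has its MSB set. What it cannot do is keep the property short, which is exactly what those readers choked on.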

I informed the perps and hopefully the newer versions of [censored] and [censored] do support SGF properties of arbitrary length. Since I introduced Rich Text in Moyo Go and sold 100 copies, I have not heard of a single person who has actually used it (the feature is disabled by default, which might have something to do with it). But if any other programmer is interested, the spec of how I encode RTF into BASE64 is open: send me an email and I'll be more than happy to share it with you.