"The SLSI 1134 is incorrect & incomplete and it should be corrected immediately.", claims Mr Donald Gaminitillake, who is trying to ignite a campaign against Sinhala Unicode standard through www.akuru.org (history of the site), and frequent newspaper articles.
We of the Sinhala GNU/Linux project think otherwise. And we are not alone. Language Technology Research Center of the University of Colombo School of Computing, research groups from the University of Moratuwa and Arthur C Clarke Center for Modern Technology, Microsoft, Microimage, Science Land also think that the standard is correct. A full list would be quite long.
GNU/Linux was the first platform to implement Sinhala Unicode rendering. We didn't find any issues with encoding or displaying the characters Mr Donald claims are impossible: yansaya, rakaaransaya, repaya, joint letters and all that. Then Microsoft also released a "Sinhala Enabling Kit for Windows". Most vendors today support Sinhala Unicode. None of those who actually got their hands dirty by writing code to implement the standard see any missing "letters" in it.
Implementation is proof for most people. But for some not-so-obvious "reason" Mr Donald continues to say that certain characters are missing!
Our first encounter with Mr Donald happened when I wrote an open letter to him, which became a lengthy debate (more, more and a separate archive) on our project mailing list. Harshula, our standards expert, tried to explain to Mr Donald how the basic Unicode code page and Cartesian products of various "sets" create the complete Sinhala character set. However, Mr Donald never tried to cooperate with us in "understanding" it, and the discussion led nowhere.
However, Mr Donald selectively quoted some parts of the discussion on his site... ;-)
Recently, Niranjan Meegammana, creator of kaputa.com, started a Google Group to communicate in Sinhala - using Unicode. This group has now grown to a very interesting community of a unique, intellectual and polite culture. Although the group members use diverse technologies to write and read Sinhala Unicode, we find the standard quite functional and interoperable. And we use yansaya, rakaaransaya and other "special characters" every day.
Most of us on this group have a great passion for language and literature, and therefore the discussions are very interesting and intellectually rich.
This Google group was meant to communicate in Sinhala Unicode to popularize it, and to act as a test bed for implementations. Mr Donald recently joined the group, not to communicate in Sinhala Unicode, but to start another debate. He continues to repeat the same old story and conveniently ignores some of our questions.
Here are a couple of Mr Donald's claims and what I think of them.
Donald G: SLS 1134 doesn't contain all the Sinhala characters
Wrong. Here is why:
Most Western languages have simple alphabets. Even with the upper and lower case variants, and some "odd" characters with bubbles and hats, the number of characters doesn't exceed 50-100.
However, Asian languages are different. Most characters have different "forms", depending either on the sound they combine with (e.g.: Sinhala, Tamil and Hindi), or on their position in the word (e.g.: Arabic). Therefore, it's impractical to allocate a code point for each variant.
Think of atoms and molecules. There is a limited number of atoms, and molecule names can be formed by putting together the names of atoms. I have never heard of a "Chemists' Revolution" demanding a symbol for each molecule....;-)
Unicode is very similar to chemistry in that sense. Each language is assigned a "code page", typically containing 128 "code points". They form the basis to build more complex character variants, i.e., actual characters seen by the eye, sometimes referred to as "glyphs".
In Western languages, "characters", "glyphs" and "code points" are practically the same thing, because they don't need variants. For example, the English character, or glyph, "A" maps to code point 65, one to one.
For complex languages, only basic characters are represented by code points. Variants are produced by sequences of code points. For example, the character "da" (as in "dambana") is directly mapped to code point 0DAF, whereas "du" (as in "dumriya"), which is a variant of "da", is produced by the sequence of two code points 0DAF ("da") and 0DD4 ("paapilla"). More complex characters (glyphs) are formed by longer sequences.
Most modern operating systems have rendering engines that can display the proper glyphs from these sequences of code points (e.g.: Pango, Qt, ICU on GNU/Linux, Uniscribe on Windows). Therefore, it is not a problem that each glyph doesn't have an individual code point.
In Unicode, some characters are directly mapped to code points, while others are produced by sequences of two or more code points.
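To make this concrete, here is a small Python sketch (my own illustration, not part of any standard document) that looks at "A" and at "du" through Python's standard unicodedata module. The helper name describe is made up for this example.

```python
import unicodedata

# "A" is one character and one code point (decimal 65).
assert ord("A") == 65

# "du" is stored as two code points: the consonant "da" (0DAF)
# followed by the vowel sign "paapilla" (0DD4).
du = "\u0DAF\u0DD4"


def describe(text):
    """Return (hex code point, Unicode name) pairs for each code point in text."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(du):
    print(code_point, name)

# One glyph on the screen, two code points in memory.
print(len(du))
```

A rendering engine such as Pango or Uniscribe is what turns this two code point sequence into the single glyph you see; the string itself only stores the sequence.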
Deciding which characters should be basic code points, and which characters should be produced by combining code points, is a different question. It obviously depends on the language, and is likely to be subjective. Input from several Sinhala scholars and experts has been taken into account to decide that repaya, rakaaransaya and yansaya should not be basic code points, but should be produced by using sequences of code points, as they are linguistically alternative forms. In other words, they are there as sequences of code points, not as single code points. Nevertheless, they are there, so the claim is wrong.
If Mr Donald's claim is "yansaya, rakaaransaya and repaya should be individual code points", that would be more valid. However, somebody has to eventually decide what's basic and what's not, and it has already been done. Technically, this is not an issue at all.
Donald G: Unicode can't produce a matrix of 1600+ characters needed by OCR etc
Wrong. Here is why:
I am not an expert on OCR, but if Mr Donald claims that OCR requires a matrix of 1600+ characters, that's exactly what Sinhala Unicode is. Only it doesn't list all the 1600+ characters, but defines the basic code points (not characters) and the way to generate all the other characters by using sequences of them.
Even a primary school kid can understand something like this: "ka and paapilla produce ku, and this rule applies to all the consonants." It would be ridiculous if the document describing the standard included a 1600+ row table listing each variant (ka + paapilla = ku, kha + paapilla = khu ... la + paapilla = lu and so on)... ;-)
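The "rule instead of table" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: I take the consonants to be the assigned code points from 0D9A to 0DC6, and the dependent signs (vowel signs plus al-lakuna) to be the combining marks from 0DCA to 0DF3. It does not try to reproduce the exact 1600+ figure, which would also need yansaya, rakaaransaya and repaya forms; it just shows that a program can generate the table from the basic code points.

```python
import unicodedata
from itertools import product


def assigned(start, end):
    """Characters in [start, end] that have a Unicode name, i.e. are assigned."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.name(chr(cp), None) is not None
    ]


# Assumption: consonants are the assigned code points 0D9A..0DC6.
consonants = assigned(0x0D9A, 0x0DC6)

# Assumption: dependent signs are the combining marks (category M*)
# in 0DCA..0DF3, which includes al-lakuna and the vowel signs.
signs = [
    ch
    for ch in assigned(0x0DCA, 0x0DF3)
    if unicodedata.category(ch).startswith("M")
]

# The Cartesian product: every consonant followed by every sign.
table = [consonant + sign for consonant, sign in product(consonants, signs)]

print(len(consonants), "consonants x", len(signs), "signs =", len(table), "combinations")
print(table[:5])
```

Each entry in table is an ordinary string of two code points, so an OCR program (or anything else) that wants the full matrix can build it this way rather than expecting the standard to list it.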
Showing the basic code points and claiming "not all the characters are here" for the first time is fine. Second time is still fine, IMHO. But 100+th time is definitely a joke... ;-)
Donald G: SLS 1134 doesn't consider Tamil
There is no need.
Character representation in SLS 1134 is almost identical (if not identical) to the Sinhala subset of Unicode. As Sri Lanka is the only country that has a major Sinhala-speaking population, it's SLSI's responsibility to contribute to Sinhala in Unicode, and SLSI does this through SLS 1134. Developers eventually use Unicode. To my knowledge, none of the FOSS packages found in a typical GNU/Linux system refer to SLS 1134. In other words, SLS 1134 is more of an intermediate standard.
India has a much bigger Tamil-speaking population, and the Unicode code page for Tamil has already been worked out. Therefore, there is absolutely no need to create a separate standard for Sri Lanka. Sinhala Unicode is not a Sri Lankan standard either.
Donald G: Sinhala Unicode doesn't have yansaya on the keyboard
Wrong. Here is why:
Unicode is about representing characters. How they are typed using the keyboard is completely up to the keyboard driver. There are different keyboard drivers: some are classic Wijesekara, some are modified Wijesekara, and some are transliterated (somewhat "singlish"). Some driver authors include yansaya etc. on the keyboard itself, whereas others provide ZWJ (zero width joiner) as an alternative way to type them.
Whatever the keyboard is, yansaya, rakaaransaya and repaya can be typed, and eventually represented and displayed in the same code point sequences.
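As a rough sketch of what "the same code point sequences" means, the Python below builds the three forms from al-lakuna (0DCA), ZWJ (200D), "ya" (0DBA) and "ra" (0DBB). The sequences follow my understanding of how the Unicode Standard describes Sinhala (yansaya as al-lakuna + ZWJ + ya after the consonant, rakaaransaya as al-lakuna + ZWJ + ra after the consonant, repaya as ra + al-lakuna + ZWJ before the consonant); the function names are made up for this example.

```python
import unicodedata

AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"        # ZERO WIDTH JOINER
YA = "\u0DBA"         # SINHALA LETTER YAYANNA
RA = "\u0DBB"         # SINHALA LETTER RAYANNA


def with_yansaya(consonant):
    """Consonant followed by the yansaya sequence."""
    return consonant + AL_LAKUNA + ZWJ + YA


def with_rakaaransaya(consonant):
    """Consonant followed by the rakaaransaya sequence."""
    return consonant + AL_LAKUNA + ZWJ + RA


def with_repaya(consonant):
    """Repaya sequence placed before the consonant it sits on."""
    return RA + AL_LAKUNA + ZWJ + consonant


def code_points(text):
    """Hex code points of a string, for inspection."""
    return [f"{ord(ch):04X}" for ch in text]


KA = "\u0D9A"  # SINHALA LETTER ALPAPRAANA KAYANNA
for label, text in [
    ("yansaya", with_yansaya(KA)),
    ("rakaaransaya", with_rakaaransaya(KA)),
    ("repaya", with_repaya(KA)),
]:
    print(label, code_points(text), text)
```

Whether a keyboard driver offers a dedicated yansaya key or makes you type ZWJ by hand, what ends up stored is a sequence like the ones above, which is why text typed on one layout displays correctly for someone using another.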
There are so many other claims on akuru.org. For example, Mr Donald from time to time challenges that certain words can't be "written" in Sinhala Unicode (latest being the name of the President). When we send him screenshots to show that it's possible (with and without joint characters), he claims that they are fake!!!
There is a saying that it's easy to wake up a sleeping person, but it's very difficult to wake up someone who pretends to sleep.
Mr Donald has applied for a patent for his "system". Although he doesn't seem to have implemented it, he has promised to deliver results if given an "opportunity" (as far as I know, nobody is holding him back). And as Sinhala Unicode is becoming mainstream, his "pending" patent is going to be worthless, unless... oh, well!
Update: 2006-03-21 08:00
There are "valid" articles on akuru.org. Some are about the history of characters, and some are good articles by other authors. For example, articles written by Mr Aelien Silva, one of my favourite writers and linguists who has created so many good Sinhala technical words (e.g.: "manu", "thekala"), bring out very good points about technology localization. In fact, I have often quoted Mr Aelien Silva on the Sinhala Unicode list and elsewhere (you need to enable Sinhala Unicode to read it; instructions are here for GNU/Linux and here for Windows, not sure how to do it on Mac... :-( ). However, I believe that hosting such articles is just an attempt to make akuru.org, which would otherwise be totally useless, look more authentic.