Why Native Language is Important for Web

After reading this blog post about blogging in Sinhala, I felt like writing down my thoughts on the topic.

There are very few "yes" or "no" answers in life, so I don't think it is correct to rule that "blogging in Sinhala is a good idea", or the opposite. Most answers can begin with an "it depends", and I think that is true here, too.

In certain circumstances, using English on the Web is a good idea. When addressing a global audience, or selling a product on the Web to the global market, not using English will definitely not serve the purpose.

A key argument for using Sinhala is about addressing certain audiences who are not fluent in other languages.

I think there is a more important reason. Certain things can only be done in Sinhala, and this argument holds for any other language.

A blog post is not always a piece of information to be transmitted to a maximum audience. Sometimes it is a work of art. Works of art are diverse, and this diversity is not only limited to language.

Sinhala is not only a communication medium. It also has a very rich literature: poetry, prose and what not. Being a living language, new Sinhala literature is created every day. And if the Web is the medium for such literature, obviously, Sinhala has to be the language.

Check out this blog post for example. (You may need to enable Unicode support). It is a collection of Sinhala poetry from an online "hitiwana kavi maduwa", where people used poetry to communicate. I am sure there are lots of readers who appreciate such work. I can hardly imagine how such a blog post can be in English.

So I think the answer to most questions of life applies here as well: it depends. ;-)



LaTeX and Sinhala Unicode

When we met at Excel World on the 17th, Bud, Srimal and I started talking about using Sinhala Unicode in TeX / LaTeX.

It didn't occur to me that Chamath, who also created one of the first Sinhala FOSS keyboard drivers, had already created a preprocessor for LaTeX called sintex, which reads Sinhala files in Unicode/UTF-8. In fact, I had not only replied to his announcement, but also sent a patch to Debianize it! Life is too complex, and I am too human to keep track of all this.

But that forgetfulness turned out to be a lucky incident, as our pursuit led to something more useful!

So we started creating a preprocessor for Vasantha Saparamadu's Sinhala TeX package, which uses the Samanala transliteration scheme.

However, Bud pointed out that the generated PDF files would contain ASCII characters instead of Unicode, which is a problem for search engines that index them and convert them into "HTML view" pages.

After some research, we found XeTeX, a Unicode-enabled TeX engine that also runs LaTeX.

XeTeX uses ICU for text layout, and ICU versions after 3.6 support Sinhala out of the box. However, the latest stable version of XeTeX, 0.996, uses a statically linked ICU 3.4. I managed to patch the "tetex-xetex" package that comes with Debian and make it recognize Sinhala. The patches were also submitted to Debian.

XeTeX font changes are always manual, which made the source look ugly. After a bit of research, I found the zhspacing package, which among other things automatically sets fonts for Chinese characters. It is a complicated package, but I managed to get an idea of how it uses the character class feature in the latest XeTeX version, 0.997.

Downloading the latest version of XeTeX from the SVN repository and building it for Debian was not difficult, except that I had to edit the debian/control files to replace the tetex-base and tetex-bin dependencies with their texlive counterparts. I had to first get xdvipdfmx. Here is a rough sketch of the work.

% mkdir xdvipdfmx
% cd xdvipdfmx
% svn co http://scripts.sil.org/svn-view/xdvipdfmx/TRUNK
% cd TRUNK
% chmod +x debian/rules
# dpkg-buildpackage -b
# cd ..
# dpkg --purge dvipdfmx
# dpkg -i xdvipdfmx...deb
% cd ..

% mkdir xetex
% cd xetex
% svn co http://scripts.sil.org/svn-view/xetex/TRUNK
% cd TRUNK
% vi debian/control
% chmod +x debian/rules
# dpkg-buildpackage -b
# cd ..
# dpkg --purge texlive-xetex
# dpkg -i xetex...deb

As the XeTeX web site had warned, the Debian build files provided by vanilla XeTeX were not up to date. After installing I had to create a /etc/texmf/fmt.d/10local.cnf with the following two lines:

xetex   xetex  -             *xetex.ini
xelatex xetex  language.dat  *xelatex.ini

and then run the following commands:

# update-fmtutil
# fmtutil-sys --enablefmt xetex
# fmtutil-sys --enablefmt xelatex

to make the "xelatex" command work properly.

After getting the latest version of XeTeX working, the last remaining step was to create a small style file, which I called "sinhala.sty", to switch fonts automatically for Sinhala.

% sinhala.sty version 20080420
% Typesetting mixed Sinhala documents in XeTeX
% Copyright (C) 2008 by Anuradha Ratnaweera
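\ifx\XeTeXrevision\undefined
  \errmessage{XeTeX is required to use sinhala}
\fi
\ifx\XeTeXinterchartoks\undefined
  \errmessage{XeTeX 0.997 or above required to use sinhala}
\fi
\XeTeXinterchartokenstate = 1
% Put the Sinhala Unicode block (U+0D80 to U+0DFF) in character class 10
\newcount\cnt
\cnt = "0D80
\loop
  \XeTeXcharclass\cnt=10 \ifnum\cnt<"0DFF \advance\cnt1
\repeat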
\XeTeXcharclass "200C = 10
\XeTeXcharclass "200D = 10
\XeTeXinterchartoks 0 10 = {\sifont}
\XeTeXinterchartoks 255 10 = {\sifont}
\XeTeXinterchartoks 10 0 = {\latinfont}
\XeTeXinterchartoks 10 255 = {\latinfont}

So, all you need is XeTeX 0.997 and sinhala.sty to write LaTeX files using Sinhala Unicode.



Goodbye xorg.conf!

After reading this article on xrandr, I wanted to see how total autoconfiguration works on X Windows.

As a start, I tried removing the xorg.conf file completely and restarting X. The sky didn't fall down! In fact, I didn't notice any change. Everything from USB hotplug to OpenGL continued to work as before.

The only tweak needed was to the old font system. Unlike fontconfig, the old X font system seems to depend on the "font path" set in xorg.conf. This was a problem for using my custom SUN22x12 font in xterm. After adding the following lines to ~/.fvwm/preferences/Startup, this problem was gone, too.

AddToFunc InitFunction
+ I Exec exec /usr/bin/xset +fp /usr/local/share/fonts

Yes, I use fvwm-crystal, a "polished" version of FVWM. Old school, so what?



Simplifying Digital Camera Access on GNU/Linux

Digital camera access is simple enough on GNU/Linux, but with a couple of tweaks here and there, it can be made even simpler.

Summary: I keep my photos in directories named "yyyy-mm-dd" by the date taken. When I plug in the camera, photos are automatically downloaded and sent to the correct directories. If you would like to know how I did it, please read on!

Accessing digital cameras has always been simple on GNU/Linux. A large number of digital cameras are supported out of the box. When using the shell, gphoto2 is arguably the most convenient tool. Running gphoto2 with the "-P" option autodetects the camera and downloads all the photos in it.

% gphoto2 -P

I keep all my photos in a "photos" directory with subdirectories in the "yyyy-mm-dd" format indicating the date taken.

First step of simplification is to automatically put each image into the correct location. Digital cameras put a lot of Exif information into each image, so extracting the date taken is quite straightforward. I use a simple tool called exif to do this.

Both gphoto2 and exif are available on Debian.

# apt-get install gphoto2 exif

After some trial and error, I figured that the images taken with my Canon PowerShot S3 IS have a tag 0x132 indicating the date each photo was taken.

% exif -t 0x132 IMG_0416.JPG 
EXIF entry 'Date and Time' (0x132, 'Date and Time')...
Tag: 0x132 ('DateTime')
  Format: 2 ('Ascii')
  Components: 20
  Size: 20
  Value: 2007:10:22 06:05:49

What we want is in the "Value:" line. After filtering that line with grep, and using sed a couple of times, we can get the date in yyyy-mm-dd format.

% exif -t 0x132 IMG_0416.JPG |
    grep 'Value: ' |                           # Filter the line with "Value:"
    sed 's/.*Value: \(....:..:..\) .*/\1/' |   # Get the yyyy:mm:dd part of the value line
    sed 's/:/-/g'                              # Convert ":" to "-"

If you want to understand exactly what each step is doing, try the above pipeline by adding one filter at a time.
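As a rough sketch, the grep and the two sed steps above can also be folded into a single sed call. The function name exif_date is made up for this example, and the sample input line is copied from the exif output shown earlier; this is just one possible way to write it.

```shell
# Read `exif -t 0x132` output on stdin and print the date as yyyy-mm-dd.
# exif_date is a hypothetical helper name, not part of exif or gphoto2.
exif_date() {
    # -n with the p flag prints only the line where the substitution matched
    sed -n 's/.*Value: \(....\):\(..\):\(..\) .*/\1-\2-\3/p'
}

# Try it on the "Value:" line from the example output above
printf '  Value: 2007:10:22 06:05:49\n' | exif_date
```

With a real photo it would be used as "exif -t 0x132 IMG_0416.JPG | exif_date".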

Then I put together a small script to move each image in the current directory to ~/pictures/yyyy-mm-dd/ subdirectories where I want them.

#!/bin/sh

gphoto2 -P

for i in *.JPG
do
    date=$(exif -t 0x132 "$i" | \
        grep 'Value: ' | \
        sed 's/.*Value: \(....:..:..\) .*/\1/' | \
        sed 's/:/-/g')
    dir="$HOME/pictures-test/$date"   # test directory; the name is a placeholder
    mkdir -p "$dir"
    mv -f "$i" "$dir"
done

Notice that I use a test directory. I saved this in ~/bin/, and made it executable.

Now comes the fun part. After connecting the camera to the computer, I used "lsusb" to find out that its vendor and product IDs are 04a9:311a. The following udev rule in /etc/udev/rules.d/010_local.rules invokes the above script whenever this camera is plugged in.

ACTION=="add", BUS=="usb", \
    SYSFS{idVendor}=="04a9", SYSFS{idProduct}=="311a", \

Well, matters are a little more complicated. Udev seems to invoke the script multiple times. So I added two extra "features" to stop that.

  • Add a lock file to prevent multiple simultaneous runs of the script.
  • Write a "timestamp" file at the end of the script, and do not run again "too soon" (60 seconds turned out to be ok).

These made sure that the script is run only once when the camera is plugged in.

I used the number of seconds since the Unix Epoch given by the stat and date commands. If the timestamp file was created less than 60 seconds ago, the script aborts.
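The "too soon" check can be sketched on its own as follows. The function name ran_recently and the temporary file are made up for this illustration, and the stat format option assumes GNU stat, as found on Debian.

```shell
# Succeed (exit 0) if FILE was last changed less than COOLDOWN seconds ago.
# ran_recently is a hypothetical helper name used only in this sketch.
ran_recently() {
    file=$1
    cooldown=$2
    [ -f "$file" ] || return 1        # no timestamp file: we have not run yet
    t1=$(stat -c '%Z' "$file")        # last change, seconds since the Epoch (GNU stat)
    t2=$(date +'%s')                  # now, seconds since the Epoch
    [ $((t2 - t1)) -lt "$cooldown" ]
}

stamp=$(mktemp)                       # stand-in for the real timestamp file
if ran_recently "$stamp" 60
then
    echo "ran less than 60 seconds ago, skipping"
fi
rm -f "$stamp"
```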

So here is the complete script:


#!/bin/sh

set -e

# Settings. The original values are not shown here, so the ones below
# are placeholders; adjust them to your own setup.
user=youruser
group=yourgroup
pictures=/home/$user/pictures
download=/tmp/camera-download
logdir=/home/$user/.camera-log
lock=/tmp/camera.lock
lasttime=/tmp/camera.lasttime
cooldown=60

log=$(date +"$logdir/%Y-%m-%d");

# Avoid multiple simultaneous runs
ln -s $lock $lock || exit 0

# Abort if we had run less than $cooldown seconds ago
if [ -f "$lasttime" ]
then
    t1=$(stat -c '%Z' $lasttime)
    t2=$(date +'%s');
    dt=$((t2 - t1))
    if [ $dt -lt $cooldown ]
    then
        rm -f $lock
        exit 0
    fi
fi

# Take it slowly ;-)
sleep 3

mkdir -p $download
mkdir -p $logdir
rm -f $download/*

# Get the photos, all of them
cd $download
gphoto2 -P

for i in *.JPG
do
    date=$(exif -t 0x132 $i | \
        grep 'Value: ' | \
        sed 's/.*Value: \(....:..:..\) .*/\1/' | \
        sed 's/:/-/g')
    dir=$pictures/$date
    if [ ! -f "$dir/$i" ]
    then
        [ -d $dir ] || mkdir -p $dir
        chown $user:$group $i
        chmod 644 $i
        chown $user:$group $dir
        mv -f $i $dir
        echo "$date/$i" >> $log
    fi
done

rmdir $download

# Add a timestamp
touch $lasttime

rm -f $lock
exit 0



Network Traffic Accounting

After totally automating my Mobitel 3G connection, the next natural step was to set up some kind of traffic accounting system. I wanted to avoid tools that monitor individual packets, because that is an unnecessary overhead. vnStat turned out to be a perfect match.

Here are the steps for setting up vnstat on Debian. The good news is that vnstat in Debian comes with proper crontab entries and network up/down hooks already in place.

  • The first step, obviously, is to install vnstat:
    # apt-get install vnstat
  • Create a new configuration:
    # vnstat --showconfig > /etc/vnstat.conf
  • Edit /etc/vnstat.conf and set the default interface to "ppp0".
  • Create an empty database for ppp0:
    # vnstat -u -i ppp0

Now vnstat starts counting network traffic. The default crontab seems to run "vnstat -u" every 5 minutes.

Then I installed this simple web based frontend called vnStat PHP frontend. Installation is just a matter of unpacking:

# cd /usr/local/src/
# wget http://www.sqweek.com/sqweek/files/vnstat_php_frontend-1.3.tar.gz
# cd /var/www
# tar -xzvf /usr/local/src/vnstat_php_frontend-1.3.tar.gz
# mv vnstat_php_frontend-1.3 vnstat

Then I had to edit /var/www/vnstat/config.php and set the following values.

$iface_list          = array('ppp0');
$iface_title['ppp0'] = 'Mobitel 3G';
$vnstat_bin          = '/usr/bin/vnstat';

Pointing a browser to http://localhost/vnstat/ showed that everything was working fine.

I also have the following .htaccess file in vnstat directory to avoid access from remote hosts:

Order deny,allow
Deny from all
Allow from ::1/128


Mobitel 3G with Huawei E220 on Debian

Finally I decided to shift my mobile Internet connectivity from GPRS to HSDPA by getting a Mobitel 3G broadband connection.

The package includes a Huawei E220 HSDPA USB modem (Wikipedia article), a SIM card and the connection. The connection fee and the first month's bill are waived, and there is a Rs 5k deposit, which is refunded after a year.

The Huawei E220 is known to work out of the box on Linux after 2.6.20. However, some of the most recent kernels seem to have a conflict with the USB mass storage driver: the disk drive in the modem, which holds the Windows drivers, gets detected, but not the modem. I am presently running Linux, which also exhibits this behavior.

Bud suggested a quick workaround: to start the computer while the USB dongle is plugged in. This worked, and the modem was autodetected as /dev/ttyUSB0.

The sales person at the Excel World Mobitel outlet told me that the APN has to be statically set to "mobitel3g" and the number is "*99***1#". It was easy to find the AT commands to do this. I created the following /etc/wvdial.conf file, and running "wvdial" afterwards took me to the Internet. The username and password are there just to stop wvdial complaining.

[Dialer Defaults]
Modem = /dev/ttyUSB0
Baud = 1843200
Modem Type = Analog Modem
Init2 = ATZ
Init3 = ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0
Init4 = AT+CGDCONT=1,"IP","mobitel3g"
Dial Command = ATDT
Phone = *99***1#
Username = foo
Password = bar
Stupid Mode = yes

Going with the "get it working, then make it better" philosophy, I then looked at the conflict with the USB disk. A couple of changes to the kernel usb-storage driver are suggested on the above thread, which unfortunately didn't work for me, but just running this huaweiAktBbo utility did the trick. So I copied the binary to a standard location.

# apt-get install build-essential libusb-dev   # just to be sure
# cc -o huaweiAktBbo huaweiAktBbo.c -lusb
# cp huaweiAktBbo /usr/local/sbin/

Then I created a small script to initiate the connection:


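#!/bin/sh
lock=/tmp/mobitel.lock   # placeholder path; the original value is not shown

if ln -s $lock $lock
then
    [ -c /dev/ttyUSB0 ] || /usr/local/sbin/huaweiAktBbo
    sleep 3
    if [ -c /dev/ttyUSB0 ]
    then
        cp -f /etc/wvdial.conf.mobitel /etc/wvdial.conf
        wvdial
    fi
    rm -f $lock
fi

exit 0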

The lock can also be created in /var/run. Notice that I have not done a "set -e", because if wvdial stops with an error, we still need to remove the lock. The lock is there to avoid multiple invocations of the script.
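The locking idea can be sketched in isolation like this. It relies on "ln -s" failing when the name already exists, and it remembers the exit status of the command so that the lock is removed even when the command fails. The with_lock function and the lock path are made up for this example; they are not part of the script above.

```shell
lock="${TMPDIR:-/tmp}/with-lock-demo.$$.lock"   # hypothetical lock path

# Run a command while holding the lock, and always remove the lock afterwards.
with_lock() {
    # ln -s fails if the name already exists, so only one caller gets the lock
    ln -s "$lock" "$lock" 2>/dev/null || return 1
    status=0
    "$@" || status=$?        # remember a failure instead of aborting
    rm -f "$lock"
    return $status
}

with_lock false || echo "command failed, but the lock was removed"
[ -L "$lock" ] || echo "no stale lock left behind"
```

In the connection script, the command run under the lock would be wvdial.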

Then I created a simple udev rule to automatically connect whenever the E220 is plugged in. You can use "lsusb", among others, to find the vendor ID and product ID. I created a new /etc/udev/rules.d/010_local.rules file with the following line.

ACTION=="add", BUS=="usb", SYSFS{idVendor}=="12d1", \
    SYSFS{idProduct}=="1003", RUN+="/usr/local/sbin/mobitel.sh"
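As a small sketch, the two IDs used in the rule can be pulled out of lsusb output with sed. The function name usb_ids is made up, and the sample line below only imitates the lsusb format: the IDs are the ones from the rule, but the device description text is an assumption.

```shell
# Print "vendor product" for the first lsusb line on stdin matching PATTERN.
# usb_ids is a hypothetical helper name used only in this sketch.
usb_ids() {
    grep -i -- "$1" | head -n 1 |
        sed -n 's/.* ID \([0-9a-fA-F]\{4\}\):\([0-9a-fA-F]\{4\}\).*/\1 \2/p'
}

# Sample line in lsusb format; the description after the IDs is assumed
sample='Bus 001 Device 005: ID 12d1:1003 Huawei Technologies Co., Ltd. E220 HSDPA Modem'
printf '%s\n' "$sample" | usb_ids huawei
```

On a live system the input would come from lsusb itself, as in "lsusb | usb_ids huawei".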

That's it! Now I am on the Internet automatically whenever the device is connected to the computer!

The only remaining "problem" was hal, which was still trying to automount the USB disk. As suggested in the Arch Linux wiki, creating a file /usr/share/hal/fdi/preprobe/20thirdparty/10-huawei-e220.fdi with the following content fixes this:

<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
   <match key="usb.vendor_id" int="0x12d1">
     <match key="usb.product_id" int="0x1003">
       <merge key="info.ignore" type="bool">true</merge>
     </match>
   </match>
</deviceinfo>

The connection is fast and way cheaper than GPRS when it comes to volume. HSDPA costs one rupee per MB, while GPRS costs 20!
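To put the two rates side by side, here is a tiny sketch in shell arithmetic. The monthly usage figure is made up for illustration; only the rates of 1 and 20 rupees per MB come from the text above.

```shell
# Cost in rupees of MB megabytes at RATE rupees per MB (whole numbers only).
cost() {
    echo $(( $1 * $2 ))
}

mb=500   # hypothetical monthly usage in MB
echo "HSDPA: Rs $(cost "$mb" 1)"
echo "GPRS:  Rs $(cost "$mb" 20)"
```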

Update: I noticed that udev runs the script multiple times, so as a quick fix I have added a test for /dev/ttyUSB0 in a couple of places in the script to make matters better.