Viswa Kumar

go-தமிழ்

golang
programming
projects
Substack
Throwbacks
Tamil transliteration tool in golang for fun & learning
Published

April 22, 2023

Throwback Posts ⭐️

Whenever you see a post with this callout, it means I have ported one of my old (but still close to heart) posts from my legacy blog host to here. Enjoy some good old classics ☕️

Having got my feet wet with golang by implementing an lsgo equivalent command in golang, it’s now time to explore some depths.

In this post, I’m gonna talk about my new project, go-தமிழ்: a Tamil transliteration tool in golang for fun & learning.

தமிழ் (Thamizh not Tamil) is my mother tongue. It is absolutely fantastic to see Thamizh letters on the internet. Thanks to recent developments in typographic / Indic technologies, it is now very easy to type & view text in native languages.

At a very basic level, the characters of native languages are represented as Unicode code points, which are then stored using encodings such as UTF-8, UTF-16 or UTF-32. This way, computers can make sense of every char of every possible language as just an integer.

Although handling UTF-8 strings is definitely a pain, golang supports this out of the box. Its unicode/utf8 package in particular is worth a read.
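To make the "just an integer" point concrete, here is a minimal sketch using only the standard library. The helper name describe is mine for illustration and is not part of go-தமிழ்.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports how many bytes and how many runes (code points) s holds.
// The name is a hypothetical helper for this post.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	word := "தமிழ்"
	b, r := describe(word)
	fmt.Printf("%s: %d bytes, %d runes\n", word, b, r)

	// Ranging over a string yields runes, not bytes; i is the byte offset.
	for i, c := range word {
		fmt.Printf("byte offset %2d: %c (U+%04X)\n", i, c, c)
	}
}
```

Each Tamil code point takes 3 bytes in UTF-8, so len() and the rune count differ, which is exactly why byte-wise string handling gets painful.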

Having known that golang can support தமிழ் natively & having learnt the basics of golang, why not develop an English -> தமிழ் transliteration tool?

Basics of go-தமிழ்

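தமிழ் letters can be largely categorized as உயிர் (vowels, which I call Primary), மெய் (consonants, which I call Secondary) & உயிர்மெய் (consonant-vowel compounds, which I loosely call Vowels).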

For example, the தமிழ் letter க is derived from க் (which is மெய்) and அ (which is உயிர்).
i.e. க் + அ = க. Similarly, மி = ம் + இ.

However, in the Unicode world, a vowel attached to a consonant appears as a special combining sign such as ா or ி (and the pulli ் marks a consonant with no vowel). So in the Unicode world, in order to get மி, we should concatenate ம & ி i.e. மி = ம + ி.
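In Go this concatenation is plain string concatenation, since each sign is just another code point following the base letter. A small sketch (compose is a made-up helper name, not something from go-தமிழ்):

```go
package main

import "fmt"

// compose appends a combining sign (vowel sign or pulli) to a base consonant.
// Hypothetical helper for illustration only.
func compose(consonant, sign string) string {
	return consonant + sign
}

func main() {
	mi := compose("ம", "ி") // ம + ி
	k := compose("க", "்")  // க + pulli

	fmt.Println(mi, k)
	// %U prints the underlying code points, showing two runes per letter.
	fmt.Printf("%U\n", []rune(mi))
	fmt.Printf("%U\n", []rune(k))
}
```

What looks like one letter on screen is two runes underneath, so the tool has to decide which sign to append rather than pick a single precomposed character.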

So it turns out that generating Tamil characters is quite challenging & interesting. Upon receiving an English transliteration text, say vanakkam, we first need to find the pattern that differentiates printing a உயிர் from printing a உயிர்மெய்.

For instance, vaa can be interpreted as வஅ or வா. So it is quite clear that we need a mechanism to identify whether the user wants the vowel sound attached to the consonant or the standalone vowel letter. In order to solve this problem, I resorted to having my own encoding scheme for go-தமிழ்.

Architecture of go-தமிழ்

Having decided that I need to come up with my own encoding rules (heck, this is my own new encoding tool for fun!), I then started to lay out a basic grammar for my own Tanglish language.

You can take a look at the grammar for go-தமிழ் in the help page of the webpage that gets served as part of go-தமிழ் daemon mode.

To give you some glimpse…

உயிர் | Primary
தமிழ் English
அ a
ஆ 2a
இ i
ஈ 2i
உ u
ஊ 2u
எ e
ஏ 2e
ஐ 3i
ஒ o
ஓ 2o

For complete details on go-தமிழ் encoding rules, please see this page.

Algo

  • Get the input text and split it on the space delimiter, resulting in a slice of input tokens.
  • Iterate over each token, treating the letters of the token as a slice in turn.
  • Using Golang's slice-of-a-slice technique, walk a start:end window over the token from 0 to len(token).
    • Match the current window against the uyir, mei or vowels patterns.
    • If a match is found, increment both the start & end indices so the window restarts right after the match.
    • If not, increment only end and re-slice the token as start:end.
    • Loop & repeat till the end of the token.
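The steps above can be sketched roughly as follows. This is not the original go-தமிழ் source: it only uses the உயிர் table shown earlier (the மெய் and vowel-sign tables are left out because this post does not list them), the function names are made up, and passing unmatched input through unchanged is my assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// uyir holds the உயிர் encodings from the table in this post.
var uyir = map[string]string{
	"a": "அ", "2a": "ஆ",
	"i": "இ", "2i": "ஈ",
	"u": "உ", "2u": "ஊ",
	"e": "எ", "2e": "ஏ",
	"3i": "ஐ",
	"o": "ஒ", "2o": "ஓ",
}

// transliterateToken grows a token[start:end] window until it matches a
// table entry, emits the match, then restarts the window after it.
func transliterateToken(token string) string {
	var out strings.Builder
	start, end := 0, 1
	for end <= len(token) {
		if ta, ok := uyir[token[start:end]]; ok {
			out.WriteString(ta)
			start = end // found: start jumps past the match
		}
		end++ // found or not, the end of the window advances
	}
	// Assumption: whatever never matched is passed through as-is.
	out.WriteString(token[start:])
	return out.String()
}

// transliterate splits the input on spaces and converts token by token.
func transliterate(text string) string {
	tokens := strings.Split(text, " ")
	for i, tok := range tokens {
		tokens[i] = transliterateToken(tok)
	}
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(transliterate("a2a 3i")) // prints: அஆ ஐ
}
```

Because the window emits on the first (shortest) match, this only works when no key is a prefix of another key. The digit prefixes in the grammar (2a, 3i) keep the உயிர் table that way, since "2" and "3" alone never match.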

Deployment

Once the main logic was working, it was just a matter of how to present & package the tool. Usability is the key aspect here.

Next, in order to spice up the meal, I decided to have 2 modes of operation - Console mode & Daemon mode.

Console mode

Console mode mimics a go-தமிழ் >> shell, which takes in English input and returns தமிழ் text in the terminal (provided the terminal supports UTF-8).

Daemon mode

Daemon mode runs a webserver on port 8080 and serves transliteration as a service.

For this, I shamelessly 🙈 copied the Golang playground CSS and re-used it for my theme. I have to say, it fitted my design perfectly and I’m kinda proud of it :-)

Although this is not a full-fledged webserver, it does the job for this fun exercise. So I’m good with it.

Looking back 🥹

Looking back, I authored the original version back in 2017 and tweaked it a little for this blog. It was a nostalgic moment to look back at how I evolved from a curious Gopher 🐣 to where I am today. Time flies indeed 🦅

Copyright 2024, Viswa Kumar