Indian tech skills to the rescue of Wikipedia's diverse tongues
Three out of six tech specialists of the powerful global Wikipedia site -- the sixth-largest website in the world and used by most computer users -- are from India. Two are from Europe and one is from Israel.
India might seem like the Tower of Babel with its many diverse tongues, but the modern-day skills of young techies here are helping solve linguistic problems in computing across the globe.
"We develop tools, ensure compliance with Web standards and deal with user feedback," Runa Bhattacharjee, Manager of Wikipedia's global Language Team, told IANS here on the sidelines of the ongoing Wikiconference India 2016.
According to her, working on scripts written right to left can be the most challenging. "That includes Hebrew, Urdu, Arabic, Persian and some versions of Sindhi," Bhattacharjee said, adding that Mongolian is actually written top to bottom, but that is not how it is used on Wikipedia.
Khmer, the language of Cambodia, proved tough because it has something like 76 characters, and there were also challenges with Dzongkha, used in Bhutan. The team also mentions challenges with languages like Dhivehi (from the Maldives) and Javanese.
Kartik Mistry, a software engineer with the Language Team working out of Mumbai, got involved with Free Software and Open Source technologies almost a decade ago, when he was just out of school.
"Most of the problems we faced 10 years ago in rendering Indian language (and non-English) computing are today mostly solved (due to the progress made by various computing solutions)," Mistry said.
"We have launched web fonts for almost all languages. We also added input methods on the (different language) Wikipedias too," he added.
Their goal is to find ways for diverse scripts and languages to get into cyberspace, whether or not they currently have their own Wikipedias. That includes ones like the Ol Chiki script, also known as the Santhali alphabet, used for an eastern Indian tribal tongue.
The third Indian member of the team is Santhosh Thottingal, Senior Software Engineer (International). Wikipedia also has a number of other Indian names, particularly propping up its tech space: Niharika Kohli (community tech), Prateek Saxena (multimedia), Subbu Sastry and Kunal Mehta (parsing), Nirzar Pangarkar (reading design), Madhumita Viswanathan (analytics) and Yuvaraj Pandian (labs), among others.
Meanwhile, the Language Team introduced its fairly new Content Translation tool at Wikiconference India 2016. It makes it rather easy to copy and translate pages from one language Wikipedia to another, even if one is not perfect in both languages. That reuses useful knowledge from this sharable encyclopedia and builds valuable local content in the languages and dialects people value.
One Mumbai hotelier tried his hand at translating Hindi pages into Bhojpuri, while others present appreciated the Content Translation tool, highlighting how easy and useful it was and how it helps people translate even if they feel they lack the skills to do so.