Polyglot processing

From HimalSouthAsian:

The internet as we know it today is largely an American phenomenon. Our daily online needs are served almost exclusively by US internet giants, most of them based in or around Silicon Valley: Google, Facebook, Twitter, YouTube, Dropbox, Amazon, eBay and more. As a result, the internet’s design and evolution have been shaped by Western democratic values. We would probably not have the internet in its current, relatively unstructured and decentralised form had it come out of Soviet Russia. But along with those values came a language: English.
The American Standards Association’s original ASCII code, the dominant encoding scheme of the web until a few years ago, defines only 128 characters to represent all the text a computer handles, leaving out every character foreign to English. A German equivalent, had Germany taken the lead, would certainly have accommodated accented characters. Yet whichever Western culture computing had come from, Southasia and other regions with non-Latin alphabets would still have had to compute in a foreign language and alphabet, or in unintuitive versions of their own languages. Unicode, and in particular its UTF-8 encoding, popularised over the last decade, has overcome the limitations of ASCII by representing well over a hundred thousand characters within a single scheme. Many different writing systems can therefore share one encoding instead of each needing its own. UTF-8 is now the most widely used character encoding on the web.
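To make the ASCII versus UTF-8 contrast concrete, here is a small Python sketch added for illustration (it is not from the original article). It compares the word "Nepal" in the Latin alphabet with the same word in Devanagari. The helper name `describe` is hypothetical; the only library behaviour relied on is Python's built-in `str.encode`, which raises `UnicodeEncodeError` when a character has no representation in the target encoding.

```python
def describe(text: str) -> dict:
    """Summarise how a string fares under ASCII and UTF-8 (illustrative helper)."""
    utf8 = text.encode("utf-8")  # UTF-8 can encode every Unicode character
    try:
        text.encode("ascii")  # ASCII covers only code points 0-127
        fits_ascii = True
    except UnicodeEncodeError:
        fits_ascii = False
    return {
        "code_points": len(text),  # Python strings are sequences of code points
        "utf8_bytes": len(utf8),
        "fits_ascii": fits_ascii,
    }


english = "Nepal"
nepali = "नेपाल"  # "Nepal" written in Devanagari

for label, word in (("English", english), ("Nepali", nepali)):
    print(label, describe(word))
# English {'code_points': 5, 'utf8_bytes': 5, 'fits_ascii': True}
# Nepali {'code_points': 5, 'utf8_bytes': 15, 'fits_ascii': False}

# UTF-8 is backward compatible with ASCII: pure-ASCII text encodes to identical bytes.
assert english.encode("utf-8") == english.encode("ascii")
```

Both words have five code points, but each Devanagari code point takes three bytes in UTF-8, while ASCII cannot represent them at all. The last line shows the design choice that eased the transition: any existing ASCII text is already valid UTF-8.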
Computer users from Southasia will remember how painful it was to type and read their native languages on computers until a few years ago. Today, the ease with which one can communicate in regional languages is remarkable, even though the transition to Unicode is not yet complete. In Nepal, for example, several government organisations and news outlets still use older standards, but online discourse once dominated by Romanised transliterations has given way to streams of conversation in the original script, accompanied by something close to an abhorrence of Romanised variants.