by Hari Balasubramanian
Of the 7,097 languages in the world, twenty-three (including the usual suspects: Mandarin, English, Spanish, various forms of Arabic, Hindi, Bengali, Portuguese) are spoken by half of the world's population. Hundreds of languages have only a handful of speakers and are disappearing quickly; one language dies every four months. Some parts of the world (dark green regions in the map) are linguistically far more diverse than others. Papua New Guinea, Cameroon, and India each have hundreds of languages, while in Japan, Iceland, Norway, and Cuba a single language dominates.
Why are languages distributed this way, and why are there such large variations in diversity? These are hard questions to answer and I won't be dealing with them in this column. So many factors – conquest, empire, globalization, migration, trade necessities, the privileged access that comes with adopting a dominant language, religion, administrative convenience, geography, the kind of neighbors one has – have played a role in determining the course of language history. Each region has its own story, and getting into the details would take us too far afield.
I also won't be discussing the merits and demerits of linguistic diversity. Personally, having grown up with five mutually unintelligible Indian languages, I am biased towards diversity – each language encapsulates a unique way of looking at the world and it seems (at least theoretically) that a multiplicity of worldviews is a good thing, worth preserving. But I am sure there are opposing arguments.
Instead, I'll restrict my focus to the following questions. How can the linguistic diversity of a particular region or country be numerically quantified? How do different parts of the world compare? And how do we account for the fact that languages may be related to one another, and that individuals may speak multiple languages?