What are the 22 scheduled languages of India?
An informal outline of India's linguistic landscape
One often hears that India has “hundreds” of languages, with the exact count depending on how a “language” is defined. Of these, 22 have been given scheduled status by the Constitution of India (in its Eighth Schedule). Multilingualism is a common part of Indian life. Many people are competent in two, three or four languages, and switch seamlessly between them depending on whom they are talking to (friends, coworkers, family, distant relatives, etc.). In the diaspora, it is less common for us to naturally engage with Indian languages other than our own mother tongue, if we engage even with that.
So, I thought it would be interesting to list out all of India’s scheduled languages, and reflect briefly on my current impression (or lack thereof) of each. Basically, I want to show what a “mental map” of Indian languages looks like, similar to what we in the West have for European languages (e.g. an awareness of the Latin-derived Romance languages in the west/south, Germanic in the north, Slavic in the east, as well as an awareness of the historical prestige of French, German, and English).
To give some geographical context, here is the map that Wikipedia uses for the article “Languages of India.” To be clear, it does not include all 22 scheduled languages, and also shows languages that are not scheduled. Even so, it gives a rough idea of the geographical extent of Indian languages, in case you’re unfamiliar.

Note: For each language, I am including an estimate of its total number of speakers (both L1 and L2), according to Wikipedia. This will sometimes be many more than the number of speakers just in India, especially for Urdu and Punjabi (which are also spoken in Pakistan) and Bengali (which is also spoken in Bangladesh).
Let’s start with the “North Indian” languages:
Hindi/हिन्दी (~610 million) - The flagship language of India, and the 3rd or 4th most widely spoken language in the world, depending on whether total or native speakers are counted. Like many languages of India, Hindi is part of the Indo-European language family (specifically the Indo-Aryan branch), meaning that it is related to languages like Persian, Russian, Spanish, and English. This is readily seen in the pronouns maiṅ (I) and tū/tum (you), as well as in the numbers ek (one), do (two), tīn (three), etc. Like Romance languages (Spanish, Italian, French, etc.), Hindi has grammatical gender: every noun is either masculine or feminine, which affects the forms of the adjectives and verbs that agree with it. Hindi has lots of loanwords from both Sanskrit and Persian, with the Sanskrit lexicon associated with Hindu contexts and the Persian lexicon with Islamic contexts. Here is an example of highly Sanskritized Hindi from B.R. Chopra’s Mahabharat TV series (1988-1990). Here is an example of more Persianized, Urdu-like Hindi from the film Mughal-E-Azam (1960). This brings us to the next scheduled language…
Urdu/اردو (~250 million) - Essentially a heavily Persianized register of Hindi, and also the official language of neighboring Pakistan[1]. The name comes from the Turkic word ordu, the same source as English “horde,” reflecting the language’s association with the army camps, and later the royal courts, of India’s Turko-Mongol Muslim rulers. Urdu is written in a Perso-Arabic script, making it visibly distinct from Hindi, which is written in Devanagari. So an “Urdu speaker” and a “Hindi speaker” can converse perfectly fine, but may not be able to read each other’s writing[2]. Many Bollywood songs use lyrics inspired by Urdu poetry, so many Indians, even those who aren’t Muslim, know a lot of Urdu (and by extension, Persian) vocabulary, perhaps more than they are consciously aware of.
Punjabi/ਪੰਜਾਬੀ (~150 million) - The language of the Punjab region, which is split between India (where Punjabi is written in the Indic Gurmukhi script) and Pakistan (where it is written in Shahmukhi, a Perso-Arabic script like Urdu’s). Punjabi music, especially music associated with bhangra dance, has become highly popular both in Bollywood and in the Indian diaspora[3], to the point where it has come to represent “stereotypical” Indian music in some contexts (for example, you might recognize the Punjabi-language songs Mundian To Bach Ke and Tunak Tunak). Punjabi is closely related to Hindi, but it has a distinctive accent and is readily distinguished by some different postpositions (like vic instead of Hindi meṅ for “in”), among other things.
Kashmiri/کٲشُر (~7 million) - Despite the region’s prominence in political discourse due to its unfortunate conflict and turmoil, the Kashmiri language itself has a relatively small number of speakers, and, as far as I can tell, relatively little impact on mainstream Indian culture. A popular Kashmiri folk song appears to be Hukus Bukus. Listening to this, I feel that Kashmiri sounds qualitatively distinct from the other Indo-Aryan languages.
Dogri/𑠖𑠵𑠌𑠤𑠮 (~3 million) - A language spoken in Jammu, which is next to Kashmir. However, it’s not closely related to Kashmiri. I don’t know much about it other than that.
Nepali/नेपाली (~32 million) - A language spoken in parts of northern and northeastern India (notably Sikkim and the Darjeeling hills of West Bengal) and, obviously, in Nepal. I haven’t really heard it spoken, but judging from songs, I’d say that it sounds a lot like Hindi. As a heuristic for how similar Nepali is to Hindi, consider the first line of Nepal’s national anthem and see if you can pick out the meaning of each word: sayauñ thuñgā phūlkā hāmī euṭai mālā nepālī (“Woven from a hundred flowers, we are one garland of Nepal”).
Now let’s move on to the “western” languages (these are all still Indo-Aryan):
Gujarati/ગુજરાતી (~60 million) - The language of Gujarat state on India’s western coast, which includes the large peninsula that juts out into the Arabian Sea. There are lots of Gujaratis in the Indian diaspora, and I definitely heard the language growing up in the U.S. among my Gujarati friends. A quickly noticeable feature of Gujarati is its “to be” verb che, corresponding to Hindi hai. Thus, the Hindi greeting kaise ho (“how are you?”) becomes kem cho in Gujarati. Incidentally, che also makes Gujarati sound superficially like Bengali, because of the -ch- that Bengali uses in verb forms like jācche and giyeche.
Sindhi/सिन्धी (~30 million) - Sindhi is closely related to Gujarati. There is no Sindhi-speaking state in India; most speakers live in Pakistan, in the province of Sindh. I don’t think I have ever heard it spoken.
Marathi/मराठी (~100 million) - The language of Maharashtra state. I don’t know much about it, other than that it has supposedly been heavily influenced by the Dravidian (i.e. South Indian) languages because of its geographic location. The few times I have heard spoken Marathi, it sounded very Dravidian to my ears.
Konkani/कोंकणी (~2 million) - A small language spoken in Goa and other parts of India’s western coast that is closely related to Marathi.
Let’s move into South India now. For the most part, the languages spoken in South India belong to the Dravidian language family rather than to Indo-European. There are many theories trying to connect Dravidian with other languages in the world, including ancient Elamite, the Uralic languages (e.g. Finnish and Hungarian), and Japanese. None of these theories is widely accepted.
Kannada/ಕನ್ನಡ (~80 million) - The language of Karnataka state, whose capital city is Bangalore (Bengaḷūru in Kannada). As a Dravidian language, its basic vocabulary is completely unrelated to Indo-European. Take, for example, the pronouns nānu (“I”), nīnu (“you”), and avaru (“he/she,” respectful; also “they”), or the numerals ondu (“one”), eraḍu (“two”), and mūru (“three”). My favorite thing about Kannada is the way it sounds, especially in its poetic and lyrical forms. Morphemes are appended agglutinatively to form long multi-syllable words, which to me sound like a gentle stream flowing over smooth pebbles, as in maragaḷalli (mara + -gaḷu + -alli = “in the trees”) and śilegaḷella (śile + -gaḷu + -ella = “all the idols”).
Tamil/தமிழ் (~90 million) - The language of Tamil Nadu state. Tamil and Telugu (which we will get to soon) have the largest film industries in South India, so I have watched a fair number of Tamil movies. Here is an example of a Tamil song from the movie Jeans (1998)[4]. Within the Dravidian family, Tamil, Malayalam, and Kannada belong to the southern subgroup, while Telugu belongs to the south-central subgroup[5]. Thus, Tamil is closely related to Kannada, with some regular sound correspondences between them: Tamil p corresponds to Kannada h, and Tamil v to Kannada b. Consider the Tamil-Kannada cognates puḷi/huḷi (“sour/tamarind”) and viṭu/biḍu (“to leave”). Also, many Tamil words end in consonants, while this is basically forbidden in written Kannada. Thus, the Tamil equivalents of the Kannada pronouns nānu, nīnu, and avaru are nān, nī, and avar[6]. I don’t yet know enough about Tamil verbs and their conjugations to make meaningful comparisons with Kannada.
Malayalam/മലയാളം (~40 million) - The language of Kerala state on the southwestern coast of India. Malayalam is even closer to Tamil than Kannada is, and is thought to have diverged from Tamil about a thousand years ago. I’ve watched a handful of Malayalam movies, and with the help of the subtitles I can occasionally pick out a few words by analogy with Kannada. I feel like Malayalam has a distinctive intonation, almost (bear with me here) like Tamil with a Punjabi accent. I’d have to study the language in more detail to figure out why I get this impression.
Telugu/తెలుగు (~100 million) - The most widely spoken Dravidian language, and the official language of two Indian states: Andhra Pradesh and Telangana[7]. It is generally considered to belong to a different subgroup of Dravidian from Tamil, Kannada, and Malayalam. Despite this, the Telugu script closely resembles the Kannada script, which leads many people to assume that the two languages are similar overall. To be fair, I think Telugu and Kannada are in fact quite similar in pronunciation and in some grammatical features. My guess is that this is because the literary forms of both languages were cultivated in the court of the Vijayanagara Empire (1300s-1600s), leading to aesthetic similarities, though I need to read more about this. My flowing-stream imagery for how lyrical Kannada sounds also holds for Telugu, as can be heard in the beautiful song Sirivennela from the film Shyam Singha Roy (2021).
We now move into eastern India, which takes us back to languages of the Indo-Aryan family:
Bengali/বাংলা (~280 million) - Also known as Bangla, this is the language of the Bengal region, which is partitioned between the Indian state of West Bengal and the country of Bangladesh. I would say that Bengali is to South Asia what French is to Europe. Just as French sounds noticeably distinct from its Romance siblings Spanish and Italian, Bengali phonology immediately sets it apart from Hindi and many other Indian languages. The main distinctive features of Bengali phonology are the replacement of a with o (and thus ai with oi), v with b, and y with j, as well as the collapse of the sibilants s/ś/ṣ into just ś. The interaction of these sound changes leads to some pretty drastic deviations in how Sanskrit-derived words are pronounced: sarasvatī becomes śorośoti, sūrya becomes śurjo, etc. I think these phonological features are why Indians think of Bengali as a “sweet” language: it kind of sounds like your mouth is always stuffed with rasgullas (or rośgollas, as Bengalis say). The other reason that Bengali is like French is the great prestige the language had during the 18th, 19th, and early 20th centuries, a period that is sometimes called the Bengali Renaissance[8]. Briefly, Bengali religious, philosophical and literary developments during this time had a great influence on mainstream Indian culture and thought, which is still evident today.
Maithili/मैथिली (~20 million) - The language of Mithila, a region west of Bengal that straddles Bihar, Jharkhand, and southern Nepal. Maithili, along with Magahi, Bhojpuri, and a few others, forms the western part of the Māgadhan subfamily of Indo-Aryan (the eastern part being Bengali, Assamese, and Odia). From what I understand, Maithili has a rich literary history, which is probably why it has scheduled status in the Indian Constitution while the other languages of Bihar do not.
Odia/ଓଡ଼ିଆ (~40 million) - The language of Odisha state, south of Bengal and in the east-central part of India. It sounds quite similar to Bengali, and is in fact closely related.
Assamese/অসমীয়া (~25 million) - The language of Assam state. It is also closely related to Bengali, but is distinguished by further sound changes. People who think that Bengali sounds weird are going to think that Assamese sounds really weird. For example, the Sanskrit word bhāṣā (“language”) is pronounced in Assamese as bhaxa, where x stands for a sound like the ch in German Bach.
The really eastern languages (neither Indo-Aryan nor Dravidian):
Santali/ᱥᱟᱱᱛᱟᱲᱤ (~8 million) - This language is spoken by the Santal people of eastern India across several states. It belongs to the Austroasiatic language family of Southeast Asia, meaning that it is distantly related to Vietnamese. Austroasiatic-speaking peoples (specifically from the Munda branch, which includes Santali) are thought to have been in South Asia since prehistory. From what I’ve read, Santali has been greatly influenced by contact with Indo-Aryan languages like Bengali. Santali is relatively obscure in India, and I’ve never heard it spoken. But 8 million speakers is a lot! To put this into perspective, there are more people in India who speak a language related to Vietnamese than there are Danish speakers in the entire world.
Meitei/ꯃꯩꯇꯩꯂꯣꯟ (~3 million) - Meitei, also known as Manipuri, is the official language of Manipur state in Northeast India, right on the border with Myanmar. It belongs to the Tibeto-Burman branch of the Sino-Tibetan language family, which, as the names imply, means that it’s related to Tibetan, Burmese, and more distantly to Chinese. I have never heard it spoken. Interestingly, the famous Indian playback singer Lata Mangeshkar apparently recorded two Meitei songs in 1999 for a Meitei-language film called Meichak. One of these songs is Pamuba Nungshiba. The Meitei language has its own script, derived from Brahmi and attested as early as the 6th century AD. It may also be written using the Bengali script.
Bodo/बरʼ (~1.4 million) - Another Tibeto-Burman language of Northeast India, spoken mainly in parts of Assam. Unlike Meitei, Bodo seems to be written primarily in the Devanagari script.
And finally:
Sanskrit/संस्कृतम् - Basically, Sanskrit is to India what Latin is to Europe. However, while Latin today is mostly dead even as a liturgical language, Hindus recite Sanskrit prayers every day in their homes and in temples. While nobody really speaks Sanskrit colloquially, it lives on in the lexicon of essentially every Indian language. As one of the oldest attested Indo-European languages, Sanskrit can be directly compared to Greek and Latin in its grammar, vocabulary, and phonology.[9]
Well, there you have it: my attempt to succinctly describe the 22 languages scheduled by the Constitution of India. I’ve always been fascinated by India’s linguistic diversity and the way that Indians use their languages as vehicles for artistic and religious expression. One could spend many lifetimes learning about the cultural heritage that is collectively held in the Indian languages.
I thought it would be fitting to end with a clip of the closing song of B.R. Chopra’s Mahabharat television series (1988-1990). The episodes always ended with the word “Mahabharata” typed out in the scripts of 11 Indian languages. Play the clip below, and see if you can identify each language from its script![10]
[1] Some would instead say that Hindi is a Sanskritized version of Urdu. Ultimately, this comes down to a question of which language came “first”: did “Hindi” exist before the Islamic invasions, or was “Urdu” first developed in the Indo-Islamic courts? At some point, this just becomes an argument over semantics.
[2] I need to look into this more, but my guess is that Indian Muslims are likely to be able to read the Devanagari script used for Hindi, while Indian Hindus are unlikely to be able to read the Urdu script.
[3] Interestingly, a lot of popular Punjabi music has been (and continues to be) produced in the diaspora itself. The mainstreaming of Punjabi diasporic culture, and its adoption by the larger Indian diaspora, is something that could be discussed in detail in a future post.
[4] The songs in this movie were composed by none other than A.R. Rahman, while he was still up-and-coming in the 90s.
[5] These subgroups also contain smaller languages that are not scheduled by the Indian government.
[6] However, the final -u is commonly omitted in spoken Kannada, so the two languages can actually sound quite similar when spoken.
[7] These were actually one state (Andhra Pradesh) until 2014.
[8] You may notice that this largely overlaps with the British colonial period of Indian history. The way that the Bengali intelligentsia interacted with and responded to colonialism was a big part of the Bengali Renaissance as well as the Indian independence movement. It’s also worth noting that the capital of British India was Calcutta up until 1911, when it was transferred to Delhi.
[10] Answer: Punjabi (Gurmukhi), Bengali, Odia, Assamese, Telugu, Malayalam, Tamil, Kannada, Urdu, Gujarati, and Hindi