Native to Indian subcontinent
Native speakers
Extinct; used as a literary and liturgical language only 
Brāhmī and derived scripts and Latin alphabet
Language codes
ISO 639-1 pi
ISO 639-2 pli
ISO 639-3 pli
Glottolog pali1273[1]
Plate 10 from C. Faulmann: (1880). The upper half shows a text in Sanskrit (praise of Vishnu) written in Devanagari while the lower half shows a text in Pali from a Buddhist ceremonial scripture called "Kammuwa" from Burma probably in old Mon script ( of that book).

Pali (also Pāḷi) is a Middle Indo-Aryan language that is in the Prakrit language group and was indigenous to the Indian subcontinent. It is a dead language that is widely studied because it is the language of many of the earliest extant Buddhist scriptures as collected in the Pāli Canon, or Tipitaka, and it is the liturgical language of Theravada Buddhism.


  • 1 Origin and development
    • 1.1 Etymology
    • 1.2 Classification
    • 1.3 Early history
      • 1.3.1 Pāli and Paiśācī
      • 1.3.2 Theravada Buddhism
      • 1.3.3 Early western views
      • 1.3.4 Modern scholarship
    • 1.4 Pali today
  • 2 Lexicon
  • 3 Emic views of Pali
  • 4 Phonology
    • 4.1 Vowels
    • 4.2 Consonants
  • 5 Morphology
    • 5.1 Nominal inflection
      • 5.1.1 a-stems
      • 5.1.2 ā-stems
      • 5.1.3 i-stems and u-stems
  • 6 Linguistic analysis of a Pali text
  • 7 Ardha-Magadhi
  • 8 Sanskrit
    • 8.1 Vowels and diphthongs
    • 8.2 Consonants
      • 8.2.1 Sound changes
      • 8.2.2 Assimilations
        • General rules
        • Total assimilation
          • Progressive assimilations
          • Regressive assimilations
        • Partial and mutual assimilation
      • 8.2.3 Epenthesis
      • 8.2.4 Other changes
    • 8.3 Exceptions
  • 9 Writing
    • 9.1 Alphabet with diacritics
    • 9.2 Transliteration on computers
    • 9.3 Text in ASCII
  • 10 See also
  • 11 References
  • 12 Further reading
  • 13 External links

Origin and development

Etymology

The word Pali is used as a name for the language of the Theravada canon. According to the Pali Text Society's Dictionary, the word seems to have its origins in commentarial traditions, wherein the Pāli (in the sense of the line of original text quoted) was distinguished from the commentary or vernacular translation that followed it in the manuscript. As such, the name of the language has caused some debate among scholars of all ages; the spelling of the name also varies, being found with both long "ā" [ɑː] and short "a" [a], and also with either a retroflex [ɭ] or non-retroflex [l] "l" sound. Both the long ā and retroflex ḷ are seen in the ISO 15919/ALA-LC rendering, Pāḷi; however, to this day there is no single, standard spelling of the term, and all four possible spellings can be found in textbooks. R. C. Childers translates the word as "series" and states that the language "bears the epithet in consequence of the perfection of its grammatical structure".[2]

In the 19th century, the British Orientalist Robert Cæsar Childers argued that the true or geographical name of the Pāli language was Magadhi, and that because pāli means "line, row, series", the early Buddhists extended the meaning of the term to mean "a series of books", so Palibhasa means "language of the texts".[3] However, modern scholarship has regarded Pali as a mix of several Prakrit languages from around the 3rd century BCE, combined and partially Sanskritized.[4] The closest parallels to Pali that have been found in India are the Edicts of Ashoka at Gujarat, in the west of India, leading some scholars to associate Pali with this region of western India.[5]

Classification

There is persistent confusion as to the relation of Pāḷi to the vernacular spoken in the ancient kingdom of Magadha, which was located around modern-day Bihār.

Pāli, as a Middle Indo-Aryan language, differs from Sanskrit not only in the time of its origin but also in its dialectal base, since a number of its morphological and lexical features betray the fact that it is not a direct continuation of Ṛgvedic Sanskrit; rather, it descends from a dialect or number of dialects that were, despite many similarities, different from Ṛgvedic.[6]

However, this view is not shared by all scholars. Some, like A.C. Woolner, believe that Pali is derived from Vedic Sanskrit, but not necessarily from Classical Sanskrit.[7]

Early history

Pāli and Paiśācī

Paiśācī is a largely unattested literary language of classical India that is mentioned in Prakrit and Sanskrit grammars of antiquity. It is found grouped with the Prakrit languages, with which it shares some linguistic similarities, but was not considered a spoken language by the early grammarians because it was understood to have been purely a literary language.[8]

The etymology of the name suggests that it was spoken by Piśācas, i.e., ghouls. In works of Sanskrit poetics such as Daṇḍin's Kavyadarsha, it is also known by the name of Bhūtabhāṣa, an epithet which can be interpreted either as a 'dead language' (i.e., with no surviving speakers), or as 'a language spoken by the dead' (i.e., ghouls/ghosts), the former interpretation being more realistic and the latter being more fanciful. Evidence which lends support to the former interpretation is that literature in Paiśācī is fragmentary and extremely rare but may once have been common. There is no known complete work in this language; however, several scholars specializing in Indology, such as Sten Konow,[8] Felix Lacôte[8] and Alfred Master,[9] have argued that Paiśācī was the ancient name for Pāli.

Theravada Buddhism

Many Theravada sources refer to the Pāli language as "Magadhan" or the "language of Magadha". This identification first appears in the commentaries, and may have been an attempt by Buddhists to associate themselves more closely with the Mauryans. The Buddha taught in Magadha, but the four most important places in his life are all outside of it. It is likely that he taught in several closely related dialects of Middle Indo-Aryan, which had a high degree of mutual intelligibility. There is no attested dialect of Middle Indo-Aryan with all the features of Pāli. Pāli has some commonalities with both the Ashokan inscriptions at Girnar in the West of India, and at Hathigumpha, Bhubaneswar, Orissa in the East. Similarities to the Western inscription may be misleading, because the inscription suggests that the Ashokan scribe may not have translated the material he received from Magadha into the vernacular of the people there. Whatever the relationship of the Buddha's speech to Pāli, the Canon was eventually transcribed and preserved entirely in it, while the commentarial tradition that accompanied it (according to the information provided by Buddhaghosa) was translated into Sinhalese and preserved in local languages for several generations.

In Sri Lanka, Pāli is thought to have entered into a period of decline ending around the 4th or 5th century (as Sanskrit rose in prominence, and simultaneously, as Buddhism's adherents became a smaller portion of the subcontinent), but ultimately survived. The work of Buddhaghosa was largely responsible for its reemergence as an important scholarly language in Buddhist thought. The Visuddhimagga, and the other commentaries that Buddhaghosa compiled, codified and condensed the Sinhalese commentarial tradition that had been preserved and expanded in Sri Lanka since the 3rd century BCE.

Early western views

T. W. Rhys Davids in his book Buddhist India,[10] and Wilhelm Geiger in his book Pāli Literature and Language, suggested that Pali may have originated as a lingua franca or common language of culture among people who used differing dialects in North India, used at the time of the Buddha and employed by him. Another scholar states that at that time it was "a refined and elegant vernacular of all Aryan-speaking people".[11] Modern scholarship has not arrived at a consensus on the issue; there are a variety of conflicting theories with supporters and detractors.[12] After the death of the Buddha, Pali may have evolved among Buddhists out of the language of the Buddha as a new artificial language.[13] R. C. Childers, who held to the theory that Pāli was Old Magadhi, wrote: "Had Gautama never preached, it is unlikely that Magadhese would have been distinguished from the many other vernaculars of Hindustan, except perhaps by an inherent grace and strength which make it a sort of Tuscan among the Prakrits."[14]

According to K. R. Norman, it is likely that the viharas in North India had separate collections of material, preserved in the local dialect. In the early period it is likely that no degree of translation was necessary in communicating this material to other areas. Around the time of Ashoka there had been more linguistic divergence, and an attempt was made to assemble all the material. It is possible that a language quite close to the Pāli of the canon emerged as a result of this process as a compromise of the various dialects in which the earliest material had been preserved, and this language functioned as a lingua franca among Eastern Buddhists in India from then on. Following this period, the language underwent a small degree of Sanskritisation (i.e., MIA bamhana > brahmana, tta > tva in some cases).[15]

Modern scholarship

Bhikkhu Bodhi, summarizing the current state of scholarship, states that the language is "closely related to the language (or, more likely, the various regional dialects) that the Buddha himself spoke". He goes on to write:

Scholars regard this language as a hybrid showing features of several Prakrit dialects used around the third century BCE, subjected to a partial process of Sanskritization. While the language is not identical to what Buddha himself would have spoken, it belongs to the same broad language family as those he might have used and originates from the same conceptual matrix. This language thus reflects the thought-world that the Buddha inherited from the wider Indian culture into which he was born, so that its words capture the subtle nuances of that thought-world.

According to A. K. Warder, the Pāli language is a Prakrit language used in a region of western India.[16] Warder associates Pāli with the Indian realm (janapada) of Avanti, where the Sthavira sect was centered.[16] Following the initial split in the Buddhist community, the Sthavira branch of Buddhism became influential in western and southern India, while the Mahāsāṃghika branch became influential in central and eastern India.[5] Akira Hirakawa and Paul Groner also associate Pāli with west India and the Sthavira sect, citing inscriptions at Girnar in Gujarat, India, which are linguistically closest to the Pāli language.[5]

Pali today

Today Pāli is studied mainly to gain access to Buddhist scriptures, and is frequently chanted in a ritual context. The secular literature of Pāli historical chronicles, medical texts, and inscriptions is also of great historical importance. The great centers of Pāli learning remain in the Theravada nations of Southeast Asia: Burma, Sri Lanka, Thailand, Laos, and Cambodia. Since the 19th century, various societies for the revival of Pali studies in India have promoted awareness of the language and its literature, perhaps most notably the Maha Bodhi Society founded by Anagarika Dhammapala.

In Europe, the Pali Text Society has been a major force in promoting the study of Pāli by Western scholars since its founding in 1881. Based in the United Kingdom, the society publishes romanized Pāli editions, along with many English translations of these sources. Before the society existed, Robert Caesar Childers had published the first Pāli text printed in England (1869) and the first volume of his Pali dictionary (1872); the dictionary received the Volney Prize in 1876.

The Pali Text Society was founded in part to compensate for the very low level of funds allocated to Indology in the late 19th-century United Kingdom; incongruously, British scholarship in Sanskrit and Prakrit was not nearly as robust as that of Germany, Russia, or even Denmark. Even without the inspiration of colonial holdings such as the former British occupation of Sri Lanka and Burma, institutions such as the Danish Royal Library have built up major collections of Pāli manuscripts and major traditions of Pāli studies.

Lexicon

Virtually every word in Pāḷi has cognates in the other Prakritic Middle Indo-Aryan languages, e.g., the Jain Prakrits. The relationship to earlier Sanskrit (e.g., Vedic language) is less direct and more complicated. Historically, influence between Pali and Sanskrit has been felt in both directions. The Pali language's resemblance to Sanskrit is often exaggerated by comparing it to later Sanskrit compositions – which were written centuries after Sanskrit ceased to be a living language, and are influenced by developments in Middle Indic, including the direct borrowing of a portion of the Middle Indic lexicon; whereas, a good deal of later Pali technical terminology has been borrowed from the vocabulary of equivalent disciplines in Sanskrit, either directly or with certain phonological adaptations.

Post-canonical Pali also possesses a few loan-words from local languages where Pali was used (e.g. Sri Lankans adding Sinhalese words to Pali). These usages differentiate the Pali found in the Suttapiṭaka from later compositions such as the Pali commentaries on the canon and folklore (e.g., the stories of the Jātaka commentaries), and comparative study (and dating) of texts on the basis of such loan-words is now a specialized field unto itself.

Pali was not exclusively used to convey the teachings of the Buddha, as can be deduced from the existence of a number of secular texts, such as books of medical science/instruction, in Pali. However, scholarly interest in the language has been focused upon religious and philosophical literature, because of the unique window it opens on one phase in the development of Buddhism.

Emic views of Pali

Although Sanskrit was said in the Brahmanical tradition to be the unchanging language spoken by the gods, in which each word had an inherent significance, this view of language was not shared in the early Buddhist tradition, in which words were only conventional and mutable signs.[17] This view of language naturally extended to Pali, and may have contributed to its usage (as an approximation or standardization of local Middle Indic dialects) in place of Sanskrit. However, by the time of the compilation of the Pali commentaries (4th or 5th century), Pali was regarded as the natural language, the root language of all beings.[18]

Comparable to Ancient Egyptian, Latin or Hebrew in the mystic traditions of the West, Pali recitations were often thought to have a supernatural power (which could be attributed to their meaning, the character of the reciter, or the qualities of the language itself), and in the early strata of Buddhist literature we can already see Pali dhāraṇīs used as charms, for example against the bite of snakes. Many people in Theravada cultures still believe that taking a vow in Pali has a special significance; as one example of the supernatural power assigned to chanting in the language, the recitation of the vows of Aṅgulimāla is believed in Sri Lanka to alleviate the pain of childbirth. In Thailand, the chanting of a portion of the Abhidhammapiṭaka is believed to be beneficial to the recently departed, and this ceremony routinely occupies as much as seven working days. Nothing in the latter text relates to this subject, and the origins of the custom are unclear.



Phonology

Vowels

Height    Front              Central    Back
High      i [i], ī [iː]      -          u [u], ū [uː]
Mid       e [e], [eː]        a [ɐ]      o [o], [oː]
Low       -                  ā [aː]     -

Long and short vowels are only contrastive in open syllables; in closed syllables, all vowels are always short. Short and long e and o are in complementary distribution: the short variants occur only in closed syllables, the long variants occur only in open syllables. Short and long e and o are therefore not distinct phonemes.

A sound called anusvāra (Skt.; Pali: niggahīta), represented by the letter ṁ (ISO 15919) or ṃ (ALA-LC) in romanization, and by a raised dot in most traditional alphabets, originally marked the fact that the preceding vowel was nasalized. That is, aṁ, iṁ and uṁ represented [ã], [ĩ] and [ũ]. In many traditional pronunciations, however, the anusvāra is pronounced more strongly, like the velar nasal [ŋ], so that these sounds are pronounced instead [ãŋ], [ĩŋ] and [ũŋ]. However pronounced, ṁ never follows a long vowel; ā, ī and ū are converted to the corresponding short vowels when ṁ is added to a stem ending in a long vowel, e.g. kathā + ṁ becomes kathaṁ, not *kathāṁ, and devī + ṁ becomes deviṁ, not *devīṁ.
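The shortening rule for the anusvāra can be illustrated with a minimal Python sketch. The function name add_niggahita and its default marker are illustrative assumptions, not taken from any grammar or library, and the sketch works only on romanized stems.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

# Long vowels and the short vowels that replace them before the anusvāra.
LONG_TO_SHORT = {nfc("ā"): "a", nfc("ī"): "i", nfc("ū"): "u"}

def add_niggahita(stem, marker="ṁ"):
    """Append the anusvāra (hypothetical helper), shortening a final long vowel."""
    stem = nfc(stem)
    if stem and stem[-1] in LONG_TO_SHORT:
        stem = stem[:-1] + LONG_TO_SHORT[stem[-1]]
    return nfc(stem + marker)

print(add_niggahita("kathā"))  # kathaṁ
print(add_niggahita("devī"))   # deviṁ
```

A stem that already ends in a short vowel is left unchanged apart from the added marker.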


Consonants

The table below lists the consonants of Pali. Each entry gives the letter in traditional romanization, followed in square brackets by its pronunciation transcribed in the IPA. A hyphen marks an empty cell.

Manner                          Bilabial   Labiodental  Dental/Alveolar  Retroflex   Palatal     Velar      Glottal
Nasal                           m [m]      -            n [n̪]            ṇ [ɳ]       ñ [ɲ]       (ṅ [ŋ])    -
Stop, voiceless unaspirated     p [p]      -            t [t̪]            ṭ [ʈ]       c [tʃ]      k [k]      -
Stop, voiceless aspirated       ph [pʰ]    -            th [t̪ʰ]          ṭh [ʈʰ]     ch [tʃʰ]    kh [kʰ]    -
Stop, voiced unaspirated        b [b]      -            d [d̪]            ḍ [ɖ]       j [dʒ]      g [ɡ]      -
Stop, voiced aspirated          bh [bʱ]    -            dh [d̪ʱ]          ḍh [ɖʱ]     jh [dʒʱ]    gh [ɡʱ]    -
Fricative                       -          -            s [s]            -           -           -          h [h]
Approximant, central            -          v [ʋ]        -                r [ɻ]       y [j]       -          -
Approximant, lateral            -          -            l [l]            (ḷ [ɭ])     -           -          -
Approximant, lateral aspirated  -          -            -                (ḷh [ɭʱ])   -           -          -

Of the sounds listed above only the three consonants in parentheses, ṅ, ḷ, and ḷh, are not distinct phonemes in Pali: ṅ only occurs before velar stops, while ḷ and ḷh are allophones of single ḍ and ḍh occurring between vowels.

Morphology

Pali is a highly inflected language, in which almost every word contains, besides the root conveying the basic meaning, one or more affixes (usually suffixes) which modify the meaning in some way. Nouns are inflected for gender, number, and case; verbal inflections convey information about person, number, tense and mood.

Nominal inflection

Pali nouns inflect for three grammatical genders (masculine, feminine, and neuter) and two numbers (singular and plural). The nouns also, in principle, display eight cases: nominative or paccatta case, vocative, accusative or upayoga case, instrumental or karaṇa case, dative or sampadāna case, ablative, genitive or sāmin case, and locative or bhumma case; however, in many instances, two or more of these cases are identical in form; this is especially true of the genitive and dative cases.


a-stems

a-stems, whose uninflected stem ends in short a (/ə/), are either masculine or neuter. The masculine and neuter forms differ only in the nominative, vocative, and accusative cases.

Masculine (loka- "world")
Case            Singular                           Plural
Nominative      loko                               lokā
Vocative        loka                               lokā
Accusative      lokaṁ                              loke
Instrumental    lokena                             lokehi
Ablative        lokā (lokamhā, lokasmā; lokato)    lokehi
Dative          lokassa (lokāya)                   lokānaṁ
Genitive        lokassa                            lokānaṁ
Locative        loke (lokasmiṁ)                    lokesu

Neuter (yāna- "carriage")
Case            Singular                           Plural
Nominative      yānaṁ                              yānāni
Vocative        yānaṁ                              yānāni
Accusative      yānaṁ                              yānāni
Instrumental    yānena                             yānehi
Ablative        yānā (yānamhā, yānasmā; yānato)    yānehi
Dative          yānassa (yānāya)                   yānānaṁ
Genitive        yānassa                            yānānaṁ
Locative        yāne (yānasmiṁ)                    yānesu
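Because the a-stem paradigm is regular, it can be generated mechanically. The following Python sketch is only an illustration: the function name decline_a_stem and the layout of the ending tables are assumptions, only the primary forms (not the parenthesized alternatives) are produced, and the neuter vocative is assumed to share the nominative form.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

# (singular, plural) endings that replace the stem-final short a.
MASCULINE = {
    "Nominative":   ("o", "ā"),
    "Vocative":     ("a", "ā"),
    "Accusative":   ("aṁ", "e"),
    "Instrumental": ("ena", "ehi"),
    "Ablative":     ("ā", "ehi"),
    "Dative":       ("assa", "ānaṁ"),
    "Genitive":     ("assa", "ānaṁ"),
    "Locative":     ("e", "esu"),
}
# The neuter differs only in the nominative, vocative and accusative
# (assumption: the neuter vocative is identical to the nominative).
NEUTER = {
    **MASCULINE,
    "Nominative": ("aṁ", "āni"),
    "Vocative":   ("aṁ", "āni"),
    "Accusative": ("aṁ", "āni"),
}

def decline_a_stem(stem, gender="m"):
    """Return {case: (singular, plural)} for a stem such as 'loka' or 'yāna'."""
    stem = nfc(stem)
    if not stem.endswith("a"):
        raise ValueError("expected a stem ending in short a")
    base = stem[:-1]
    endings = MASCULINE if gender == "m" else NEUTER
    return {case: (nfc(base + sg), nfc(base + pl))
            for case, (sg, pl) in endings.items()}

for case, forms in decline_a_stem("loka").items():
    print(case, *forms)
```

Calling decline_a_stem("yāna", gender="n") yields the neuter paradigm in the same shape.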


ā-stems

Nouns ending in ā (/aː/) are almost always feminine.

Feminine (kathā- "story")
Case            Singular              Plural
Nominative      kathā                 kathāyo
Vocative        kathe                 kathāyo
Accusative      kathaṁ                kathāyo
Instrumental    kathāya               kathāhi
Ablative        kathāya               kathāhi
Dative          kathāya               kathānaṁ
Genitive        kathāya               kathānaṁ
Locative        kathāya, kathāyaṁ     kathāsu

i-stems and u-stems

i-stems and u-stems are either masculine or neuter. The masculine and neuter forms differ only in the nominative and accusative cases. The vocative has the same form as the nominative.

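Masculine (isi- "seer")
Case            Singular                 Plural
Nominative      isi                      isayo, isī
Vocative        isi                      isayo, isī
Accusative      isiṁ                     isayo, isī
Instrumental    isinā                    isihi, isīhi
Ablative        isinā, isito             isihi, isīhi
Dative          isino                    isinaṁ, isīnaṁ
Genitive        isissa, isino            isinaṁ, isīnaṁ
Locative        isismiṁ                  isisu, isīsu

Neuter (akkhi- "eye")
Case            Singular                 Plural
Nominative      akkhi, akkhiṁ            akkhī, akkhīni
Vocative        akkhi, akkhiṁ            akkhī, akkhīni
Accusative      akkhi, akkhiṁ            akkhī, akkhīni
Instrumental    akkhinā                  akkhihi, akkhīhi
Ablative        akkhinā, akkhito         akkhihi, akkhīhi
Dative          akkhino                  akkhinaṁ, akkhīnaṁ
Genitive        akkhissa, akkhino        akkhinaṁ, akkhīnaṁ
Locative        akkhismiṁ                akkhisu, akkhīsu

Masculine (bhikkhu- "monk")
Case            Singular                 Plural
Nominative      bhikkhu                  bhikkhavo, bhikkhū
Vocative        bhikkhu                  bhikkhavo, bhikkhū
Accusative      bhikkhuṁ                 bhikkhavo, bhikkhū
Instrumental    bhikkhunā                bhikkhūhi
Ablative        bhikkhunā                bhikkhūhi
Dative          bhikkhuno                bhikkhūnaṁ
Genitive        bhikkhussa, bhikkhuno    bhikkhūnaṁ, bhikkhunnaṁ
Locative        bhikkhusmiṁ              bhikkhūsu

Neuter (cakkhu- "eye")
Case            Singular                 Plural
Nominative      cakkhu, cakkhuṁ          cakkhūni
Vocative        cakkhu, cakkhuṁ          cakkhūni
Accusative      cakkhu, cakkhuṁ          cakkhūni
Instrumental    cakkhunā                 cakkhūhi
Ablative        cakkhunā                 cakkhūhi
Dative          cakkhuno                 cakkhūnaṁ
Genitive        cakkhussa, cakkhuno      cakkhūnaṁ, cakkhunnaṁ
Locative        cakkhusmiṁ               cakkhūsu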

Linguistic analysis of a Pali text

From the opening of the Dhammapada:

Manopubbaṅgamā dhammā, manoseṭṭhā manomayā;
Manasā ce paduṭṭhena, bhāsati vā karoti vā,
Tato naṁ dukkhaṁ anveti, cakkaṁ'va vahato padaṁ.

Element for element gloss:

Mano-pubbaṅ-gam=ā dhamm=ā, mano-seṭṭh=ā mano-may=ā;
Mind-before-going dharmas, mind-foremost mind-made;
Manas=ā ce paduṭṭh=ena, bhāsa=ti vā karo=ti vā,
Mind=with if corrupted=with, speaks either acts or,
Ta=to naṁ dukkhaṁ anv-e=ti, cakkaṁ 'va vahat=o pad=aṁ.
That=from him suffering after-goes, wheel as carrying(beast)=of foot.

The three compounds in the first line literally mean:

manopubbaṅgama "whose precursor is mind", "having mind as a fore-goer or leader"
manoseṭṭha "whose foremost member is mind", "having mind as chief"
manomaya "consisting of mind" or "made by mind"

The literal meaning is therefore: "The dharmas have mind as their leader, mind as their chief, are made of/by mind. If [someone] either speaks or acts with a corrupted mind, from that [cause] suffering goes after him, as the wheel [of a cart follows] the foot of a draught animal."

A slightly freer translation by Acharya Buddharakkhita:

Mind precedes all mental states. Mind is their chief; they are all mind-wrought.
If with an impure mind a person speaks or acts suffering follows him
like the wheel that follows the foot of the ox.


Ardha-Magadhi

The most archaic of the Middle Indo-Aryan languages are the inscriptional Aśokan Prakrit on the one hand and Pāli and Ardhamāgadhī ("Half Magadhi") on the other, both literary languages.

The Indo-Aryan languages are commonly assigned to three major groups: Old, Middle and New Indo-Aryan. The classification reflects consecutive stages in a common linguistic development, but is not merely a matter of chronology: Classical Sanskrit, as a codified derivative of Vedic Sanskrit, remains mostly representative of the Old Indo-Aryan stage, even though it continued to flourish at the same time as the Middle Indo-Aryan languages. Conversely, a number of the morphophonological and lexical features of the Middle Indo-Aryan languages betray the fact that they may not be direct continuations of Ṛgvedic Sanskrit, the main base of Classical Sanskrit; rather they descend from other very similar Old Indo-Aryan dialects which some regard as probably even more archaic than Ṛgvedic.

MIA languages, though individually distinct, share features of phonology and morphology which characterize them as parallel descendants of Old Indo-Aryan. Various sound changes are typical of the MIA phonology:

(1) The vocalic liquids 'ṛ' and 'ḷ' are replaced by 'a', 'i' or 'u'; (2) the diphthongs 'ai' and 'au' are monophthongized to 'e' and 'o'; (3) long vowels before two or more consonants are shortened; (4) the three sibilants of OIA are reduced to one, either 'ś' or 's'; (5) the often complex consonant clusters of OIA are reduced to more readily pronounceable forms, either by assimilation or by splitting; (6) single intervocalic stops are progressively weakened; (7) dentals are palatalized by a following '-y-'; (8) all final consonants except '-ṃ' are dropped unless they are retained in 'sandhi' junctions.

The most conspicuous features of the morphological system of these languages are: loss of the dual; thematicization of consonantal stems; merger of the f. 'i-/u-' and 'ī-/ū-' in one 'ī-/ū-' inflexion, elimination of the dative, whose functions are taken over by the genitive, simultaneous use of different case-endings in one paradigm; employment of 'mahyaṃ' and 'tubhyaṃ' as genitives and 'me' and 'te' as instrumentals; gradual disappearance of the middle voice; coexistence of historical and new verbal forms based on the present stem; and use of active endings for the passive. In the vocabulary, the MIA languages are mostly dependent on Old Indo-Aryan, with addition of a few so-called 'deśī' words of (often) uncertain origin.


Sanskrit

Pali and Sanskrit are very closely related and the common characteristics of Pali and Sanskrit were always easily recognized by those in India who were familiar with both. Indeed, a very large proportion of Pali and Sanskrit word-stems are identical in form, differing only in details of inflection.

Technical terms from Sanskrit were converted into Pali by a set of conventional phonological transformations. These transformations mimicked a subset of the phonological developments that had occurred in Proto-Pali. Because of the prevalence of these transformations, it is not always possible to tell whether a given Pali word is a part of the old Prakrit lexicon, or a transformed borrowing from Sanskrit. The existence of a Sanskrit word regularly corresponding to a Pali word is not always secure evidence of the Pali etymology, since, in some cases, artificial Sanskrit words were created by back-formation from Prakrit words.

The following phonological processes are not intended as an exhaustive description of the historical changes which produced Pali from its Old Indic ancestor, but rather are a summary of the most common phonological equations between Sanskrit and Pali, with no claim to completeness.

Vowels and diphthongs

  • Sanskrit ai and au always monophthongize to Pali e and o, respectively
Examples: maitrī → mettā, auṣadha → osadha
  • Sanskrit aya and ava likewise often reduce to Pali e and o
Examples: dhārayati → dhāreti, avatāra → otāra, bhavati → hoti
  • Sanskrit avi becomes Pali e (i.e. avi → ai → e)
Example: sthavira → thera
  • Sanskrit ṛ appears in Pali as a, i or u, often agreeing with the vowel in the following syllable. ṛ also sometimes becomes u after labial consonants.
Examples: kṛta → kata, tṛṣṇa → taṇha, smṛti → sati, ṛṣi → isi, dṛṣṭi → diṭṭhi, ṛddhi → iddhi, ṛju → uju, spṛṣṭa → phuṭṭha, vṛddha → vuddha
  • Sanskrit long vowels are shortened before a sequence of two following consonants.
Examples: kṣānti → khanti, rājya → rajja, īśvara → issara, tīrṇa → tiṇṇa, pūrva → pubba
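Two of the vowel correspondences above, the monophthongization of ai and au and the shortening of long vowels before two consonants, are mechanical enough to sketch in Python. The helper names below (segments, monophthongize, shorten_before_cluster) are illustrative assumptions. Each function applies a single rule to a romanized Sanskrit form, so its output is an intermediate form rather than a finished Pali word; the consonant changes described in the following sections are not applied.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

LONG_TO_SHORT = {nfc("ā"): "a", nfc("ī"): "i", nfc("ū"): "u"}
VOWELS = {nfc(v) for v in
          ["a", "ā", "i", "ī", "u", "ū", "e", "o", "ṛ", "ai", "au"]}
# Aspirated stops are written with two letters but count as one consonant.
ASPIRATES = {nfc(c) for c in
             ["kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"]}

def segments(word):
    """Split a romanized word into sounds, keeping digraphs (kh, ai, ...) whole."""
    word = nfc(word)
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ASPIRATES or pair in ("ai", "au"):
            out.append(pair)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def monophthongize(word):
    """Sanskrit ai -> e and au -> o."""
    return "".join({"ai": "e", "au": "o"}.get(s, s) for s in segments(word))

def shorten_before_cluster(word):
    """Shorten ā, ī, ū when two consonants follow."""
    segs = segments(word)
    for i, seg in enumerate(segs):
        following = segs[i + 1:i + 3]
        if (seg in LONG_TO_SHORT and len(following) == 2
                and not any(s in VOWELS for s in following)):
            segs[i] = LONG_TO_SHORT[seg]
    return "".join(segs)

print(monophthongize("auṣadha"))        # oṣadha
print(shorten_before_cluster("rājya"))  # rajya
```

The word sādhu is left unchanged by shorten_before_cluster because dh is treated as a single aspirated consonant, not a cluster.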


Consonants

Sound changes

  • The Sanskrit sibilants ś, ṣ, and s merge as Pali s
Examples: śaraṇa → saraṇa, doṣa → dosa
  • The Sanskrit stops ḍ and ḍh become ḷ and ḷh between vowels (as in Vedic)
Examples: cakravāḍa → cakkavāḷa, virūḍha → virūḷha
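The two sound changes just listed can be written as simple string rewrites. This Python sketch is illustrative only: the function names merge_sibilants and lateralize are assumptions, and each applies just one rule, so cakravāḍa comes out as cakravāḷa (the assimilation kr → kk belongs to a separate rule).

```python
import re
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

VOWELS = nfc("aāiīuūeo")
# ḍ or ḍh with a vowel immediately before and after it.
INTERVOCALIC_D = re.compile(nfc(f"(?<=[{VOWELS}])ḍ(h?)(?=[{VOWELS}])"))

def merge_sibilants(word):
    """Sanskrit ś and ṣ fall together with s."""
    return nfc(word).replace(nfc("ś"), "s").replace(nfc("ṣ"), "s")

def lateralize(word):
    """ḍ and ḍh become ḷ and ḷh between vowels."""
    return INTERVOCALIC_D.sub(nfc("ḷ") + r"\1", nfc(word))

print(merge_sibilants("doṣa"))  # dosa
print(lateralize("virūḍha"))    # virūḷha
```

A ḍ that follows a consonant, as in paṇḍita, is not affected because the lookbehind requires a preceding vowel.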


Assimilations

General rules
  • Many assimilations of one consonant to a neighboring consonant occurred in the development of Pali, producing a large number of geminate (double) consonants. Since aspiration of a geminate consonant is only phonetically detectable on the last consonant of a cluster, geminate kh, gh, ch, jh, ṭh, ḍh, th, dh, ph and bh appear as kkh, ggh, cch, jjh, ṭṭh, ḍḍh, tth, ddh, pph and bbh, not as khkh, ghgh etc.
  • When assimilation would produce a geminate consonant (or a sequence of unaspirated stop+aspirated stop) at the beginning of a word, the initial geminate is simplified to a single consonant.
Examples: prāṇa → pāṇa (not ppāṇa), sthavira → thera (not tthera), dhyāna → jhāna (not jjhāna), jñāti → ñāti (not ññāti)
  • When assimilation would produce a sequence of three consonants in the middle of a word, geminates are simplified until there are only two consonants in sequence.
Examples: uttrāsa → uttāsa (not utttāsa), mantra → manta (not mantta), indra → inda (not indda), vandhya → vañjha (not vañjjha)
  • The sequence vv resulting from assimilation changes to bb
Examples: sarva → savva → sabba, pravrajati → pavvajati → pabbajati, divya → divva → dibba, nirvāṇa → nivvāṇa → nibbāna
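Three of the general rules above (the spelling of geminate aspirates, the simplification of word-initial geminates, and vv → bb) can be sketched as small Python helpers. The function names are illustrative assumptions, and the three-consonant simplification rule is not covered.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

def geminate(consonant):
    """Double a consonant; an aspirate keeps a single final h (kh -> kkh)."""
    consonant = nfc(consonant)
    if len(consonant) == 2 and consonant.endswith("h"):
        return consonant[0] * 2 + "h"
    return consonant * 2

def simplify_initial(word):
    """Reduce a word-initial geminate to a single consonant (tthera -> thera)."""
    word = nfc(word)
    if len(word) >= 2 and word[0] == word[1]:
        return word[1:]
    return word

def vv_to_bb(word):
    """The sequence vv produced by assimilation becomes bb."""
    return nfc(word).replace("vv", "bb")

print(geminate("kh"))              # kkh
print(simplify_initial("tthera"))  # thera
print(vv_to_bb("savva"))           # sabba
```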
Total assimilation

Total assimilation, where one sound becomes identical to a neighboring sound, is of two types: progressive, where the assimilated sound becomes identical to the following sound; and regressive, where it becomes identical to the preceding sound.

Progressive assimilations
  • Internal visarga assimilates to a following voiceless stop or sibilant
Examples: duḥkṛta → dukkata, duḥkha → dukkha, duḥprajña → duppañña, niḥkrodha (=niṣkrodha) → nikkodha, niḥpakva (=niṣpakva) → nippakka, niḥśoka → nissoka, niḥsattva → nissatta
  • In a sequence of two dissimilar Sanskrit stops, the first stop assimilates to the second stop
Examples: vimukti → vimutti, dugdha → duddha, utpāda → uppāda, pudgala → puggala, udghoṣa → ugghosa, adbhuta → abbhuta, śabda → sadda
  • In a sequence of two dissimilar nasals, the first nasal assimilates to the second nasal
Examples: unmatta → ummatta, pradyumna → pajjunna
  • j assimilates to a following ñ (i.e., jñ becomes ññ)
Examples: prajñā → paññā, jñāti → ñāti
  • The Sanskrit liquid consonants r and l assimilate to a following stop, nasal, sibilant, or v
Examples: mārga → magga, karma → kamma, varṣa → vassa, kalpa → kappa, sarva → savva → sabba
  • r assimilates to a following l
Examples: durlabha → dullabha, nirlopa → nillopa
  • d sometimes assimilates to a following v, producing vv → bb
Examples: udvigna → uvvigga → ubbigga, dvādaśa → bārasa (beside dvādasa)
  • t and d may assimilate to a following s or y when a morpheme boundary intervenes
Examples: ut+sava → ussava, ud+yāna → uyyāna
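The rule for two dissimilar stops in the list above can be sketched in Python as follows. The names segments and assimilate_stops are illustrative assumptions; the sketch covers only stop+stop sequences and leaves every other rule (visarga, nasals, liquids, sibilant merger) unapplied.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so each accented letter is a single code point."""
    return unicodedata.normalize("NFC", text)

STOPS = {nfc(s) for s in
         "k kh g gh c ch j jh ṭ ṭh ḍ ḍh t th d dh p ph b bh".split()}
ASPIRATES = {s for s in STOPS if len(s) == 2}

def segments(word):
    """Split a romanized word into sounds, keeping aspirated stops whole."""
    word = nfc(word)
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in ASPIRATES:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

def assimilate_stops(word):
    """First of two dissimilar stops takes the (unaspirated) form of the second."""
    segs = segments(word)
    for i in range(len(segs) - 1):
        first, second = segs[i], segs[i + 1]
        if first in STOPS and second in STOPS and first[0] != second[0]:
            segs[i] = second[0]  # g + dh -> d + dh, written ddh
    return "".join(segs)

print(assimilate_stops("vimukti"))  # vimutti
print(assimilate_stops("dugdha"))   # duddha
```

Using only the first letter of the second stop gives the spelling convention for geminate aspirates (ddh, bbh) without extra handling.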
Regressive assimilations
  • Nasals sometimes assimilate to a preceding stop (in other cases epenthesis occurs)
Examples: agni → aggi, ātman → atta, prāpnoti → pappoti, śaknoti → sakkoti
  • m assimilates to an initial sibilant
Examples: smarati → sarati, smṛti → sati
  • Nasals assimilate to a preceding stop+sibilant cluster, which then develops in the same way as such clusters without following nasals
Examples: tīkṣṇa → tikṣa → tikkha, lakṣmī → lakṣī → lakkhī
  • The Sanskrit liquid consonants r and l assimilate to a preceding stop, nasal, sibilant, or v
Examples: prāṇa → pāṇa, grāma → gāma, śrāvaka → sāvaka, agra → agga, indra → inda, pravrajati → pavvajati → pabbajati, aśru → assu
  • y assimilates to preceding non-dental/retroflex stops or nasals
Examples: cyavati → cavati, jyotiṣ → joti, rājya → rajja, matsya → macchya → maccha, lapsyate → lacchyate → lacchati, abhyāgata → abbhāgata, ākhyāti → akkhāti, saṁkhyā → saṅkhā (but also saṅkhyā), ramya → ramma
  • y assimilates to preceding non-initial v, producing vv → bb
Examples: divya → divva → dibba, veditavya → veditavva → veditabba, bhāvya → bhavva → bhabba
  • y and v assimilate to any preceding sibilant, producing ss
Examples: paśyati → passati, śyena → sena, aśva → assa, īśvara → issara, kariṣyati → karissati, tasya → tassa, svāmin → sāmī
  • v sometimes assimilates to a preceding stop
Examples: pakva → pakka, catvāri → cattāri, sattva → satta, dhvaja → dhaja
Partial and mutual assimilation
  • Sanskrit sibilants before a stop assimilate to that stop, and if that stop is not already aspirated, it becomes aspirated; e.g. śc, st, ṣṭ and sp become cch, tth, ṭṭh and pph
Examples: paścātpacchā, astiatthi, stavathava, śreṣṭhaseṭṭha, aṣṭaaṭṭha, sparśaphassa
  • In sibilant-stop-liquid sequences, the liquid is assimilated to the preceding consonant, and the cluster behaves like sibilant-stop sequences; e.g. str and ṣṭr become tth and ṭṭh
Examples: śāstra → śasta → sattha, rāṣṭra → raṣṭa → raṭṭha
  • t and p become c before s, and the sibilant assimilates to the preceding sound as an aspirate (i.e., the sequences ts and ps become cch)
Examples: vatsavaccha, apsarasaccharā
  • A sibilant assimilates to a preceding k as an aspirate (i.e., the sequence kṣ becomes kkh)
Examples: bhikṣubhikkhu, kṣāntikhanti
  • Any dental or retroflex stop or nasal followed by y converts to the corresponding palatal sound, and the y assimilates to this new consonant, i.e. ty, thy, dy, dhy, ny become cc, cch, jj, jjh, ññ; likewise ṇy becomes ññ. Nasals preceding a stop that becomes palatal share this change.
Examples: tyajati → cyajati → cajati, satya → sacya → sacca, mithyā → michyā → micchā, vidyā → vijyā → vijjā, madhya → majhya → majjha, anya → añya → añña, puṇya → puñya → puñña, vandhya → vañjhya → vañjjha → vañjha
  • The sequence mr becomes mb, via the epenthesis of a stop between the nasal and liquid, followed by assimilation of the liquid to the stop and subsequent simplification of the resulting geminate.
Examples: āmra → ambra → amba, tāmra → tamba
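
The cluster correspondences listed above can be pictured as a lookup table applied in a single pass over a word. The following is only a minimal sketch under stated assumptions: the function name `assimilate` and the table `CLUSTERS` are hypothetical, the table covers only a handful of the clusters named in the rules, and the sketch handles medial clusters alone. It does not model vowel shortening, loss of final consonants, simplification of initial geminates, or any of the exceptions.

```python
import re
import unicodedata

# Sanskrit cluster -> Pali cluster, copied from the rules above.
# Hypothetical, deliberately incomplete table (medial clusters only).
_RAW_CLUSTERS = {
    # sibilant + stop: the stop is doubled and aspirated
    "śc": "cch", "st": "tth", "sth": "tth", "ṣṭ": "ṭṭh", "ṣṭh": "ṭṭh", "sp": "pph",
    # t/p + s and k + ṣ
    "ts": "cch", "ps": "cch", "kṣ": "kkh",
    # dental or retroflex + y: palatalization
    "ty": "cc", "thy": "cch", "dy": "jj", "dhy": "jjh", "ny": "ññ", "ṇy": "ññ",
}

# Normalize to NFC so precomposed and decomposed diacritics compare equal.
CLUSTERS = {
    unicodedata.normalize("NFC", k): unicodedata.normalize("NFC", v)
    for k, v in _RAW_CLUSTERS.items()
}

# Longest clusters first, so that "ṣṭh" is tried before "ṣṭ".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CLUSTERS, key=len, reverse=True))
)


def assimilate(word):
    """Replace each listed Sanskrit cluster with its Pali counterpart."""
    word = unicodedata.normalize("NFC", word)
    return _PATTERN.sub(lambda m: CLUSTERS[m.group(0)], word)
```

With this table, `assimilate("satya")` gives sacca and `assimilate("bhikṣu")` gives bhikkhu, matching the examples above. Words such as paścāt or kṣānti need further rules (final-consonant loss, initial simplification, vowel shortening) that the sketch leaves out.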


Epenthesis

An epenthetic vowel is sometimes inserted between certain consonant-sequences. As with ṛ, the vowel may be a, i, or u, depending on the influence of a neighboring consonant or of the vowel in the following syllable. i is often found near i, y, or palatal consonants; u is found near u, v, or labial consonants.

  • Sequences of stop + nasal are sometimes separated by a or u
Examples: ratna → ratana, padma → paduma (u influenced by labial m)
  • The sequence sn may become sin initially
Examples: snāna → sināna, sneha → sineha
  • i may be inserted between a consonant and l
Examples: kleśa → kilesa, glāna → gilāna, mlāyati → milāyati, ślāghati → silāghati
  • An epenthetic vowel may be inserted between an initial sibilant and r
Example: śrī → sirī
  • The sequence ry generally becomes riy (i influenced by following y), but is still treated as a two-consonant sequence for the purposes of vowel-shortening
Examples: ārya → arya → ariya, sūrya → surya → suriya, vīrya → virya → viriya
  • a or i is inserted between r and h
Examples: arhati → arahati, garhā → garahā, barhiṣ → barihisa
  • There is sporadic epenthesis between other consonant sequences
Examples: caitya → cetiya (not cecca), vajra → vajira (not vajja)

Other changes

  • Any Sanskrit sibilant before a nasal becomes a sequence of nasal followed by h, i.e. ṣṇ, sn and sm become ṇh, nh, and mh
Examples: tṛṣṇā → taṇhā, uṣṇīṣa → uṇhīsa, asmi → amhi
  • The sequence śn becomes ñh, due to assimilation of the n to the preceding palatal sibilant
Example: praśna → praśña → pañha
  • The sequences hy and hv undergo metathesis
Examples: jihvā → jivhā, gṛhya → gayha, guhya → guyha
  • h undergoes metathesis with a following nasal
Example: gṛhṇāti → gaṇhāti
  • y is geminated between e and a vowel
Examples: śreyas → seyya, Maitreya → Metteyya
  • Voiced aspirates such as bh and gh on rare occasions become h
Examples: bhavati → hoti, -ebhiṣ → -ehi, laghu → lahu
  • Dental and retroflex sounds sporadically change into one another
Examples: jñāna → ñāṇa (not ñāna), dahati → ḍahati (beside Pali dahati), nīḍa → nīla (not nīḷa), sthāna → ṭhāna (not thāna), duḥkṛta → dukkaṭa (beside Pali dukkata)


Exceptions

There are several notable exceptions to the rules above; many of them are common Prakrit words rather than borrowings from Sanskrit.

  • ārya → ayya (beside ariya)
  • guru → garu (adj.) (beside guru (n.))
  • puruṣa → purisa (not purusa)
  • vṛkṣa → rukṣa → rukkha (not vakkha)


Writing

Alphabet with diacritics

King Ashoka erected a number of pillars with his edicts in at least three regional Prakrit languages in Brahmi script,[19] all of which are quite similar to Pali. Historically, the first written record of the Pali canon is believed to have been composed in Sri Lanka, based on a prior oral tradition. According to the Mahavamsa (the chronicle of Sri Lanka), Buddhist monks wrote down the Pali canon in 100 BC, during the time of King Vattagamini, because of a major famine in the country. The transmission of written Pali has retained a universal system of alphabetic values, but has expressed those values in a wide variety of scripts.

In Sri Lanka, Pali texts were recorded in Sinhala script. Other local scripts, most prominently Khmer, Burmese, and in modern times Thai (since 1893), Devanāgarī and Mon script (Mon State, Burma) have been used to record Pali.

Since the 19th century, Pali has also been written in the Roman script. An alternate scheme devised by Frans Velthuis allows for typing without diacritics using plain ASCII methods, but is arguably less readable than the standard Rhys Davids system, which uses diacritical marks.

The Pali alphabetical order is as follows:

  • a ā i ī u ū e o ṃ k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n p ph b bh m y r l ḷ v s h

ḷh, although a single sound, is written with a ligature of ḷ and h.
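
Because the aspirates (kh, gh, ch and so on) count as single letters, sorting romanized Pali by plain code-point order does not reproduce the alphabetical order given above. The sketch below shows one way a sort key could be built from that order. The names `ORDER`, `RANK` and `pali_key` are hypothetical, and the sketch makes two assumptions that are not stated in the text: any h following k, g, c, j, ṭ, ḍ, t, d, p or b is read as part of an aspirate, and ḷh is ranked as ḷ followed by h because it has no separate place in the list.

```python
import unicodedata

# Alphabetical order exactly as listed above.
ORDER = [
    unicodedata.normalize("NFC", letter)
    for letter in (
        "a ā i ī u ū e o ṃ k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ "
        "t th d dh n p ph b bh m y r l ḷ v s h"
    ).split()
]
RANK = {letter: position for position, letter in enumerate(ORDER)}


def pali_key(word):
    """Return a list of letter ranks usable as a key for sorted()."""
    word = unicodedata.normalize("NFC", word.lower())
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            # Two-character letter (an aspirate such as kh or ṭh).
            key.append(RANK[pair])
            i += 2
        elif word[i] in RANK:
            key.append(RANK[word[i]])
            i += 1
        else:
            # Characters outside the alphabet (spaces, hyphens) are skipped.
            i += 1
    return key
```

For example, `sorted(["khanti", "kodha"], key=pali_key)` places kodha before khanti, because k precedes kh in the Pali order, whereas a plain string sort gives the opposite result.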

Transliteration on computers

There are several fonts to use for Pali transliteration. However, older ASCII fonts such as Leedsbit PaliTranslit, Times_Norman, Times_CSX+, Skt Times, Vri RomanPali CN/CB etc. are not recommended, since they are incompatible with one another and technically out of date. Fonts based on the Unicode standard are recommended instead, because they share a single encoding, so text set in one of them can be switched to another without conversion.

However, not all Unicode fonts contain the necessary characters. To properly display all the diacritic marks used for romanized Pali (or for that matter, Sanskrit), a Unicode font must contain the following character ranges:

  • Basic Latin: U+0000 – U+007F
  • Latin-1 Supplement: U+0080 – U+00FF
  • Latin Extended-A: U+0100 – U+017F
  • Latin Extended-B: U+0180 – U+024F
  • Latin Extended Additional: U+1E00 – U+1EFF
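
To see which of these ranges a given piece of romanized text actually draws on, each character's code point can be compared against the list. The following is a minimal sketch; the names `REQUIRED_RANGES`, `block_of` and `blocks_used` are hypothetical, and the range boundaries are simply those listed above.

```python
import unicodedata

# Character ranges as listed above (inclusive bounds).
REQUIRED_RANGES = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
}


def block_of(char):
    """Return the name of the listed range containing char, or None."""
    code_point = ord(char)
    for name, (low, high) in REQUIRED_RANGES.items():
        if low <= code_point <= high:
            return name
    return None


def blocks_used(text):
    """Map each non-space character of text to the range it falls in."""
    # NFC turns 'a' + combining macron into the single character ā, so that
    # the lookup sees precomposed letters.
    text = unicodedata.normalize("NFC", text)
    return {ch: block_of(ch) for ch in text if not ch.isspace()}
```

Calling `blocks_used("ñāṇa")` reports ñ in Latin-1 Supplement, ā in Latin Extended-A, ṇ in Latin Extended Additional and a in Basic Latin. A character that maps to `None` lies outside the listed ranges.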

Some Unicode fonts freely available for typesetting Romanized Pali are as follows:

  • The Pali Text Society recommends VU-Times and Gandhari Unicode for Windows and Linux computers.
  • The Tibetan & Himalayan Digital Library recommends Times Ext Roman, and provides links to several Unicode diacritic Windows and Mac fonts usable for typing Pali together with ratings and installation instructions. It also provides macros for typing diacritics in OpenOffice and MS Office.
  • SIL International provides the Charis SIL, Charis SIL Compact, Doulos SIL, Gentium, Gentium Basic and Gentium Book Basic fonts. Of these, Charis SIL, Gentium Basic and Gentium Book Basic have all 4 styles (regular, italic, bold, bold-italic), so they can provide publication-quality typesetting.
  • The Libertine Open Fonts Project provides the Linux Libertine font (4 serif styles and many OpenType features) and Linux Biolinum (4 sans-serif styles) on SourceForge.
  • Junicode (short for Junius-Unicode) is a Unicode font for medievalists, but it provides all diacritics for typing Pali. It has 4 styles and some OpenType features such as old-style numerals.
  • Thryomanes includes all the Roman-alphabet characters available in Unicode along with a subset of the most commonly used Greek and Cyrillic characters, and is available in normal, italic, bold, and bold italic.
  • GUST (Polish TeX User Group) provides Latin Modern and TeX Gyre fonts. Each font has 4 styles; the former is the most widely accepted among LaTeX users, while the latter is a relatively new family. In the TeX Gyre collection, each typeface in the following families has nearly 1250 glyphs and is available in PostScript, TeX and OpenType formats.
    • The TeX Gyre Adventor family of sans serif fonts is based on the URW Gothic L family. The original font, ITC Avant Garde Gothic, was designed by Herb Lubalin and Tom Carnase in 1970.
    • The TeX Gyre Bonum family of serif fonts is based on the URW Bookman L family. The original font, Bookman or Bookman Old Style, was designed by Alexander Phemister in 1860.
    • The TeX Gyre Chorus is a font based on the URW Chancery L Medium Italic font. The original, ITC Zapf Chancery, was designed in 1979 by Hermann Zapf.
    • The TeX Gyre Cursor family of monospace serif fonts is based on the URW Nimbus Mono L family. The original font, Courier, was designed by Howard G. (Bud) Kettler in 1955.
    • The TeX Gyre Heros family of sans serif fonts is based on the URW Nimbus Sans L family. The original font, Helvetica, was designed in 1957 by Max Miedinger.
    • The TeX Gyre Pagella family of serif fonts is based on the URW Palladio L family. The original font, Palatino, was designed by Hermann Zapf in the 1940s.
    • The TeX Gyre Schola family of serif fonts is based on the URW Century Schoolbook L family. The original font, Century Schoolbook, was designed by Morris Fuller Benton in 1919.
    • The TeX Gyre Termes family of serif fonts is based on the Nimbus Roman No9 L family. The original font, Times Roman, was designed by Stanley Morison together with Starling Burgess and Victor Lardent.
  • John Smith provides IndUni OpenType fonts, based upon URW++ fonts. Of them:
    • IndUni-C is Courier-lookalike;
    • IndUni-H is Helvetica-lookalike;
    • IndUni-N is New Century Schoolbook-lookalike;
    • IndUni-P is Palatino-lookalike;
    • IndUni-T is Times-lookalike;
    • IndUni-CMono is Courier-lookalike but monospaced;
  • An English Buddhist monk titled Bhikkhu Pesala provides some Pali OpenType fonts he has designed himself. Of them:
    • Akkhara has been discontinued.
    • Cankama is a Gothic, Black Letter script. Regular style only.
    • Carita is a Small Caps font with matching glyphs for Basic Greek. Regular and Bold styles.
    • Garava was designed for body text with a generous x-height and economical copyfit. It includes Petite Caps (as OpenType Features), and Heavy styles besides the usual four styles (regular, italic, bold, bold italic).
    • Guru is another font family for body text with OpenType features. Regular, italic, bold and bold italic styles.
    • Hattha is a hand-writing font. Regular, italic, and bold styles.
    • Kabala is a distinctive Sans Serif typeface designed for display text or headings. Regular, italic, bold and bold italic styles.
    • Lekhana is a Zapf Chancery clone, a flowing script that can be used for correspondence or body text. Regular, italic, bold and bold italic styles.
    • Mandala is designed for display text or headings. Regular, italic, bold and bold italic styles.
    • Pali is a clone of Hermann Zapf's Palatino. Regular, italic, bold and bold italic styles.
    • Odana is a calligraphic brush font suitable for headlines, titles, or short texts where a less formal appearance is wanted. Regular style only.
    • Talapanna and Talapatta are clones of Goudy Bertham, with decorative gothic capitals and extra ligatures in the Private Use Area. These two are different only in decorative gothic capitals in the Private Use Area. Regular and bold styles.
    • Veluvana is another brush calligraphic font but basic Greek glyphs are taken from Guru. Regular style only.
    • Verajja is derived from Bitstream Vera. Regular, italic, bold and bold italic styles.
    • VerajjaPDA is a cut-down version of Verajja without symbols. For use on PDA devices. Regular, italic, bold and bold italic styles.
    • He also provides some Pali keyboards for Windows XP.
  • The font section of Alan Wood's Unicode Resources has links to several general-purpose fonts that can be used for Pali typing if they cover the character ranges above.

Some of the fonts that come with Windows 7 can also be used to type transliterated Pali: Arial, Calibri, Cambria, Courier New, Microsoft Sans Serif, Segoe UI, Segoe UI Light, Segoe UI Semibold, Tahoma, and Times New Roman. Some of them have 4 styles each and are therefore usable in professional typesetting: Arial, Calibri and Segoe UI are sans-serif fonts, Cambria and Times New Roman are serif fonts, and Courier New is a monospace font.

Text in ASCII

The Velthuis scheme was originally developed in 1991 by Frans Velthuis for use with his "devnag" Devanāgarī font, designed for the TeX typesetting system. This system of representing Pali diacritical marks has been used in some websites and discussion lists. However, as the Web and email software have moved towards the Unicode encoding standard, the system has become largely unnecessary.

The following table compares various conventional renderings and shortcut key assignments:
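character | ASCII rendering | character name | Unicode number | key combination | HTML code
ā | aa | a macron | U+0101 | Alt+A | &#257;
ī | ii | i macron | U+012B | Alt+I | &#299;
ū | uu | u macron | U+016B | Alt+U | &#363;
ṃ | .m | m dot-under | U+1E43 | Alt Gr+M | &#7747;
ṇ | .n | n dot-under | U+1E47 | Alt+N | &#7751;
ñ | ~n | n tilde | U+00F1 | Alt+Ctrl+N | &#241;
ṭ | .t | t dot-under | U+1E6D | Alt+T | &#7789;
ḍ | .d | d dot-under | U+1E0D | Alt+D | &#7693;
ṅ | "n | n dot-over | U+1E45 | Ctrl+N | &#7749;
ḷ | .l | l dot-under | U+1E37 | Alt+L | &#7735;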
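
The ASCII renderings in the table can be converted to Unicode with a simple substitution pass. The sketch below is illustrative only: the names `VELTHUIS` and `velthuis_to_unicode` are hypothetical, and the mapping covers just the ten rows of the table rather than the full Velthuis scheme. It also assumes that every "aa", "ii" or "uu" stands for a long vowel, which may not hold for all input.

```python
import re

# ASCII rendering -> Unicode character, one entry per row of the table.
VELTHUIS = {
    "aa": "\u0101",  # a macron
    "ii": "\u012b",  # i macron
    "uu": "\u016b",  # u macron
    ".m": "\u1e43",  # m dot-under
    ".n": "\u1e47",  # n dot-under
    "~n": "\u00f1",  # n tilde
    ".t": "\u1e6d",  # t dot-under
    ".d": "\u1e0d",  # d dot-under
    '"n': "\u1e45",  # n dot-over
    ".l": "\u1e37",  # l dot-under
}

# One alternation over all two-character renderings, applied left to right.
_PATTERN = re.compile("|".join(re.escape(key) for key in VELTHUIS))


def velthuis_to_unicode(text):
    """Replace each ASCII rendering in text with its Unicode character."""
    return _PATTERN.sub(lambda m: VELTHUIS[m.group(0)], text)
```

For instance, `velthuis_to_unicode("nibbaana.m")` yields nibbānaṃ and `velthuis_to_unicode('sa"ngha')` yields saṅgha. Ordinary punctuation can collide with the scheme: a full stop immediately followed by m, n, t, d or l would be converted too, so real input may need extra care.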

References


  1. ^ a b Nordhoff, Sebastian; Hammarström, Harald; Forkel, Robert; Haspelmath, Martin, eds. (2013). "Pali". Glottolog 2.2. Leipzig: Max Planck Institute for Evolutionary Anthropology. 
  2. ^ Hazra, Kanai Lal. Pāli Language and Literature; a systematic survey and historical study. D.K. Printworld Ltd., New Delhi, 1994, page 19.
  3. ^ A Dictionary of the Pali Language By Robert Cæsar Childers
  4. ^ a b Bhikkhu Bodhi, In the Buddha's Words. Wisdom Publications, 2005, page 10.
  5. ^ a b c Hirakawa, Akira. Groner, Paul. A History of Indian Buddhism: From Śākyamuni to Early Mahāyāna. 2007. p. 119
  6. ^ Oberlies, Thomas Pāli: A Grammar of the Language of the Theravāda Tipiṭaka, Walter de Gruyter, 2001.
  7. ^ "If in "Sanskrit" we include the Vedic language and all dialects of the Old Indian period, then it is true to say that all the Prakrits are derived from Sanskrit. If on the other hand " Sanskrit " is used more strictly of the Panini-Patanjali language or "Classical Sanskrit," then it is untrue to say that any Prakrit is derived from Sanskrit, except that S'auraseni, the Midland Prakrit, is derived from the Old Indian dialect". Introduction to Prakrit, by Alfred C Woolner. Baptist Mission Press 1917
  8. ^ a b c
  9. ^
  10. ^ Buddhist India, ch. 9 Retrieved 14 June 2010.
  11. ^ Hazra, Kanai Lal. Pāli Language and Literature; a systematic survey and historical study. D.K. Printworld Ltd., New Delhi, 1994, page 11.
  12. ^ Hazra, Kanai Lal. Pāli Language and Literature; a systematic survey and historical study. D.K. Printworld Ltd., New Delhi, 1994, pages 1-44.
  13. ^ Hazra, Kanai Lal. Pāli Language and Literature; a systematic survey and historical study. D.K. Printworld Ltd., New Delhi, 1994, page 29.
  14. ^ Hazra, Kanai Lal. Pāli Language and Literature; a systematic survey and historical study. D.K. Printworld Ltd., New Delhi, 1994, page 20.
  15. ^ K. R. Norman, Pāli Literature. Otto Harrassowitz, 1983, pages 1-7.
  16. ^ a b Warder, A. K. Indian Buddhism. 2000. p. 284
  17. ^ David Kalupahana, Nagarjuna: The Philosophy of the Middle Way. SUNY Press, 1986, page 19. The author refers specifically to the thought of early Buddhism here.
  18. ^ Dispeller of Delusion, Pali Text Society, volume II, pages 127f
  19. ^ Inscriptions of Aśoka by Alexander Cunningham, Eugen Hultzsch. Calcutta: Office of the Superintendent of Government Printing. Calcutta: 1877
  • See entries for "Pali" (written by K. R. Norman of the Pali Text Society) and "India--Buddhism" in The Concise Encyclopedia of Language and Religion, (Sawyer ed.) ISBN 0-08-043167-4
  • de Silva, Lily (1994). Pali Primer (1st ed.). Vipassana Research Institute Publications.
  • Müller, Edward (1884, 1995). Simplified Grammar of the Pali Language. Asian Educational Services.

Further reading

  • Gupta, K. M. (2006). Linguistic approach to meaning in Pali. New Delhi: Sundeep Prakashan. ISBN 81-7574-170-8
  • Müller, E. (2003). The Pali language: a simplified grammar. Trubner's collection of simplified grammars. London: Trubner. ISBN 1-84453-001-9
  • Oberlies, T., & Pischel, R. (2001). Pāli: a grammar of the language of the Theravāda Tipiṭaka. Indian philology and South Asian studies, v. 3. Berlin: Walter de Gruyter. ISBN 3-11-016763-8
  • Hazra, K. L. (1994). Pāli language and literature: a systematic survey and historical study. Emerging perceptions in Buddhist studies, no. 4-5. New Delhi: D.K. Printworld. ISBN 81-246-0004-X
  • American National Standards Institute. (1979). American National Standard system for the romanization of Lao, Khmer, and Pali. New York: The Institute.
  • Russell Webb (ed.) An Analysis of the Pali Canon, Buddhist Publication Society, Kandy; 1975, 1991 (see
  • Soothill, W. E., & Hodous, L. (1937). A dictionary of Chinese Buddhist terms: with Sanskrit and English equivalents and a Sanskrit-Pali index. London: K. Paul, Trench, Trubner & Co.
  • Collins, Steven (2006). A Pali Grammar for Students. Silkworm Press.

External links

  • Pali Text Society, London. The Pali Text Society's Pali-English dictionary. Chipstead, 1921-1925.
  • Pali Text Society
  • Reconstruction of Ancient Indian sound clusters on the basis of Pali sounds (according to "Grammatik des Pali" by Achim Fahs)