Persian

Persian

فارسی
53M speakers · Indo-European Indo-Iranian · Arabic
On the Map

At a Glance

For more than a thousand years Persian was the prestige language of a vast cultural sphere stretching from Anatolia to Bengal and into parts of China. It was the court language of the Mughal Empire in India for centuries. It served as a major literary and administrative language of the Ottoman Empire. It was the lingua franca of poetry, scholarship, and statecraft across what historians call the Persianate world. The poetic canon — Ferdowsi's Shahnameh, Rumi, Hafez, Saʿdi, Khayyam — is one of the great literatures of the world, and was read across this entire region for centuries. Persian no longer has that geographic reach, but the prestige and the literature endure.

Persian is the largest member of the Iranian branch of the Indo-European family, alongside sister languages like Pashto, Kurdish, Balochi, and Tajik. Specifically it sits in the Southwestern subgroup, with Luri and Bakhtiari as close cousins. Modern Persian is roughly 1,400 years old as a continuous written tradition. The earliest surviving poetry and prose date to the 9th and 10th centuries CE and remain readable to literate speakers today, in a way that Old English does not for English speakers. Before that came Middle Persian (Pahlavi), written in a script descended from Aramaic and used by the Sassanid Empire. And before that, Old Persian, the language of the Achaemenid royal inscriptions, written in a semi-alphabetic cuneiform from around the 6th–5th centuries BCE.

Today Persian is the official language of Iran, where it is the first language of more than half the country's roughly 80 million inhabitants. Tens of millions more speak it in Afghanistan, Tajikistan, and a global diaspora. Adult literacy in Iran is around 93%. As Yousef puts it in his comprehensive grammar, "Modern Persian has been very simplified. No gender, and no declension of nouns and adjectives" — a striking fact for a language whose distant Indo-European cousins (Russian, German, Latin, Sanskrit) are famous for case systems and gender. Persian shed almost all of that. What it kept, and what it built on top, is the subject of the rest of this page.

Varieties

Persian is a pluricentric language. It has three national standards, each with its own name, in three countries. In Iran it is called Fārsi (فارسی). In Afghanistan it is called Dari (دری). In Tajikistan it is called Tajiki (тоҷикӣ). The naming reflects political history more than linguistic distance: Yousef compares the situation to German, which Germans call Deutsch. In academic writing all three are usually grouped under the single name "Persian," and classical literature is shared across all three with practically no difference. Formal written Persian is largely uniform. Where the three diverge most is in everyday vocabulary, pronunciation, and — most visibly — script. Iran and Afghanistan use the Perso-Arabic alphabet. Tajikistan, under Soviet language policy in the 20th century, switched to Cyrillic, which it still uses. ISO 639-3 splits the three as separate codes (pes for Western Persian / Iranian Farsi, prs for Dari, tgk for Tajik); this page focuses on Western Persian.

Within Iran itself, the headline sociolinguistic fact is that one accent has effectively become the national standard. Tehran's pronunciation, vocabulary, and colloquial patterns now serve as the de facto spoken norm across the country, propagated by mass media, the education system, and migration to the capital. Yousef states the trend bluntly: Tehrani "is not only understood all over Iran — and beyond — thanks to the media, but threatens to assimilate all local vernaculars in the course of time." Windfuhr, writing in 1979, called Tehrani "the socially most prestigious dialect" and "fast becoming, the standard dialect of Iran." By 2026 that process is largely complete in formal and media contexts. Regional accents — in Mashhad, Isfahan, Shiraz, Yazd, Kerman, the Caspian coast, the Gulf coast — still exist in everyday speech and remain easily recognizable, but they are receding, and linguists have launched documentation efforts to record them before they disappear. Iran also hosts substantial speech communities in non-Iranian languages, the largest being Azerbaijani; Kurdish, Arabic, Baluchi, Turkmen, and Armenian are also widely spoken in their respective regions.

The split that matters most in daily life is not regional but stylistic. There are essentially two registers running side by side. Formal/written Persian is the language of broadcast news, lectures, public speaking, textbooks, government documents, and most printed material. Colloquial Tehrani is the language of friends, family, and ordinary conversation. The two differ in predictable ways — vowels collapse (xāne "house" becomes xune in speech), verb endings reduce (mi-rav-ad "he goes" becomes mi-re), and certain words are simply different — but the same speaker shifts fluidly between them. Reference grammars handle this in different ways: Lazard and Windfuhr default to literary forms, while Mahootian and Yousef explicitly describe Tehrani-colloquial speech.

Layered on top of register is ta'ārof (تعارف), one of Persian's most distinctive sociolinguistic features. Ta'ārof is the umbrella term for the elaborate system of polite ritualized exchange that pervades Iranian social interaction. As Yousef describes it, ta'ārof covers "the whole range of social behaviors meant to show courtesy and good manners, most importantly through deference, using words and idioms that have become cliché and should not be taken literally or seriously." Linguistically it is encoded across multiple layers. Pronouns shift: to is the informal/intimate "you" (singular), while šomā (originally 2PL) doubles as the polite "you" with 2PL verb agreement. The polite third-person pronoun ishān is literally 3PL, used for an absent person you respect, also with 3PL agreement. Honorific verb pairs cover certain everyday acts: farmudan is used when the addressee speaks or does something ("Did you say something?"), while arz kardan is used self-deprecatingly when speaking of one's own action. For "I," speakers use the humble bande ("servant") or haqir ("lowly"); for "you," sarkār ("overseer"), jenāb-e ʿāli ("Your Excellency"), or hazrat-e ʿāli ("Your Eminence"). Yousef notes wryly that the last two "are not very serious; you can use them for any person to show high respect." The younger generation observes these formalities less rigorously, but ta'ārof is still very much a live system.

How it works

Persian's basic word order is subject–object–verb. The verb sits at the end of the clause: man fārsi harf mi-zanam means "I speak Persian," literally "I Persian word durative-strike." That is typologically unusual on its own. What makes Persian unusual among SOV languages is that it pairs verb-final clauses with prepositions rather than postpositions, and with noun-initial noun phrases — the noun comes first and its modifiers (adjectives, possessors, relative clauses) follow. So the clause is head-final, but the noun phrase and the prepositional phrase are head-initial. This mixed pattern is rare across the world's languages and gives Persian a distinctive feel: long noun phrases unfold leftward in the sentence with prepositions out in front, but the verb is always waiting at the end.

Persian's most famous grammatical particle is the ezafe (اضافه). It is a tiny unstressed -e (or -ye after a vowel) that links a noun to whatever modifies it: ketāb-e qermez "red book" (literally "book-EZ red"), māshin-e Ali "Ali's car" (literally "car-EZ Ali"), the noun followed by an adjective, possessor, or another noun. The ezafe chains. In ketāb-e qermez-e man "my red book," it appears twice. It does enormous syntactic work — it is how Persian builds essentially every modified noun phrase — and yet it is almost never written in the Perso-Arabic script. Readers infer it. This is why Persian text looks deceptively spare on the page: a great deal of the grammar is unwritten and supplied by the reader.

Persian has no grammatical gender. Words and pronouns make no distinction between "he," "she," and "it" — all are u (or, in colloquial speech, un). It has no morphological case marking on nouns or adjectives. There is one major exception: the postposition rā (را), which marks specific direct objects. ketāb mi-xānam means "I am reading a book" (any book, indefinite). ketāb-rā mi-xānam means "I am reading the book" (a specific, identifiable one). In colloquial speech rā shrinks to -ro after vowels and -o after consonants. This is what linguists call differential object marking, and it is one of the features that most distinguishes Persian from the Indo-European languages most readers will know.

Verbs are where Persian concentrates its morphology. Every verb has two stems — a present stem and a past stem — that must be memorized separately. From māndan "to stay" you get the present stem mān- and the past stem mānd-; from kardan "to do" you get kon- and kard-; from raftan "to go" you get rav- and raft-. Onto these stems Persian glues a paradigm of person-and-number endings (six cells, one for each combination of 1/2/3 person and singular/plural) plus a small set of prefixes that mark aspect and mood: mi- for the imperfective (habitual + progressive), be- for the subjunctive and imperative, and na- for negation. Because the person-and-number ending already identifies who is acting, subject pronouns are usually dropped — Persian is a pro-drop language. Modern colloquial Persian has no dedicated future tense; the present is used for future meaning, with context or an adverb supplying the timing. A literary future with the auxiliary xāstan exists but is reserved for formal writing.

Compound verbs are extraordinarily productive. Persian forms most of its everyday verb meanings by combining a noun or adjective with a small set of light verbs — most often kardan ("to do/make"), šodan ("to become"), zadan ("to strike"), and dādan ("to give"). harf zadan "to talk" is literally "word-strike." kār kardan "to work" is literally "work-do." Mahootian estimates that compound verbs vastly outnumber simple verbs in the modern lexicon. New compounds are coined freely, often with English or French borrowings: telefon kardan "to phone."

The writing system is the biggest learning gate for outside readers. Persian uses a Perso-Arabic abjad with 32 letters: the Arabic 28 plus four letters Persian added for sounds Arabic does not have — پ p, چ č, ژ ž, گ g. It is written right to left. Letters connect to each other in cursive form, but seven letters do not connect forward (ا د ذ ر ز ژ و) — they end the cursive run and the next letter starts a new one. Each letter takes up to four positional shapes — initial, medial, final, and isolated — depending on what it neighbors. Short vowels (a, e, o) are normally not written at all in middle position; readers infer them. Long vowels (ā, i, u) are written, sometimes with the same letters used for related consonants. The most jarring feature for newcomers is what Persian inherited from Arabic: many Arabic loanwords keep their original Arabic spelling even when several different letters now collapse to the same sound. س, ث, and ص are three different letters that all sound /s/. ز, ذ, ض, and ظ are four different letters that all sound /z/. ت and ط both sound /t/. ق and غ both sound /ɢ/ (a kind of voiced uvular). Native Persian words use one fixed spelling each; Arabic loans preserve their etymological one. Numerals use Eastern Arabic-Indic digits ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹, which look slightly different from the Arabic-Indic digits used for Arabic itself, and are written left-to-right inside the right-to-left text.

Explore

On the Map

Official in 3 countries

IranAfghanistanTajikistan

876,418 km²

Asia
View on map →

Typology

Word order SOV
Adjective after noun
Numeral before noun
Adposition preposition
Def. article no
Indef. article yes
Pro-drop yes
Tonal no
Gender no
Noun cases no

Related Languages

enzhesfrpt