Content created: 971226
File last modified: 170308

The Chinese Language(s)

An Overview for Beginners

Note: This essay should tell you more than you need or want to know about the Chinese language in general. For the pronunciation of Romanized Mandarin, see the "Pronunciation Guide" on this web site. (Link)


1. Dialects
2. Phonology
3. Morphology & Homonymy
3a. Excursus: Homonym Taboos
4. Writing
5. How Chinese Think of Characters
6. Modern Language Reforms
7. Chinese Outside China
Appendix: Chinese in Táiwān

Related files on this web site

Guide to Pronouncing Mandarin in Romanized Transcription (link)
More Than You Want To Know About Chinese Tone (Link)
More Than You Want To Know About Simplified Characters (link)

1. Dialects

Mandarin. The expression "Chinese language" designates a number of mutually unintelligible but historically related languages (groups of dialects) spoken by the Hàn people of China and by Hàn in overseas Chinese communities. Because Chinese governments are rather sensitive about the possibility of local autonomy, it is politically incorrect to describe any of these Hàn languages as independent languages (although they are that by any non-politicized definition), and accordingly they are conventionally called "dialects." Each contains very considerable internal dialectal variation. Most are named after the present or past names of the places where they are spoken.

The most widespread Chinese language, known as "Mandarin" or Guānhuà 官話, is spoken in north, central, and west China. The Mandarin dialect of Běijīng has for centuries been the language of government. Important non-Mandarin dialect groups (languages) are:

Overseas Chinese normally speak dialects of Hokkien (Mǐnnán) or Cantonese. Taken together, the groups of dialects constituting spoken Chinese are spoken by about 94% of the population of political China. The remaining 6% speak Tibetan, Mongol, Yao-Miao, Thai, Uigur, and other non-Hàn languages. In the far south, Hàn dialects seem (to me) to grade into northern Vietnamese dialects.

In the XXth century the dialect of Běijīng was chosen as the official national language, and succeeding Chinese governments promoted it (in very slightly different variants) under the names "National Language" Guóyǔ 國語 and "Common Speech" Pǔtōnghuà 普通話. In Singapore it is called "Chinese Speech" Huáyǔ 華語. This language is official throughout China and is what is usually taught under the name of "Chinese" in schools outside China, including schools serving Cantonese- or Hokkien-speaking people. (It is ironic that elderly Cantonese in Fiji, say, seek to keep their Chinese heritage alive by forcing the kids to study Mandarin, a language which the elderly people themselves do not speak, but never mind that. Nationalism is a remarkable thing.)

The English name "Mandarin" comes from the Portuguese. The speech of Běijīng, having been as close as the Imperial régime came to an official language, was therefore referred to as Guānhuà, or "officials' speech." (Guān means an official.) But the word guān never got borrowed into English. Instead, English borrowed the Portuguese translation (mandarim). (The Portuguese word in turn came apparently from the Malay mantrī or menteri, meaning "minister of state." And Malay had borrowed it from Sanskrit mantrin "counselor," ultimately derived from the Sanskrit root man, meaning "to think." Now you know.)

Since Chinese tend to name languages after places, and since English tends to use the same names for languages and the people who speak them, it is useful at this point to include a few additional terms you will find in the anthropological literature referring to regional variants of Chinese language or culture. The following are NOT major Chinese languages (except for Hakka), but DO cast a long shadow over the ethnographic record.

2. Phonology

The usual unit of phonological analysis in Chinese is the syllable, which is also the unit recognized by the writing system. Contemporary Chinese languages each have a closed set of syllables. In Mandarin, each syllable consists of one (or none) of 21 initial consonants, plus one of 39 finals (each composed of a vowel or vowel diphthong with or without a final -n or -ng), plus one of four tones (configurations of pitch height). In theory therefore there could be 21 × 39 × 4 = 3,276 different Mandarin syllables (or 3,432 if the absence of an initial is counted as a twenty-second option), although in fact many do not occur.
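The arithmetic behind that upper bound can be sketched in a few lines of Python. The counts are the ones given above; the names INITIALS, FINALS, TONES, and theoretical_syllables are merely illustrative. The figure 3,276 corresponds to 21 × 39 × 4; treating "no initial" as one more choice raises the bound somewhat.

```python
# Counts taken from the paragraph above; all names here are illustrative.
INITIALS = 21  # initial consonants
FINALS = 39    # finals (vowel or diphthong, with or without -n / -ng)
TONES = 4      # tones


def theoretical_syllables(count_zero_initial: bool) -> int:
    """Upper bound on distinct syllables: initials x finals x tones."""
    initial_options = INITIALS + 1 if count_zero_initial else INITIALS
    return initial_options * FINALS * TONES


print(theoretical_syllables(False))  # 21 * 39 * 4
print(theoretical_syllables(True))   # 22 * 39 * 4
```

Either way the number is only a ceiling, since many of the combinations do not occur in the language.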

For more about the sounds of Mandarin Chinese, please see Guide to Pronouncing Mandarin in Romanized Transcription. The truly masochistic may wish to examine the page called More Than You Want To Know About Chinese Tone.

3. Morphology & Homonymy

A large proportion of Chinese vocabulary is monosyllabic, and individual words, whether monosyllabic or polysyllabic, are uninflected, making use of word order to show syntactic relationships:

I don't want to eat [food].

Linguists never tire of pointing out that the majority of "words" in Chinese are polysyllabic (and hence are written with two or more characters). However, it remains the case that every syllable has a semantic field, even if the sum of two syllables may have a semantic field unrelated to the component parts. For example:

In some cases two synonyms can be used together to make a single "word" with a meaning that each of them has individually (the combination being referred to as a "synonym compound"). For example:

In spoken Chinese, only biānfú is colloquial, not biān or fú alone.

The combination of (1) a restricted inventory of phonological syllable types and (2) syllable-level semanticity logically tends to generate homonyms. The tendency of spoken language to favor bisyllabic compounds, even synonym compounds, has the effect of disambiguating those homonyms. For example, here are some other semantic fields associated with the spoken syllables biān and fú:

biān = edge; whip; compile; bat; to pierce with a stone probe

fú = prop up; prisoner; fall; not; thus; bat; happiness; float; fluorine; bushy

In isolation, the syllable biān is ambiguous because of this homonymy. The same thing is true of the syllable fú. But the combination biānfú can mean only "bat." (Similarly yīfú can mean only clothing, &c.)
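For a synonym compound, this disambiguation can be pictured as a set intersection: each syllable carries a set of possible senses, and the compound keeps only the sense the two sets share. Here is a toy sketch in Python, using the glosses listed above. The dictionary SENSES and the function shared_senses are illustrative assumptions, and many real compounds are of course not synonym compounds at all.

```python
# Glosses copied from the two lists above; the data layout is only an illustration.
SENSES = {
    "biān": {"edge", "whip", "compile", "bat", "to pierce with a stone probe"},
    "fú": {"prop up", "prisoner", "fall", "not", "thus", "bat",
           "happiness", "float", "fluorine", "bushy"},
}


def shared_senses(first: str, second: str) -> set:
    """Return the senses that two syllables have in common."""
    return SENSES[first] & SENSES[second]


print(shared_senses("biān", "fú"))  # {'bat'}
```

Each syllable alone leaves five or ten candidates; the pair leaves exactly one.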

Although we can say that in speech biānfú is the only "word" that colloquially means "bat," that is in fact a bit misleading: Biān or fú alone can suffice if it happens to enter into another bisyllabic compound. For example, we can combine either element with "wing," yì 翼, to make biānyì 蝙翼 or fúyì 蝠翼, each of which means "bat wings."

3a. Excursus on Homonyms and Pop Culture

The fact that homonyms are so many, especially for expressions of a single syllable, gives birth to a wide range of polite customs and minor taboos based on them. For example:

A mother wants her child to be smart and industrious in school. On the festival of Wénchāng Dìjūn 文昌帝君, one of the gods of literature, a good Taiwanese mother makes sure that her child's lunch box includes celery because the word for celery is qín 芹, which sounds exactly like the first syllable of "industrious" (qínláo 勤勞).

She should also include something containing green onion, since the word for green onion, cōng 蔥, sounds just like the first syllable of the word for "smart" (cōngmíng 聰明).

Now that garlic has entered the common diet in Taiwan, the truly modern mother sometimes also includes something seasoned with garlic (suàn 蒜) to express her wish to improve her child's mastery of arithmetic (suànshù 算術).

Just as there are some acts of hope suggested by homonyms, there are some taboos implied by them. For example:

Thousands of opportunities for such taboos are not exploited, and many that are normal in one region are unimportant in others. (Northern Chinese turn over fish all the time.)

How do we interpret all this? Such customs are condemned as "superstitious" by modernizers who think other people believe them literally. They in fact are better seen as minor acts of courtesy and gracious living available even to very humble people. Such customs are found all over the world. (In the English-speaking world one does not give knives as wedding presents lest one imply the newly united couple should be separated again, for example.)

What is of interest in the present context is the custom of exploiting a linguistic feature of Chinese itself, namely the high number of homonyms, to produce these small acts of human hope, showing how self-conscious speakers tend to be about homonyms.

4. Writing

As you know, Chinese is written by means of hieroglyphics, called "characters" in English, zì 字 in Chinese. A small proportion of such characters actually derive from fairly literal pictures of things, such as the words for sun 日 or tree 木. Others extend these "primitives" in a variety of ways. Thus "east" 東 is the sun behind a tree. In some cases "puns" are used. For example, "to manage" or "principle" lǐ 理 is built on the homonymous term for "hamlet, residence" lǐ 里 to which a ruler or king 王 has been added. This last principle, by which a character is borrowed for its approximate sound and then modified slightly to show the change of meaning, produces most of the characters used in Chinese.

Ordinarily each separate sense of the same spoken syllable has a different character. For this reason, the written forms of the syllable biān (to continue our earlier example) are not ambiguous, since there are separate characters for "edge" 邊, "whip" 鞭, "compile" 編, &c. In writing, it is therefore not necessary to add a second syllable to show which sense of biān is intended, since that is already clear from the character selected. For this reason written Chinese is capable of being different from and more concise than spoken Chinese.

The picture above was painted about 1770 by a Japanese artist named Shuseki. It shows a beggar carrying a huge gourd and watching a bat. In the upper right hand corner, above the artist's signature and seal, is the following inscription in Literary Chinese (enlarged at left): 蝠自天來。

Literary written: 蝠 自 天 來。
Colloquial spoken: 蝙蝠 天上 下來。
A bat descends from heaven.

(The reason this is an appropriate inscription for a painting, and indeed the reason the painting shows a bat in the first place, is homonymy again: The spoken syllable fú can mean bat 蝠, but it can also mean "happiness" 福, and as a result of this ambiguity a bat has from antiquity potentially been a symbol of happiness! You will find bats painted on teacups, embroidered on clothing, and featured in greeting cards, as well as lurking in old paintings. Bats are by no means the only puns that appear as decorative motifs. Art motifs based in linguistic puns are called "auspicious designs" (jíxiáng tú'àn 吉祥圖案) in Chinese. A number of them are discussed in a separate page of this web site. Link.)

This difference between spoken and written Chinese has important consequences:

First, the writing system is largely independent of the sounds, and can be used by speakers of all the dialect groups of China (and other countries), even though (or because) it exactly mirrors the spoken usage of no-one.

Second, there is no particularly good reason why written Chinese, with the clarity of its characters, need have the same syntax as spoken Chinese, in which polysyllabic compounds (and other devices) are critical to disambiguating homonyms. This tendency for written and spoken language to vary from each other culminates in the production of Literary Chinese. ("Classical Chinese" sometimes refers especially to the writings of the Zhōu dynasty [period 4]. The language from Hàn times [period 6] on is usually called "Literary Chinese.") Almost certainly, nobody was ever a native speaker of Literary Chinese. Indeed, the chances are that nobody even spoke Literary Chinese, or anyway not with a northern phonology like that of Mandarin (although educated people saturated their speech with Literary expressions). Nevertheless, Literary Chinese served as the written lingua franca of the Chinese and of adjacent peoples for two thousand years, and died abruptly in about 1920 in a vernacular language reform. We'll get back to that, but first some more implications of Chinese being written in characters.

A third implication is that learning to read and write requires mastery of a separate morphology and grammar, quite aside from the problem of memorizing several thousand characters, and literacy is therefore a painstaking and expensive business. The task of memorizing characters is simplified somewhat by the fact that the vast majority of them, as we noted, are composites, made up of a "radical" or meaning element (which may also function as a separate syllable on its own) and a phonetic element that usually suggests the sound. Even as lǐ 理, "manage," is made up of a phonetic lǐ 里 and a meaning-linked radical, "king" 王, so in our painting fú 蝠 meaning "bat" and fú 福 meaning "happiness" both include the same right hand element (畐), which is also shared with fù 富, meaning "wealthy," fú 匐, meaning "crawl," fù 副, meaning "assistant," and various other fu-words (although there are fu-words with other fu-phonetics). The "bat" fú has as its left-hand element the radical chóng 虫, which refers to vermin, bugs, and small amphibians (until recently usually written 蟲 when it is a separate word). The radical chóng therefore shows that the fú in question is the one that relates to some kind of buggish beast, while the right half shows the pronunciation to be fu or something like it.

As most dictionaries have analyzed the language in the last couple of centuries, there have been 214 radicals and 888 phonetics, making a theoretical possibility of 190,032 compound characters, although not all combinations are in fact possible. The largest Chinese dictionary lists 49,905 characters, leaving the remaining hundred and forty thousand or so yet to be invented. The Unicode Consortium, keeper of the new international standard for computer representation of scripts in all languages, defined 20,902 characters in version 2.1, and promptly enlarged the total character set for "CJK" (Chinese, Japanese, and Korean) to 27,484 in version 3, published in 2000. It is sometimes claimed that the average person today probably needs knowledge of about 3000 characters to be reasonably literate, but I have noticed that computer fonts of fewer than 12,000 or so characters turn out to cramp one's style.
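The radical-times-phonetic arithmetic is easy to check. The short Python sketch below uses only the figures quoted above; the variable names are illustrative, and the product is simply the naive ceiling, since not every pairing of radical and phonetic is possible.

```python
# Figures quoted in the paragraph above; variable names are illustrative.
RADICALS = 214
PHONETICS = 888
LARGEST_DICTIONARY = 49_905  # characters listed in the largest dictionary

# Naive ceiling: every radical paired with every phonetic.
theoretical = RADICALS * PHONETICS

# Combinations not (yet) attested in that dictionary.
uninvented = theoretical - LARGEST_DICTIONARY

print(theoretical)  # 190032
print(uninvented)   # 140127
```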

A fourth implication of such a writing system is that a written text in a compact, literary style that takes advantage of the features of the writing system that make it concise cannot necessarily be understood if read out loud. It must be "translated" into spoken language. (However, some dialect groups, such as Cantonese, have more elaborate inventories of syllabic types, and therefore more possible syllables and less homonymy. Thus reading out a Classical text in Cantonese does not render it as unintelligible as it does in Mandarin.)

A fifth implication of such a writing system is that it is difficult to learn, and that doubtless contributed to literacy being confined to a small proportion of the population until the XXth century. Unfortunately, we do not really know how difficult it is to learn because there are not good cross-linguistic measures of such things. My impression is that for a native speaker learning to read and write in Chinese takes about as much school time as learning to read and write in English (another complex writing system that is difficult to learn). We do not really know how widespread literacy was in China in various periods (although both the conventional view of the "illiterate millions" and the conventional view of China as a "literate nation" are probably misleading). And we do not know whether the difficulty of attaining literacy in Chinese may be compensated by, for example, the possibility that speed reading may be much more efficient in Chinese than in alphabetical languages. (Click here for more about literacy in dynastic China.)

5. How Chinese Think of Characters

Traditional Chinese thought has classified Chinese characters into six types (called the liùshū 六書). Although the discussion so far has suggested the most important processes, here is how the picture looks when the traditional categories are used:

Xiéshēng 諧聲 or xíngshēng 形聲, sometimes translated "phonetic symbols," are the most common type. The term refers to compounds of a phonetic and a radical. For example, the radical nǚ 女 "woman" plus the phonetic mǎ 馬 (which by itself would mean "horse") combine to make mā 媽, or "mother" (in some regions "grandmother").
Xiàngxíng 象形, or "pictographs," are a small group of characters that actually resemble the objects they name, such as "field" tián 田, "mountain" shān 山, or "sun" rì 日.
Zhǐshì 指事, sometimes translated "picture of action," represent abstract ideas by trying to picture them: sān 三 "three," zhōng 中 "middle," dà 大 "large" (person with arms spread out), and so on.
Huìyì 會意, sometimes translated "ideographs," are compound characters in which both elements have a semantic connection, for example: "sun" rì 日 + "moon" yuè 月 = "bright" míng 明; "woman" nǚ 女 + "son" zǐ 子 = "good" hǎo 好.
Zhuǎnzhù 轉注, sometimes translated "figurative extension" [of meaning], refers to characters formed by modifying the shape of a character to produce another one of a related meaning. Thus it is said that "corpse" shī 尸 is derived from "person" rén 人.
Jiǎjiè 假借 refers to characters borrowed from others of similar pronunciation. Thus wàn 萬 originally referred to the scorpion, but was borrowed by the homonymous word meaning ten thousand, which is often simplified to 万, apparently originally a surname.

6. Modern Language Reforms

As we saw when we discussed the writing system, it is possible for literary Chinese to be quite different in syntax from spoken Chinese, and a result of this was the growth of a literate tradition that required considerable education to manipulate it. The early XXth century brought winds of democratic sentiment to China, as elsewhere, and saw a host of changes in the form of written Chinese. These related to written style, to attempts to standardize pronunciation of a standard national speech, and changes and standardization in the written characters.

Written Style. The Vernacular Revolution in the early years of the XXth century rebelled against the evolved literary style in favor of using the characters to represent the language more as it was spoken (particularly in northern China). With the founding of the Republic in 1912 came the foundation of public schools and of formal language policy. By 1930 Pekinese (Běijīnghuà 北京話) had been selected as the language of the Chinese republic, to operate beside or in place of all other kinds of Chinese. Since Běijīng is itself home to a good deal of speech variation, a standardized variety was selected, standardized writings were officialized for characters that had variant writings, and efforts were made to minimize the number of characters casually used as "miswritings" of other characters. (For example, the character dàng had been so often miswritten for tàng as to become customary.) Some of these policy changes were more widely followed than others, but they influenced school usage, which had an important long-term effect.

Post-imperial Chinese governments (Nationalist and Communist), having officialized the speech of Běijīng as the national language, then sought to claim it was the speech of all China by naming it "the National Language" (Guóyǔ 國語) (Nationalists) or "Common Speech" (Pǔtōnghuà 普通話) (Communists). In English the name "Mandarin" is standard. The modern written language is intended to reflect this spoken standard, and only this standard.

The change in written standard to reflect northern speech patterns rather than the old, non-colloquial style of literary Chinese is referred to as colloquial or "clear" writing (báihuà 白話). It is, of course, almost as foreign to speakers of non-Mandarin variants of Chinese as the old literary standard was, for the languages of China vary not only in pronunciation, but in vocabulary and word order (a fact rarely appreciated by northern Chinese or by Chinese who have been the victims of school system propaganda). If one used characters honestly to reflect the colloquial vocabulary and sometimes word order of different regions, one would end up, not with a unified "Common Speech," but with several quite different written colloquial standards. Here are three non-Mandarin colloquial writings compared with colloquial written Mandarin. [Footnote]

English: They didn't speak Mandarin with you.

Chinese word order: They not with you speak Mandarin.

Mandarin (國語): 他們沒有跟你說國語。

Wú (吳語): 伊拉嘸沒脫儂講國語。

Southern Mǐn (閩南): ㄧㄣ無甲汝講國語。

Cantonese (廣東話): 佢哋冇同你講國語。

Footnote: These examples are derived in part from an article by Edward Gunn (1993 "Rewriting Chinese: Style and Innovation in Twentieth-Century Chinese Prose." Arts & Sciences Newsletter 14(2): 4-5). For the sake of simplicity, these examples are entirely in traditional (non-simplified) characters. The use of simplified characters would create minor variants of what you see here. These writings accentuate the differences by making use of characters that are locally used, but in some cases are not part of standard Chinese. Sometimes they are not found in printing and computer type fonts. For example, the symbols ㄧㄣ in the Southern Mǐn line are a stand-in here for a character which has no Unicode representation. The character is relatively common in Táiwān to represent the sound of Hokkien in (Mandarin yīn), which locally means "they." It is usually written with a person radical followed by a same-sound phonetic .

Official language policy in Táiwān has adamantly opposed developing non-Mandarin writing systems until quite recently, so Taiwanese Hokkien has no standard character set. Mainland policy even more strongly opposes the recognition of non-Mandarin Chinese. And Táiwān has poor representation in international standards bodies, because of its marginal diplomatic status. Thus it is not surprising that the Unicode Consortium has left out Hokkien characters. As the world moves towards "completing" its computer standardization, Hokkien is being quietly left off the passenger list. Fascinating, what? For more on this, see my 2002 article "Languages Left Behind: Keeping Taiwanese Off the World Wide Web." in Language Problems & Language Planning 26(2): 111-128. (The article is available on this web site. Link.)

In other words, the effect of the "colloquial" revolution in writing has been, for a large minority of the population, not a switch from "literary" to "colloquial," but from one literary standard to another, the only difference being that the new one corresponds with somebody else's speech. (It is possible to overdraw the "foreignness" of the new written standard, of course. The same school system that teaches young Cantonese, say, to write in Pekinese is also teaching them to speak Pekinese, and indeed the writing system is presented as having only national language pronunciations.)

It would in principle be possible to develop a vernacular literature in Cantonese or Wú just as it has been developed in Mandarin. The reason this has not occurred is that it has been very strongly discouraged by the government because it is seen as subverting the national standardization and as potentially treasonous. (The exception has been Hong Kong, where Chinese was not an official language until recently, and government policy ignored it. [Footnote]) Furthermore, the development of colloquial writing cannot flower without colloquial literacy in these non-Mandarin languages, and this is scarcely possible without cooperation from the centrally controlled school system. The above examples in fact include several characters unknown to most Chinese even in the regions whose speech the examples reflect.

Footnote: The end of martial law in Táiwān has also seen the beginnings of a movement there to develop a Southern Min written colloquial, but little standardization has been accomplished so far. Although the population of Táiwān is generally bilingual in both Mandarin and Hokkien and switches between them with ease, few people see much advantage in undercutting the priority of written Mandarin, which is an entrée into the world of Greater China, while written Hokkien would be useful only locally, and only marginally useful at that.

Alphabets. At the same time, a system of Roman spelling was created (the "National Romanization," Guóyǔ Luómǎzì 國語羅馬字, or "Gwoyeu Romatzyh," as it spelled itself), and a non-Latin phonemic alphabet, the "National Phonetic Alphabet" (NPA) (Chinese: Zhùyīn Fúhào 注音符號), to promote standard pronunciation. (The reason for a non-Latin alphabet was that it could fit the language better, have shorter spellings, produce a more rational alphabetical ordering, and be written either horizontally or vertically. Furthermore, it would be a Chinese rather than a foreign invention. As a sample of how it looks, here is how it begins: ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ …) Although a few people tried to substitute these alphabetic systems for characters, the Romanization system was never widely accepted, and the National Phonetic Alphabet became popular only to indicate sounds in dictionaries and textbooks, never as a writing system in its own right. The NPA is still used in Táiwān today in dictionaries and textbooks, and beside the characters in books intended for children. On the mainland it sometimes continues to be used for head entries in many dictionaries, beside the Pīnyīn spellings.

In 1956 the new Communist government began an even more intense campaign to promote the national language (under the name "Common Speech") on the mainland. The NPA was replaced for most purposes with a new Romanization system, called Pīnyīn in English, Hànyǔ Pīnyīn 漢語拼音 ("Hàn Language Phonetic Alphabet") in Chinese, eliminating the distinction between an alphabet for internal use and a separate, Roman, one for international use. By the 1980s Mainland publishers used Hanyu Pinyin as the official Romanized spellings in all foreign language text intended for countries using the Latin alphabet. In the absence of Chinese support for older systems, international news agencies were forced to abandon older spellings at last, and a reasonable standard was achieved.

It is unclear to what extent, if at all, there was any unitary or official expectation that the Romanized spellings would evolve into a common writing system. Although standards do exist for punctuation, word division, capitalization, and so on, few people seem to understand or practice them. For example the Romanized spelling of the characters for "red wine" should be hóng (red) pútáo (grape) jiǔ (wine) 紅葡萄酒. But on bottle labels intended for export, the words are typically spelled hóngpú táojiǔ, suggesting little appreciation for the space between written syllables as a marker of word boundaries. Visitors to China are often amused to see Pinyin (or English) written either with no spaces between words or with spaces seemingly randomly inserted into the line of letters. It appears that Romanized Chinese outside of schoolbooks has little popular support for use as normal writing.
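The word-division convention at issue in the "red wine" example can be stated very simply: syllables belonging to one word are written together, and words are separated by spaces. Here is a minimal Python sketch of just that rule. The function name romanize and the list-of-lists input are illustrative assumptions; the sketch presumes the word boundaries are already known and makes no attempt to cover the rest of the official orthography (capitalization, punctuation, and so on).

```python
def romanize(words):
    """Join the syllables of each word; separate words with single spaces."""
    return " ".join("".join(syllables) for syllables in words)


# "Red wine" segmented as in the text: red / grape / wine.
red_wine = [["hóng"], ["pú", "táo"], ["jiǔ"]]

print(romanize(red_wine))  # hóng pútáo jiǔ
```

The bottle-label spelling hóngpú táojiǔ corresponds to grouping the same four syllables as if the word boundaries fell in the middle of "grape."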

Simplified Characters. At the same time, many characters were officially simplified to enable their writing with fewer strokes. For the most part, this involved the officialization of abbreviated script forms that had been in use for centuries, rather than the most elaborate script forms, which had been preferred by the initial language planners three decades earlier. Thus, for example, 臺 and 台 are two alternate writings for the syllable Tái (as in Táiwān). Both have been used more or less interchangeably for centuries. But the simpler one is now official on the mainland, while the more complex (and formerly more formal) one continues to be official in Táiwān (where the simpler one is nevertheless more common).

In some cases, homonyms were merged. So fēng 丰 "graceful, refined" was pressed into service to mean also "abundant, prolific," replacing fēng 豐. In some cases, entirely new characters were created. Thus huá 華 "flowery; China" was replaced by a new character 华 made of huà 化 "transform" (borrowed for its similar sound) placed over a cross 十 (recalling the bottom of the original 華). Some frequently occurring radicals or phonetics were simplified in all characters in which they occurred. (On this web site, traditional characters are blue and simplified characters are red. Characters identical in both character sets generally match whichever style predominates on the page.)

Here are a few examples of simplified characters:


(Click here for a separate page containing More Than You Want To Know About Simplified Characters.)

The process of officializing simpler characters to replace older, more complex ones is not the only force introducing new characters into Chinese. Modern language planning has involved creating new characters for newly discovered chemical elements, for example. And ordinary people have felt free to create characters when those they knew seemed inadequate to the concept to be expressed. (Click here for an extended modern example of commercial character creation.)

But perhaps my favorite example of character creation involves the artificial westernizing of the third-person pronoun. Like other kinds of Chinese, spoken Mandarin has only one third-person personal pronoun (tā). It was traditionally written 他, a character with a person as the radical on the left side. Apparently under the pressure of Western languages, which usually distinguish "he," "she," and "it," Chinese in recent decades have created a new character for tā in the sense of "she" which replaces the sexually indeterminate person with a woman: 她, gradually restricting 他 to mean specifically "he." Another character, originally an alternative writing for 他, has been pressed into service for tā in the sense of "it": 它.

But then, on the model of 她 and 它, other characters have come into common use for tā when it refers to an animal 牠 or a god 祂. To the best of my knowledge all of these (except 他 itself) are XXth century creations. The fact that they are in computer fonts suggests how common they have become.

Similarly, some sectarians who worship a mother goddess have created an underground glyph made by turning one of the conventional symbols for "mother" on its side to produce [graphic]. [Footnote] This is used in sectarian tracts and as a decorative motif to refer to the mother goddess. In recent decades it has been possible to get arrested for writing the character this way. (Don't show this page to a late Imperial or modern Communist official! And don't ask how it got into my computer.)

Footnote: Early in their history Chinese characters underwent a rotation in direction, so in fact this sectarian use restores "mother" to her upright position. The glyph is a derived picture of a stick-figure body and two breasts with nipples.

(Then there are the obscene characters invented to be scribbled by rude schoolboys on the walls of public johns. An oldish one of these is made by placing the radical "enter" 入 atop the element "meat" 肉 to produce an obscene graph referring to sexual intercourse: 肏. It is somewhat shocking to discover this in the Unicode Consortium's code list; it was certainly not in the original Chinese national standards. The word is pronounced cào, in case you feel like being a rude schoolboy.)

The point is that throughout Chinese history both the precious and the rambunctious have been creating new characters to differentiate subtle variations of the same spoken syllable into bona fide homonyms, and linguistic democrats have been equally industriously lumping homonyms under the same character as different senses of the same "word." The recent reforms are grander in scale, but do not represent anything new in conception. (It is still early to say what the introduction of computers will do to this. On the one hand computer type-fonts and internal storage codes show every sign of slavishly following the official character set. On the other hand it is now so easy for even a casual user to design new type fonts for computers that anyone can create convincing-looking additions.)

From 1956 until the Cultural Revolution (1966-1976) only a few of the newly officialized simplified characters were widely used, partly due to the difficulty of making complete new type fonts. During the Cultural Revolution, most printing establishments were closed down or limited to publishing the writings of Máo Zédōng, and when publishing was resumed afterward, all surviving publishers seemed "somehow" to have acquired type fonts for the necessary new standard characters. The new standard is universal in China at this point, except for book titles or "fancy" public signs, although it has been less successful among overseas Chinese. (Its use is, predictably, discouraged in Táiwān.)

The Romanized spellings, in contrast, have not found much popularity. I have never met any Chinese able or willing to use them as a continuous orthography following the official standard, and public signs, often misspelled, seem to use them to show how elegantly foreign the establishment is rather than to communicate anything.

7. Chinese Outside China

One result of Chinese prestige through the centuries has been that neighboring countries have borrowed Chinese words and the Chinese writing system.

Until modern times Chinese was the universal written language of East Asia. Today it is still used in cultural China, including the overseas Chinese communities, and it is being spread into the western areas of political China, notably Qīnghǎi, Xīnjiāng, and Tibet (Xīzàng). And of course, it is widely studied by non-Chinese all over the world.

Appendix on Chinese in Táiwān

Because a good deal of Táiwān material necessarily enters into any course on Chinese anthropology, and because Táiwān is my own research area, which brings even more Táiwān material into my courses, it is well to say something about language in Táiwān.

Taiwanese Hokkien. The Mǐnnán or Hokkien language consists of a family of Chinese dialects spoken in southern Fújiàn province and among diaspora communities of Fújiàn origin. The majority of the population of Táiwān speaks Hokkien. Indeed, already by the early XIXth century 80% of all Chinese in Táiwān were from Fújiàn, and native Hokkien speakers still constituted about 70% of the modern Táiwān population in the second half of the XXth century.

Because Hokkien speakers in Táiwān have origins in a number of different dialect areas of southern Fújiàn, Taiwanese Hokkien reflects both dialectical differences found in Fújiàn and dialectical differentiation developed in Táiwān itself during the three to four centuries that Hokkien speakers have lived there.

Some vocabulary was borrowed from Japanese during the half-century of Japanese administration of Táiwān, 1895-1945, but these borrowings are gradually drifting into disuse, often being replaced with new borrowings from Mandarin (which itself sometimes borrows from Japanese).

Hakka in Táiwān. Before World War II the other principal language of Táiwān was Hakka, whose speakers accounted for about 13% of the early XIXth century Chinese population and about 10% of the modern population.

Mandarin in Táiwān. After the War there was a substantial influx of Mandarin speakers, who are now about 10% of the whole population, added to another 4% or so of pre-War mainland immigrants, many from Mandarin-speaking areas. This, together with the official status of Mandarin, has made Mandarin the third principal language.

Because Mandarin is the only language used in schools (and the only kind of modern Chinese speech that matches the "vernacular" writing system well), nearly all of the modern population is now able to use it with fair to excellent fluency. And because Hokkien is the native language of the vast majority of the population, most people are also able to use Hokkien. Accordingly, code-switching and diglossia are common.

Cantonese in Táiwān. Dialects of Cantonese are widely spoken in overseas Chinese communities, including most Chinese communities in North America, but Cantonese is rarely found in Táiwān. Indeed, in Táiwān the name "Cantonese" is normally misunderstood to refer to Hakka.

Written Chinese in Táiwān. Traditionally, most written Chinese tended to be adialectical, a phenomenon made possible by the hieroglyphic nature of the writing system, as we noted. But literary tastes varied, and some writers more closely followed dialectical usages of one area or another, while others remained closer to a strictly written standard. And of course localisms were desirable when the language was being used to represent speech, as in the text of a play.

Because modern policy requires use of modern Mandarin and mandarinized "vernacular" writing, and because Literary Chinese was in use in Táiwān and Fújiàn before that, Hokkien has no widely known written form. (Cantonese, in contrast, developed a modest colloquial literature of its own based especially on the spoken standards of Hong Kong and Canton.) In Táiwān today it is normal to write in colloquial, public-school Mandarin, which is either pronounced in Mandarin or translated on the spot if Hokkien output is necessary.

Hànwén. In Táiwān a much-valued older skill, one not taught in schools, is the ability to use Hànwén 漢文. Literally, Hànwén means simply "written Chinese," which is also the definition provided by most dictionaries (including Hokkien dictionaries) that I have consulted.

The same term is used in Japanese (pronounced kanbun) to refer to Chinese characters as pronounced in Japanese. Similarly (and probably by borrowing from Japanese usage) when my Táiwān informants use the term, it refers specifically to pronouncing literary texts using Hokkien rather than Mandarin pronunciations of the characters. Literary Chinese, pronounced in local ways —in other words, Hànwén— was the norm of Chinese literacy throughout China in all eras before our own, and it was the only sort of Chinese literacy sought in Táiwān before 1945. It was Taiwanese use of Hànwén which the Japanese endeavored to suppress during their occupation as representing dangerous Chinese nationalism.

Hànwén still enjoys considerable prestige both among the elderly and (perhaps increasingly) among tradition-oriented youths (and some other people who have noticed that it is extremely cool). It is still used to read classical Chinese texts (especially poetry) aloud, and it melds into Guānhuà in religious contexts.

Guānhuà in Táiwān. The English word "Mandarin" translates the Chinese expression Guānhuà 官話, literally "Officials' Language." We met this term earlier as the speech of Běijīng in late dynastic times, which served for communication among officials throughout the empire. (Caution: Just as "Mandarin" in English refers both to this formal style and to north Chinese dialects in general, so in Chinese do the terms Guānhuà and Guānyǔ.)

Although Imperial officials did speak something to each other (probably a variety of things in fact), and although it was referred to as Guānhuà, it probably was not quite the same as what goes by that name in Táiwān today. Modern Taiwanese claim to use Guānhuà in some religious rituals. Taiwanese informants would have one believe that Guānhuà, like Coptic or Latin, is a dead language preserved only liturgically and perhaps in historical movies.

However, Taiwanese Guānhuà technically differs both from Hànwén and from standard Mandarin. My impression is that most Guānhuà in liturgical use today is in fact derived, perhaps more or less on the fly, from other forms of Chinese in modern use in Táiwān and is not an accurately preserved earlier speech form. It seems to me that it generally follows Literary Chinese syntax and vocabulary, Hokkien phonology, and Mandarin(ized) Hokkien readings for characters. In other words, it is what linguists call a "back formation."

"Faking It" With Back Formations. The process of back formation occurs when a speaker creates (often incorrectly) a would-be "prior" form on the basis of a form which he already knows. Back formations are a common way of producing pseudo-Guānhuà from Hànwén or modern Mandarin; Hànwén pronunciations are also used to produce pseudo-Mandarin by speakers who do not know Mandarin well but who wish to impress others or who need to communicate with monolingual Mandarin speakers. Similarly, Mandarin is used to produce pseudo-Hànwén by those who do not know Hànwén but wish they did.

In my own research, a fascinating sphere for such linguistic invention is the speech of spirit mediums in trance. Today few mediums are monolingual any more. Formerly, however, some monolingual, Hokkien-speaking spirit mediums possessed by mainland spirits (a category that includes most of the Táiwān popular pantheon) affected to speak "Mandarin" when in trance by creating expressions based partly on Hànwén and partly on garbled local Mandarin usage. The goal, of course, was to sound appropriately mainlandish when channeling a mainland divinity.

Similarly, younger mediums with a Mandarin education but little understanding of Hànwén or Guānhuà sometimes attempt to make the divinities possessing them sound like classical Confucian gentlemen by portmanteauing into their Hokkien revelations a diversity of modern Mandarinisms. The effect is something like that of mock King James English used (often with mistakes) in modern English prayers or religious tracts. Modern Mandarin is thus used as a model for reconstructing "archaic" Hokkien or Guānhuà, at the same time that Hànwén and possibly Guānhuà provide models for the creation of mock-Mandarin. The goal, either way, is to be highfalutin.

For the spirit medium, such divergences from his normal, non-trance, conversational style are understood to be part of the evidence that it is another, and nobler, personality who speaks through him when he is in trance.

These stylistic tendencies are inherent in the present linguistic and educational milieu of Táiwān and are by no means limited to spirit mediums. Other traditionalists are also concerned with how people spoke in the past or how gods speak today, and they engage in the same experiments. An interesting sectarian example occurs in my notes, where one of my informants reported a dream in which she was visited by her mother of an earlier incarnation, set early in the XXth century.

In her dream she addressed her mother in Hokkien as bú-chhin. She proudly explained to me: "Nobody speaks that way anymore. Bú-chhin is very old speech." She stressed the putative archaism of the expression as one proof of the authenticity of the visitation and of the continuity of her present personality with her earlier incarnation.

In fact it is not clear that the Hokkien expression bú-chhin was ever a colloquial term of address in Hokkien during the period of her three incarnations. It appears instead to be a Hànwén reading of the two characters that make up the Mandarin term of reference (mǔqīn 母親). The intended stylistic impact is to underline the point that she has reverted to a much earlier identity, complete with "obsolete" speech mannerisms.
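For readers who like to see such things spelled out mechanically, here is a minimal sketch in Python of what a character-by-character reading amounts to. The table and the function name are hypothetical illustrations of mine. The table holds only the two characters discussed above, with the readings quoted in this essay, and makes no claim to represent any real dictionary.

```python
# Hypothetical reading table: only the two characters discussed in the text,
# with the readings quoted there. Not drawn from any real dictionary.
READINGS = {
    "母": {"mandarin": "mǔ", "hanwen": "bú"},
    "親": {"mandarin": "qīn", "hanwen": "chhin"},
}


def read_characters(text, variety, joiner=""):
    """Pronounce a written word one character at a time in the given variety."""
    return joiner.join(READINGS[ch][variety] for ch in text)


# The same two written characters, read aloud in two different ways.
mandarin_form = read_characters("母親", "mandarin")
hanwen_form = read_characters("母親", "hanwen", joiner="-")
print(mandarin_form, hanwen_form)
```

The written form stays the same and only the lookup column changes, which is why a speaker who knows the characters can manufacture a plausible-sounding "old" word without ever having heard it spoken.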

Enough! All this is surely a lot more than you wanted to know, especially about Hokkien and Táiwān. That's what you get by visiting the web site of a professor who is a language screwball. Think of it as fascinating.

