Which reads faster, Chinese or English? (persquaremile.com)
The late John DeFrancis, who through his innovative textbooks was the first teacher of a whole generation of Americans who succeeded in acquiring Chinese as a second language, was a co-founder of the Journal of the Chinese Language Teachers Association, and author of a fascinating article titled "Why Johnny Can't Read Chinese." The Chinese writing system (no matter which form of the spoken language, ancient or modern, it is applied to) is full of ambiguities and other partially cued information that slows down reading--as is every other writing system in the world. By dint of much practice, I can read Chinese comfortably for information on a variety of subjects. By test, I was one of the most proficient readers of Chinese among second-language learners who participated in the norming rounds for a Test of Chinese as a Second Language in the mid-1980s (which I think was never rolled out into regular use, perhaps because it showed that most learners learned more Chinese from overseas residence than from taking university courses in Chinese).
Hacker News readers who would like to learn about English, Chinese, or other writing systems would be well advised to read the specialized articles in The World's Writing Systems
http://www.amazon.com/Worlds-Writing-Systems-Peter-Daniels/d...
edited by Peter T. Daniels and William Bright. The article on Chinese is very good, and the overview articles that discuss general features of writing systems are also very good.
>> most learners learned more Chinese from overseas residence than from taking university courses in Chinese
No surprise there. It is hard to learn a foreign language to a high level of fluency without constant background exposure. So the fact that "most learners learned more Chinese from overseas residence" is no surprise.
Who wrote the chapter on Chinese?
http://www.oup.com/us/catalog/general/subject/Linguistics/?v...
I'm pretty sure it's by William G. Boltz
http://depts.washington.edu/asianll/people/faculty/boltzwm.h...
whose papers about ancient Chinese are full of surprising details, with insight into how the current writing system developed historically.
Any other sources that you could recommend by chance?
This is an excellent, authoritative resource on the subject.
Given his "dragon" example, a literate native speaker of English who is familiar with the word "dragon" would not read "d. r. a. g. o. n." and then put the letters together, but would see "dragon" as an atom.
For instance, I see your nick and I can remember you from TechReport many years ago. But if it was in Chinese the match would be faster, and especially so if it appeared surrounded by other text (alphabet-based text looks more self-similar).
The example he gave is not a good one. I instantly recognized both words because the Chinese word for dragon is a single character and in English, words are separated by spaces.
Oftentimes, words in Chinese are made up of 2 or 3 characters, or even 4.
Take for example this word [set?]
可口可樂
This means Coca-Cola in Chinese. In an English sentence, Coca-Cola is unmistakable as it's a word surrounded by spaces. In Chinese, we don't use spaces; we must decide for ourselves where words start and end.
Furthermore, 可口 means tasty. So it wouldn't be until I got to the third character that I would know I'm not reading "tasty" (though more realistically, context should have defined it for me already).
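To make the segmentation problem concrete, here is a minimal sketch of greedy longest-match word segmentation, one simple way a program can decide where words start and end in unspaced text. The tiny lexicon and the function name `segment` are made up for this illustration; this is not a claim about how human readers or real segmenters work.

```python
# Toy lexicon: hypothetical entries chosen only for this illustration.
LEXICON = {"我", "喝", "可口", "可樂", "可口可樂"}


def segment(text, lexicon=LEXICON, max_len=4):
    """Split unspaced text by always taking the longest lexicon match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


print(segment("我喝可口可樂"))             # longest match keeps 可口可樂 whole
print(segment("可口可樂", max_len=2))      # a 2-character window splits it
```

With a four-character window the brand name is matched as one word; with a two-character window the same string falls apart into 可口 and 可樂, which is the ambiguity described above.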
I find Chinese sentences involve much fewer syllables than English though, so perhaps there is some merit to that.
It would be interesting to see if there are any significant differences in the max WPM (words per minute) between English and Chinese readers under this context where information density is no longer a function of space.
I am not saying which is good or which is bad, but that's the style! As for whether Chinese or English reads faster: if we put language style aside, obviously Chinese. That's because the Chinese writing system has much higher entropy and thus more information per square inch. However, writing style matters, a lot.
1) At the local Hebrew lessons I met a minister of the embassy of South Korea. He told me that Korean is a praised language all over the world (it was news to me - make of it what you want) for its simplicity and therefore speed for typists. He elaborated and said that both the layout (keyboard, I assume) would be very sensible and every 'character' is actually a combination of consonant-vowel-consonant and thereby simple (triplets, always) and carrying a lot of information. Since then I'd like to learn more about this idea and confirm or bust that claim.
2) Learning Hebrew is hard. A real quote from a coworker was "It's an easy language! We only have 22 letters, after all". Reduce your alphabet (alephbet?) from 26 to 22. Note that of these letters, 5 are only special versions of other letters and replace those in the last position of the word. Which leaves 17 letters for most words/the meat of the language. And most words are rather short (okay, okay.. I'm not comparing to German here, that would be pointless. Even compared to English it seems to be the same or shorter to me).
Bottom line: I still have a bet going that I can generate Hebrew line noise (following the rules of going with the 17 letters and adding the required sofit/end letter if required. Gibberish ending in נ would be 'fixed' to end in ן) and will hit word after word. On my list of possible weekend projects I have an entry 'Hebrew or not' to crowd-source this.
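A rough sketch of the "Hebrew line noise" generator described above might look like the following. It assumes the standard 22 base letters with the five final (sofit) forms kept separate, as a later reply points out; the names `random_word` and `fix_final` and the word-length range are invented for the example. Checking the output against a real dictionary, the actual point of the bet, is left out.

```python
import random

# The 22 base letters of the Hebrew alphabet (final forms not included).
BASE_LETTERS = "אבגדהוזחטיכלמנסעפצקרשת"
# Letters that take a final (sofit) form at the end of a word.
SOFIT = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}


def fix_final(word):
    """Replace a trailing letter with its sofit form when it has one."""
    if word and word[-1] in SOFIT:
        return word[:-1] + SOFIT[word[-1]]
    return word


def random_word(rng, min_len=2, max_len=5):
    """Draw random base letters, then apply the end-of-word rule."""
    length = rng.randint(min_len, max_len)
    letters = "".join(rng.choice(BASE_LETTERS) for _ in range(length))
    return fix_final(letters)


rng = random.Random(0)  # fixed seed so repeated runs give the same gibberish
print([random_word(rng) for _ in range(5)])
```

The length range of 2 to 5 letters is an arbitrary assumption; a serious test would sample lengths from a real word-length distribution.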
One important thing can affect this, though: many texts become much simpler (i.e. shorter) when translated to Chinese, since that language doesn't have many of the complicated (but also very expressive) grammatical structures of other languages.
Native speakers of English don't phonetically sound out words they read. We recognize whole words. I remember reading a Cambridge study on this years ago; here's the best link I could find: http://www.mrc-cbu.cam.ac.uk/people/matt.davis/Cmabrigde/
You can print more Chinese characters than English words to a page, and emails in Chinese are often much, much shorter, but I'd say it takes native readers of both languages roughly the same amount of time to comprehend 2 equivalent passages. Mind you, I said 'comprehend' and not 'read out loud', to control for differences in the rate of speech between the two languages.
In a language like English, 6 syllables won't get you nearly as much meaning.
I've noticed that Chinese pop music seems to have much more expressive, poetic lyrics, even in stuff aimed at a mass audience. A good example is Faye Wong's song "Sky". For a puff pop song the lyrics are quite poetic when translated into English. It's hard to think of an equivalently poetic English language song aimed at such a large audience.
床前明月光, 疑是地上霜。 舉頭望明月, 低頭思故鄉。
(Apologies if you can't see the characters on your system.) The key radical "月" (moon) repeats itself as a character on its own and also as a radical making up other characters. Thus, Chinese poems can have a measure of visual resonance as well as audible.
A sad fact is that the PRC government altered the written language to make it easier to learn writing and a lot of this subtle beauty was lost. Today, kids learn to type on a computer phonetically, so the complexity of the traditional characters is no longer an issue.
On the funnier side, the famous Lion-Eating Poet must be mentioned: http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_D.... Hilarious and beautiful example of constrained writing.
I'm sure there must be examples of poems that do exactly that.
Where I notice this is on a bilingual Chinese/English menu. If there's a particular dish I want, I find it is far quicker to find the Chinese characters.
It was mentioned that English readers will read words (like "dragon") as an atom, rather than letter by letter. In that case, the Chinese character is more unique and recognizable. I suspect that there is more variety in the shape of Chinese characters making them quicker to recognize and scan quickly.
A pineapple is a lot harder to draw than a banana, but that doesn't make it harder to recognize.
While the post does note that he is in Taiwan, I suspect that there are large differences in reading speed between Traditional Chinese and Simplified Chinese for a few reasons:
1. Simplified Chinese has less information density per character
2. Simplified Chinese combines more characters and uses fewer characters overall
3. Traditional Chinese uses more 'old-fashioned' vocabulary and idioms which are nearly gone from the Mainland Chinese vernacular
Really my only complaint here is that he should specify that he is talking about Traditional Chinese.
On the other hand, there is a difference between Classical (Literary) Chinese and modern plain-speech Chinese. Classical Chinese, used in ancient times mainly for writing purposes, with its different grammar and vocabulary, does use fewer characters.
2. People in Mainland China and Taiwan most certainly do not speak the same. There is a litany of spoken differences between 普通話 and 國語. As someone who travels frequently between the two countries, I am constantly shocked how much the two have deviated. There are numerous idioms that are completely unused on either side. Many very basic terms such as the terms for SMS (短信 in China, 簡訊 in Taiwan) are completely unrecognized outside of the respective areas.
(Note: I don't actually know how to read this character; this is just an illustration.)
no need to thank me :)
That's incorrect. All 22 letters are actually letters, the special versions of letters for the ends of words are not counted towards the full 22.
Also, I think Hebrew words tend to be shorter because they lack vowels. There are ways to add something similar to vowels to words, by adding pronunciation guides to each letter. These are usually not included in most Hebrew writing, but this trusts that the reader already knows how to pronounce the word.
I'm not a linguist, so I'm not sure this vowel thing matters, but that's my guess as to why Hebrew words are shorter.
Vowels: You're right, of course. My bet originated during lunch talks. My intuition (in other words: more stupid mistakes ahead, maybe..) says that by leaving out the vowels and overloading letters (b or v? f or p? u, o or v? etc. pp.) the language loses a lot of error correction margin [1] and leads to more collisions/a denser field of 'actual words' [2].
1: That refers to the ability of taking western languages and removing all vowels there. Or stripping out random letters etc. I'm certainly _far_ _far_ from an adept reader here, so I'm musing about things that interest me although I lack the required experience.
2: Which leads to my 'Hebrew or not' idea. My gut says that randomly pounding the keyboard results a lot more often in 'real words'.
Not bashing Hebrew. I even like the script by now (in the beginning hand-written text looked especially random to me).
But it's not just triplets. You can have 2, 3, or 4 components per character (Wikipedia says 5, but I'm not sure how that works). The first component must be a consonant, but there's a null consonant too, allowing you to create syllables with just a vowel sound.
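For the curious, the way components combine into one block can be sketched with the arithmetic Unicode uses for precomposed Hangul syllables: 19 initial consonants, 21 vowels, and 27 optional finals (plus "no final") are laid out in a regular grid starting at U+AC00. The function name `compose` is made up for this sketch, and it treats a compound vowel or compound final as a single unit, which is how Unicode counts them rather than how a learner might count components.

```python
# Compatibility jamo in Unicode's canonical order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable


def compose(lead, vowel, tail=""):
    """Combine an initial, a vowel, and an optional final into one syllable."""
    index = (LEADS.index(lead) * len(VOWELS) + VOWELS.index(vowel)) * len(TAILS)
    return chr(HANGUL_BASE + index + TAILS.index(tail))


print(compose("ㅎ", "ㅏ", "ㄴ"))  # consonant + vowel + consonant
print(compose("ㅇ", "ㅏ"))        # null initial + vowel: a bare vowel sound
```

The second call shows the null consonant mentioned above: ㅇ in initial position is silent, so the block spells just a vowel.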
I don't see how the writing system itself helps typists. If words are shorter, that's a function of shorter words, not the writing system. A syllable still requires several keystrokes each.
That said, it's very simple to learn. I learned it in a week. But, I also learned the Japanese scripts (hiragana/katakana) in a week as well using James Heisig's awesome book[1]. So, perhaps learning alphabet scripts is just not an overly difficult task in general?
1: http://www.amazon.com/Remembering-Kana-Hiragana-James-Heisig...
That said, while it may be easier to learn, I imagine people still read it at about the same speed they would read English or Chinese. I don't remember where but a while back I read an article saying that basically all spoken languages impart information at roughly the same rate (e.g. languages with higher information density are spoken slower, languages with lower information density are spoken faster). And from the OP's article it sounds like this applies to written languages as well.
I'm not sure how much this will change your "random pounding on keyboard" idea, but English has a much larger vocabulary than Hebrew.
P.S. I see you're in Israel now, hope you're enjoying your time in Tel Aviv.
vowel-less: (I don't need to repeat the 'I suck at Hebrew' disclaimer, right?) Kind of. From what I know:
There are no explicit letters for 'a', 'e'
You can represent an 'o' or a 'u' with a ו (otherwise used as a consonant, 'v'). I know for a fact that words with 'o' can be written without a 'vowel' letter (לא for example: no). I don't know if 'u' is always represented as a letter of its own.
'i' is often used as 'ij' or 'ji' and teams up with 'י = j' in that case. It can be represented without a letter just as well though.
Bottom line: Except for 'u' (no idea about that one? maybe just as well?) you can have all vowels 'hidden' in plain sight.
Would love to have someone from IL chime in here though and correct all my mistakes.
You're right about the י and ו replacing vowels much of the time. Inspired by this discussion, I went and read a little more about the history of Modern Hebrew, and stumbled on this page in Wikipedia: http://en.wikipedia.org/wiki/Ktiv_male
The idea of writing vowel-like signs into the letters is called Nikkud. But apparently, since most people don't write Nikkud, the Academy of the Hebrew Language wrote a set of rules explaining how to exchange Nikkud for letters that will serve as vowels. I had no idea it was so deliberate, but this explains why there are many words which people here write differently.
I still think there is more ambiguity in Hebrew. Or, as you put it, less "error correction margin". Even with "Ktiv Male".
[1] I lived abroad for a few years, so I wasn't in Israel from the 2nd grade to the 7th. This means I missed a lot of the "traditional" learning process of learning about the language, grammar, etc. So I'm a native Hebrew speaker, but I have some gaps in my knowledge about the correct way to do things, etc.
* 龍 (simplified as 龙) by itself always means dragon *
In the Chinese character compound 水龍頭 , which is the usual Chinese word for "faucet," there is an etymological reason why the character 龍 is there, but a Chinese person, just like an English-speaking person, thinks of the object as "faucet" (one semantic unit, spoken in three syllables) rather than as "water dragon head."
Other examples can be multiplied by anyone who has paid careful attention to the details of the Chinese writing system. The compound 車床 (lathe) is completely opaque to a native speaker of English, who might guess that "cart bed" means "chassis," but would never guess that it means "lathe."
A counter example would be an example of 2 Korean words that are ambiguous in regards to homophones when written in Chinese characters but non-ambiguous when written in Hangul.
Actually I'd be very interested to see any Korean, Japanese or Chinese words that are more ambiguous when rendered with characters than phonetic syllabaries (i.e hangul, hiragana, zhuyin...).
I personally met people in California who speak Spanish (Mexican immigrants) and who don't seem to care about learning English all that much, despite living in the US for a decade or so. That is because of the large Spanish-speaking community there. Even more so in a country where English is not the native language. You can just ignore the "constant background exposure".
It's not like Lady Gaga being in the charts means people also pay attention to what exactly she sings. Even for US movies European countries either use subtitles or have the dialogs spoken in the native language (the second option sucks, btw).
On my eternal TODO list is to use a spaced-repetition drilling program to buff up my French vocabulary since I do pretty much have everything else I need to at least read it, but, well, like many around here my TODO list is quite long....
That maybe happens with Americans and high school foreign languages.
In Europe --and certainly in my country--, most people speak the foreign language they were taught just fine, without "constant background exposure". It could be a cultural / motivational thing. I don't think many Americans care to learn foreign languages, despite being forced to do so at high school.
(As an aside: how many foreign language movies do Americans watch? We do tons --and not only Hollywood films).
Here where I am, the median family actually PAYS for extra-school language courses. It used to be just English, in the eighties, but since the nineties most children study TWO foreign languages.
>> Most Europeans learn English (as a foreign language) without a "constant background exposure"
When I biked around France about 20 years ago, I brought along a radio hoping to polish my (very poor) French. All I heard on the radio were songs in English, interspersed occasionally with short bursts of French that were too rapid for me to more than occasionally make out a single word.
In English, we'd try to break out prefixes and suffixes, and come up with "some kind of disease", or "something to do with the heart". But in my experience watching Chinese people, unfamiliar characters just get a "dunno".
Could it be that the difference I'm seeing is because of the use of simplified characters rather than traditional? Maybe simplified characters have cleaned out some of the clues.
I do agree that the example given is not good, though.
I find Japanese extremely good for visual text scanning. You can look for a particular character and find it at ease. Semantically unimportant text (grammar) is usually hiragana (visually very different) and loan words are usually katakana (also visually very different). There is usually a space after the main subject (in the form of the punctuation mark 、 - not really a comma and doesn't work the same way either).
I can scan through text in Japanese considerably faster now than I can in my own mother tongue. From my point of view that's pretty definitive. My Chinese is still lower intermediate but I can see how my Chinese and Taiwanese friends read and they aren't any slower than Japanese.
It's actually self-evident for anyone who knows both languages that Chinese is semantically more dense than English and by that effect alone it's much faster to read (it takes fewer words to express the same thing, generally speaking).
I could point you to many studies to this effect but I think it's beside the point. I think the point the author is trying to make is that Chinese is faster to "scan" than English. This is harder to prove, but from my own experience I think so. As a matter of fact I do scan Chinese characters in text faster than words in my own mother tongue, and I've been reading Japanese for about just 13 years and Chinese for about 4 years (I'm in my 30s).
It's probably not so much when the original is not Japanese written by a native-level speaker, because the semantic density is usually lower.
Measuring word count would give you a completely different idea. It must also be noted that you don't usually read every single word in Asian languages since often whole sentences are short enough to recognise their whole shape. This happens in all languages but in ideograph based ones sentences take a lot less space in readable form (although they may take about as much if not more pen strokes).
Disclaimer/explanation: I actually studied Japanese, but I'm assuming the same principle applies. If so, the symbols would be different because of the differing meanings, and the pronunciation is coincidental.
Your example is not very good either, because 可口可樂 is an intentional translation aimed to invoke positive association by using existing words, it could totally go by sound and be 考卡考樂.
A similar effect can be achieved translating Chinese to English too; there are plenty of spelling confusions for Chinese family names such as Wang, Dang etc.
By the same token, it is self-evident that English text, gzipped, then base-64 encoded, being denser, will be much faster to read than bare English. Because of that, I do not think that argument has much value.
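The density half of that comparison is easy to check. The sketch below compresses some English text with gzip, base64-encodes the result, and compares character counts; the sample text and the function name `encoded_length` are arbitrary choices for the illustration. It says nothing about reading speed, which is the point of the objection.

```python
import base64
import gzip


def encoded_length(text):
    """Return (plain length, gzip+base64 length), both in bytes."""
    raw = text.encode("utf-8")
    packed = base64.b64encode(gzip.compress(raw))
    return len(raw), len(packed)


# Deliberately repetitive sample text; real prose compresses less well.
sample = "the quick brown fox jumps over the lazy dog. " * 50
plain, packed = encoded_length(sample)
print(plain, packed, packed < plain)
```

Note that very short strings usually get longer, not shorter, because of gzip's fixed header and base64's 4/3 expansion.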
On the other hand, contracted Braille is more complex than uncontracted Braille, but reading speed _in_cells_per_second_ seems to be about equal for both (http://faculty.sfasu.edu/mercerdixie/spe520/uncon_vs_cont_br...). That makes reading contracted Braille about 30% faster than reading uncontracted Braille (http://vision.psych.umn.edu/groups/gellab/Legge99.pdf)
So, I do not rule out that something similar applies to Chinese vs English.
Do you actually know enough of both Chinese and English to judge this? Because it would be a bit silly to discuss with someone who does, if you don't.
In other words, you cannot assume a variable to be unknown when one of the parties does know it.