Ithinkit’shardernotjustbecauseyou’renotusedtoit

That is, I think reading English would be harder if there was no word spacing, not just because you’re not used to it but because, well, it’s one more task that your brain has to do.

Since I’ve talked about misparsing recently, and since Kellen brought up punctuation just a couple of days ago, isn’t it time to wonder if any writers of Chinese have ever experimented with adding spacing into the writer’s toolbox? I mean instead of this*:

我写那篇东西时太年轻,发了很多过激议论。

you could have

我  写  那篇  东西  时  太  年轻, 发了  很多  过激  议论。**

I’m too ignorant on the topic of spacing to do anything but pose the question. But if you know of any such hanzi spacing experimentation and could provide some references, please comment away and I’d be more than happy to re-post on the subject when I’m a little less benighted.

[Credit: I’ve been meaning to do this post for a while but was reminded by Chinayouren’s recent post on reading speed, in which Julen faults word parsing for slowing down the non-native speaker. Again, my guess is that it slows down the native speaker too, but that would be pretty hard to prove I guess.]

*Random line from 王小波《黄金时代》

**Well, it does look weird, and I decided to put two spaces between each word because one just didn’t seem like enough…

41 responses to “Ithinkit’shardernotjustbecauseyou’renotusedtoit”

  1. Katie says:

    Here’s one, but I don’t have access to the full text. I read the abstract to mean that native speakers read equally well either way, but I’m not curious enough to pay for the article. I suppose Chinese at least has the advantage of having clearly marked syllable boundaries in their writing.

  2. Syz says:

    Katie, the article looks really interesting. I’ll see if I can get my hands on it. I’m very curious to see the details of how they did the test. There are a huge number of variables to deal with, not least of which is just good old individual variance. But maybe most critical, in my mind, is allowing time for native Chinese to get used to a spaced writing system. I think ANY change in writing takes time to get used to. Even when such a change might ultimately make it easier, it might cause confusion and slowing for a good long time before the reading speed could actually increase.

    Good point about hanzi and syllabification. That’s a huge advantage over English, I’d guess — I mean if you were going to compare how hard it is to read the two languages without spaces. As I’ve learned hanzi over time, of course I too have gotten used to the general crutches of parsing that I suppose native speakers use, e.g. that lots of words are two syllables, and many of the one-syllable words are very common and so on. So really, maybe spacing wouldn’t ever make any difference for those who grow up reading hanzi. But it sure would help me 😀

  3. Julen says:

    Thanks for that link!

    Before I even examine that document, here is one certitude I have after a few tests: whether spaced or not, believe me Chinese people read fast!!

    In the last informal test I did, last year, two girl friends in Shanghai read the text even faster than I read the same text in English. I had to suspend the experiment because I was starting to feel a bit depressed …

  4. We were just talking about this in a class this week. I forgot what the text was, but we were cold reading and there were a few times people made mistakes with reading two words as one, for example 所以 is one word in Mandarin but two in 文言. Like that but with much less common words. Tripping up native speakers as well.

    I love writing a 4000 character paper and then having my word processor tell me it’s 6 words, since it calculates based on spaces. Really kills any feeling of accomplishment.
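The word-count complaint above comes down to counting whitespace-separated tokens. A minimal sketch of the difference, assuming a naive counter that splits on whitespace and a character counter limited to the basic CJK Unified Ideographs block (both function names are made up for illustration, not taken from any word processor):

```python
import re

def count_by_spaces(text: str) -> int:
    """Count whitespace-separated tokens, as a naive word counter would."""
    return len(text.split())

def count_han_characters(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return len(re.findall(r"[\u4e00-\u9fff]", text))

# The unspaced line quoted in the post.
sample = "我写那篇东西时太年轻,发了很多过激议论。"

print(count_by_spaces(sample))       # whole sentence counts as a single "word"
print(count_han_characters(sample))  # punctuation is excluded from this count
```

Neither number is a real word count: the first undercounts wildly and the second counts characters, which is why a proper count needs an actual segmenter.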

  5. Julen: Curious. Can you read Spanish any faster than English, or at this point is any difference gone due to your rockstar English skills?

  6. Julen says:

    I never tested this. But I really don’t feel any difference reading in Spanish, French or English.

    I do feel difference in Basque because I learned it primarily as a spoken language and I never used it so much in the context of formal texts. Official government texts back home are all issued in bilingual format, and when I try to read the Basque side my head goes dizzy very fast.

    It is not impossible that I read a bit slower in English than Spanish. But my reading speed is still definitely within the range of native speakers, so it is functional by my definition.

  7. hsknotes says:

    Native speakers read chinese at the speed of light, no joke. It’s not even funny. They devour books written in Chinese in a few hours, or a day, while the equivalent thing in English would take them, or a native-english speaker, a substantially longer amount of time. Any small amount of testing will bear this out to anyone, but I assumed it was common knowledge. I suspect it’s also one of the causes subtitling (english and chinese) can move at the speed of light on dubs, where that sort of thing is far less likely in say, America.

  8. Zev Handel says:

    “I-think-it’s-hard-er-not-just-be-cause-you’re-not-used-to-it”

    That’s not so bad, is it?

    I’ve made a nuisance of myself in the comments to many posts on this blog arguing for the advantages of character-based writing over pinyin. I don’t know what the scientific literature says about it, but it sure seems that characters are a much more efficient reading medium for native speakers than alphabetic writing, as has been suggested by previous commenters on this entry. (And I would venture that to the extent that English spelling strays from strict phonemic representation and toward logographic representation — exactly what advocates of spelling reform dislike about English orthography — it speeds up reading for native speakers as well.)

  9. Zev Handel says:

    It’s interesting that Japanese, written in a mix of kanji and kana, is not written with spaces. Korean, however, now written almost entirely in hangeul, has come to be written with spaces. I suspect this is not an accident. If the Japanese decided to do away with kanji and write only in kana, I bet they’d have to start putting spaces in. And pinyin without spaces is a nightmare.

    I think the reason English is readable without spaces must be partly due to the deviation from strict phonemic representation. I’d venture that French without spaces is easier to read than Spanish, for the same reason.

  10. Syz says:

    @Zev, yeah, yeah, you adore all scripts as they are and despise reform of any sort. That’s fine and that’s why it’s always fun to pick topics we can fight about. 😀

    But in this case, are you arguing that hanzi is easier to read without spaces? Or vice versa? Or no difference?

    I-like-the-syl-lab-if-ied-ex-am-ple, btw. That’s a nice visual example of what Katie mentioned. It does make the whole concept way easier.

  11. Syz says:

    @hsknotes: we really need to come up with a test for the idea that native speakers can read Chinese way faster than native speakers of English can read English. Would be fascinating. Maybe I’ll write a new post and propose a test format we can use on the web.

    [Weird meta-comment: I almost never get emails from wordpress about your comments. This only happens with your comments, no one else’s. Bizarre.]

  12. Julen says:

    @HSKNOTES – “Native speakers read chinese at the speed of light, no joke. It’s not even funny.”

    Yes! I am glad that I am not the only one who noticed this. My Chinese friends read 300 page books in one day, they read amazingly fast. I will be writing more about this soon because it is one of the points I wanted to make. Also I would be very interested if Syz comes up with an experiment for this one.

  13. Julen says:

    @Zev – I am a fierce defender of 汉字 as well, and I don’t hesitate to pick up a fight with all the kings of pinyin.

    “I don’t know what the scientific literature says about it, but it sure seems that characters are a much more efficient reading medium for native speakers than alphabetic writing, as has been suggested by previous commenters on this entry.”

    We are already arriving at the conclusion I wanted to reach: that 汉字 have some important advantages as a means of transmitting information. However, before we dare say that in print I guess we should do some more testing (or find some more previously done research).

    “I would venture that to the extent that English spelling strays from strict phonemic representation and toward logographic representation — exactly what advocates of spelling reform dislike about English orthography — it speeds up reading for native speakers as well”

    I find it more difficult to follow you on this one. I haven’t generally observed that English people read faster than I do in Spanish. The feeling I get is that although Spanish writing is almost perfectly phonemic, native adults don’t really use that info when they read. We use it as logographically as English speakers do, and the fact that the writing is phonemic doesn’t stop us from doing that.

    You say English is easy to parse without spaces, but if you read my post you will see it is not only the spaces, it is other clues like the capital letters that help a good deal as well. In case you didn’t read it, I suggest you try to parse this paragraph at normal speed:

    karunanidhiwaslividthatdayanidhiandbrotherkalanidhihadbec
    ometooambitiousholdingpopularitycontestsagainstalagiriint
    heirnewspaper,whoseofficewasburntdown.rajadidnottakecharg
    eofthetelecomministryalone.kanimozhiwastoremainhis”guid
    e”.hewasfocused.hisallegedundersellingofthe2Gspectrum(ad
    esignatedpartoftheairwavesforusebymobilephoneoperators),w
    hichcausedlossofRs22,466croreaspertheCBI’sestimate,surfaced.
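For anyone who wants to try this test on their own text, here is a rough sketch of how a paragraph like the one above could be produced. The function name `despace` and the default line width are assumptions, and this version lowercases everything, whereas the paragraph above keeps a few capitals such as 2G, Rs and CBI:

```python
def despace(text: str, width: int = 57) -> str:
    """Lowercase the text, drop all whitespace, and hard-wrap every `width` characters."""
    # split() with no arguments splits on any run of whitespace, including newlines.
    squeezed = "".join(text.lower().split())
    # Cut into fixed-width lines, deliberately ignoring word boundaries.
    chunks = [squeezed[i:i + width] for i in range(0, len(squeezed), width)]
    return "\n".join(chunks)

print(despace("I think it's harder not just because you're not used to it", width=20))
```

The fixed-width cut is what produces line breaks in the middle of words, so raising `width` is one way to separate the effect of missing spaces from the effect of broken lines.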

  14. I actually don’t really have any trouble with the spaceless English Julen just posted. But then I also have no difficulty reading upside-down text at regular speed in any language which I’m able to read, something which has recently been pointed out as bizarre, though I’m not convinced it is.

  15. Julen says:

    Clarification – The point is not to be able to understand the text, I can do that as well. The point is to decode it as fast as you do with normal writing. If you are taking twice the time then you obviously failed the test.

    PS. Cybernetic brains and Kellen are not considered in this clarification.

  16. I’m saying it takes me maybe 20% more time, but i wouldn’t say much more than that. The real issue is line breaks in the middle of words. short of that, it’s nothing one couldn’t get used to.

  17. rm says:

    That example with the hyphens between syllables: is it just me or do you find yourself sounding out the syllables in your head as you read? And therefore reading slower. I never do that when reading normally. But I do when reading Chinese, and wish I could break the habit. I guess it just needs more and more reading….

  18. Kevin Miller says:

    I’ve done some work on this with children (e.g., http://www3.interscience.wiley.com/journal/122385988/abstract) and I believe the following statements are true:

    1. No one has shown any real advantage to adding spaces in reading Chinese (or underlining words), although some argue that this might change with enough practice (but it appears it would have to be a *lot* of practice).

    2. On the other hand, you get immediate advantages in breaking up long words in German for most tasks (Albrecht Inhoff and colleagues).

    3. Children in particular have trouble with long English words, which require multiple fixations, presumably to parse into something like morphemes.

    4. This makes a lot of sense, because there are only a few ways that characters could form words and it’s harder than you might think to come up with ambiguous phrases of the sort (中国心理会不会…, where either 中国心理会 or 中国心理 ends in a word boundary). But in English, and even more so in German, identifying morpheme boundaries is hard because there are a number of perceptually available possibilities.

    5. College students read incredibly fast in the orthography they’ve learned. This is such an over-learned skill that it’s hard to find substantial advantages for either one, but it’s a different story with kids.

    6. As Neil Sedaka put it, breaking up is hard to do.

  19. Julen says:

    Very interesting points. I wish we also had something on the reading speed of native speakers in Chinese.

    But I have a comment on point 4: “there are only a few ways that characters could form words”.

    This is not true. There are hundreds of ways. Bear in mind that a big part of the characters can function as single words themselves. Even in the simple example you give: XX中国心理会不会XX, I can think easily of 2 more options:

    X中-国-心理-会不会-XX OR
    X-中国-心-理会-不会-XX

    And if the phrase is longer, with proper nouns, you get a wealth of possibilities. In fact I am pretty sure that in English there are fewer ambiguities. In the text below, I was only able to find a few cases (I count an ambiguity when misparsing a word leaves valid words to its right and left)

    hadbecometooambitiousholdingpopularitycontestsagainstalag
    irlintheirnewspaper,whoseofficewasburntdown.rajadidnottakec
    hargeofthetelecomministryalone.kanimozhiwastoremainhis”guid
    e”.hewasfocused.hisallegedundersellingofthe2Gspectrum(ad
    esignatedpartoftheairwavesforusebymobilephoneoperators),w
    hichcausedlossofRs22,466croreaspertheCBI’sestimate,surfaced.

    [I added a couple of line breaks —RA]

    Sorry if I missed some point, I did this quickly. But I think the idea is clear. This would indicate that my own initial post may be wrong. Perhaps it is not the problem of parsing ambiguities that makes non-spaced reading so difficult for us, but something else.

    I am guessing now that it has to do with the difficulty for our eye to focus on a defined portion of text at each beat, because we are trained to take the white space as breaking reference. I understand this is what Neil Sedaka meant in point 6, and that would also explain why kids (or Germans) find it difficult to read very long words.
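The ambiguity question in the 中国心理会不会 example can be made concrete by enumerating every way to split a string into dictionary words. The sketch below is only illustrative: `LEXICON` is a hypothetical toy word list assembled from the pieces mentioned in this thread, not a real dictionary, and it treats every listed single character as a free word, which is itself debatable.

```python
from functools import lru_cache

# Hypothetical toy lexicon built from the pieces discussed above (an assumption).
LEXICON = frozenset({"中", "国", "中国", "心", "理", "心理", "理会",
                     "会", "不", "不会", "会不会"})

def segmentations(text: str, lexicon=LEXICON) -> list[list[str]]:
    """Return every way to split `text` into consecutive words from `lexicon`."""
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple:
        # Reaching the end of the text yields one complete (empty) continuation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(split) for split in split_from(0)]

for split in segmentations("中国心理会不会"):
    print("-".join(split))
```

With this toy lexicon the function returns 16 candidate splits, including both of the parsings listed above. Most of them are nonsense to a reader, which is the point: the number of dictionary-valid splits says little about how many a fluent reader would ever consider.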

  20. Julen says:

    1) Kellen, sorry I think I just broke your blog.

    2) I did indeed miss a few cases, like the con-tests-against, but I think there are still far fewer possibilities than in a text of the same length in Chinese.

  21. Zev Handel says:

    @Julen,

    “X中-国-心理-会不会-XX”

    I don’t understand this one. 国 isn’t a word, as far as I know.

    I may be way off base in my comment hypothesizing that English may be faster to read than Spanish. What leads me to suggest this is not how closely the languages come to a phonemic ideal per se, but a side-effect of that: whether homophones are written identically or differently. The fact that “to”, “two”, and “too” are written differently, or “meat”, “meet” and “mete”, should speed up reading to a degree. I don’t know Spanish well enough to know if there are more or fewer homophones in general than in English. (Here’s a fun exercise: how many other homophonous triplets can you think of in English for which all three written forms are distinct?)

    @rm
    your observation about sounding out words when the syllables are separated makes perfect sense. When we read English, we don’t sound out words. Rather, we recognize words by their overall shape. When the orthography deviates from what we are used to, it’s harder to recognize words from their overall shape — and we have to fall back on the much slower alphabetic method of reading. (This is also why we slow down and sound out unfamiliar proper names, which we cannot recognize from word-shape alone.) When I try to read Julen’s space-less paragraph, I find that I can read some words very quickly and others not so quickly. The words I can read quickly have distinctive shapes that stand out from their surroundings despite the lack of spaces. These are generally the longer words, like “popularity”. Picking out the word “a” or “i” is very hard.

    I think a few of Kevin’s points are worth reiterating: what is slow or hard for children or second-language learners is not necessarily hard for adult native speakers. Children, of course, will become adult native speakers. But we second-language learners may never achieve reading fluency in scripts that differ substantially from what we grew up with. (Thanks too to Kevin for giving us some facts to mix in with all our speculations.)

  22. Julen says:

    @Zev, 国 is a word in all the dictionaries I have, not to mention it is also a Surname. Although you are right that it is much more often used as part of a word.

    Regarding Kevin’s comment, I agree. Thanks to him for bringing some fact in the middle of so much darkness. I am afraid that at some point we are going to be blocked in this anyway, because I don’t have the means or the time to conduct proper testing, and speculation can only go so far.

    If anyone can bring in some more research links it would be much appreciated.
    _________________________________________

    Regarding the too/to/two, I see what you mean now. It makes a lot of sense when you have homophones. But I just can’t think of enough cases in English to call it a system. And in fact, I CAN think of many cases where it does not work.

    For example: “to” is written the same whether it is a preposition or an infinitive particle. “If” is written the same in all its functions. Same for “who”, “when”, and many other things.

    Funnily enough, the Spanish language is so obsessed with its regular script, that most of these cases have visual aids. For example: “who”, “what”,”when” are “quien”, “que”, “cuando”. They all carry an accent tilde in the cases where they are interrogative, even if they are pronounced the same.

    In many other cases in Spanish where there are homophones, like “solo” (alone) and “sólo” (only), there is this mark that we call the “diacritical tilde”, which ensures that visually the words are not the same.

  23. Julen: I’ll send you a bill for repair costs.

    This thread is destroying my efforts to clean up the theme here. You’re all making everything all side-scrolly.

  24. Zev Handel says:

    @Julen,

    I really feel like 国 is a bound morpheme. What in your dictionary indicates that it is a word? I believe it’s not possible to say “这是什么国?” or “北美有三个国.” In both cases you’d have to use 国家. Of course, I’m not a native speaker, so I could be wrong.

  25. Julen says:

    @Zev,

    I am not a native speaker either, but there are many tools you can use to check this. The dictionaries I use are among the best recognized dictionaries Chinese-Chinese and Chinese-English. Including the Xiandai Hanyu Guifan Cidian and the ABC.

    It is even easier than that, just go on Google and check the expression “这是什么国?”, it is the least you can do if you are going to counter my statement! You will see thousands of occurrences of 这是什么国 where the 国 is an independent word.

  26. GAC says:

    “I love writing a 4000 character paper and then having my word processor tell me it’s 6 words, since it calculates based on spaces. Really kills any feeling of accomplishment.”

    Get a better word processor. Word 2007 parsed my 4700-character capstone paper at around 4000 words. Seems like a decent estimate, though that would mean lots of 1-character words. However, I have still yet to find a Chinese spellchecker 😛

  27. Kellen Parker says:

    Zev, Julen:

    What about 我国? It’s a stretch, I know, but still…

  28. Zev Handel says:

    @Julen,

    I actually did do a web check before I wrote that. I just did another one, searching on Google for “这是什么国”. All the results on the first page are for “这是什么国家”. Of course, I know that Google results differ depending on where you are when you search. If you see other results, let me know.

    The ABC dictionary is arranged by word, not character, so the editors must have had good reason for listing guó as a word. I’d like to know what that reason is.

    @Kellen: 我国 is a fixed phrase — an idiomatic expression, which is essentially a lexical item (i.e., a word, in a word). It’s not separable. You can’t say 我的国 or 你国 or 他国. There are many phrases in formal Chinese (especially written Chinese) that are imported from the classical written language, which had many more free morphemes than modern spoken Mandarin does.

    I will happily concede that 国 is a word if someone can show me a clear-cut example in ordinary speech.

  29. Aaron Posehn says:

    I don’t know if someone in the above comments already addressed this, but from my experience when I was learning Chinese, I used to chat online in order to practice. Some Chinese people would put spaces between their typed words (I’m assuming they thought it would give me an easier time reading), but I always thought that it was distracting and it DID in fact slow me down.

  30. Zev: What about 國 in “你是哪國人?”

  31. Zev Handel says:

    Kellen, I think 國 there is bound as well. 哪國 is parallel in structure to 美國, 中國, 法國, etc. I would argue that 哪國 is a question word made up of two bound morphemes.

    Here’s a simple test: if 國 is a word, it should be possible to use it all by itself to constitute a sentence, for example in answer to a question “X 是甚麼?” You can no more answer 國 as a complete sentence than you can 桌. As another test, if X is a word, it should be possible to say the phrase “一種/個很重要的X.” Can you say that substituting 國 for X? It doesn’t sound right to me, but again, I’m no native speaker, so I could easily be wrong.

  32. Julen says:

    @Zev, In my search of Google there are thousands of examples of single 国 uses. You have to do the search with “” so the 国 comes as a separate word.

    I know that it is more common as part of a word, like 我家 or 我国. But this doesn’t mean it cannot function as a single word. I have 4 of the best dictionaries, including Chinese authors and American authors. The ABC itself is known for distinguishing clearly between bound forms (marked as BF) and words. So there is absolutely no doubt that 国 is a word, there is no point in discussing this, unless you have some serious proof.

    By the way, you CAN say 他的国, this is precisely the title of the latest Han Han book that came out last year. You might say it is an unusual phrasing, used more in writing than in everyday speech, so what. If 国 is not a word, then you might agree that “realm” is not a word either, since you hardly use it in everyday speech.

  33. Zev Handel says:

    @Julen,

    Thanks for providing the link to your Google result. They are different from the results page I had gotten with my search, and you’re right, they definitely show 国 used as an independent word.

  34. KaFoo says:

    One guy asking to Picasso how much time he took to make that painting. 30 seconds ?
    Picasso to answer: yes, 30 seconds + 30 years.
    Same for Chinese reading. Please let’s compare what’s comparable.
    Reading a text in Chinese characters may take less time than in English…but don’t forget the lost years. Yes, I am provocative. But I have a theory on this. I believe the years spent to learn Chinese characters have a direct impact on maturity. Everybody living in China has noticed a difference in maturity. Usually, I would assess around 5 to 10 years (yes) maturity gap. Look in your office, and you will understand. Secretaries, 20 years old, behave like western 14 years old (pictures, e-mails, puppets, attitude….). My personal theory is that it is directly linked to the “lost years” spent to swallow Chinese characters.
    The result is nice on the one hand: one can read a text quickly. On the other hand, it looks like a lot of effort spent in only one field of mind development. Please comment.

  35. Syz says:

    @KaFoo: it’s easy to be provocative. It’s hard to come up with testable hypotheses that could actually advance knowledge of language / culture. I’m afraid you’ve got a non-starter here.

  36. KaFoo: I’m debating whether your comment even justifies a response, but I’m sure someone is going to tackle it so it might as well be me, given all the free time I now have having only learned 26 letters in my youth.

    I’d put it thusly: They’re not stopping their lives during the time they learn the 4000 characters they need to speed-read. Just as my learning of characters doesn’t keep me from being able to hang out with my friends and hone my social skills.

    I’m gonna go ahead and say there’s little to your statement but grasping at straws to explain a perceived maturity difference between cultures. There’s no lack of linguistic imperialism floating around to ruin my day. I’d hope to avoid it here.

  37. Kellen: I second that motion and call for a resolution to table the nomination.

    核心重點,中肯尖銳 “I’m gonna go ahead and say there’s little to your statement but grasping at straws to explain a perceived maturity difference between cultures.”

    Perhaps we can blame the cultural revolution on language issues as well? Or go little Britain and pull an Orwell and claim political-speak is itself the cause of political ineptness.

  38. Zev Handel says:

    I don’t think KaFoo’s claim has anything to do with linguistic determinism. His claim seems to be that if you have to spend too much time studying as a child, you don’t develop social maturity. That claim in and of itself is not testable and seems unlikely to be true. How you would measure “maturity” and how you would isolate all of the factors that might contribute to it over the course of one’s childhood would seem a nearly impossible task.

    But that aside, it’s not clear to me why KaFoo thinks it’s time spent on learning Chinese characters, rather than on math, or playing musical instruments, or spelling tests, or any one of a zillion other things that kids study with differential amounts of time across cultures, that is the principal factor.

  39. flow says:

    i would like to caution that things quickly get complicated as soon as the topic of words is raised; this is even true for a language like english, where the very concept of the word (as opposed to 字) has been of central importance for linguistic thinking.

    let me just point out that for me, while i am not sure whether 國 ‘is’ a ‘word’, i am still somewhat pressed to demonstrate it is not, in daily usage, really a ‘bound form’. yet, it is easy to walk down the wrong garden path when confronted with the series of characters, 中國家常豆腐, as all of 中國, 國家, 家常 make for perfectly valid binoms. and we still haven’t found out whether 家常豆腐 is one word or two. of course, it’s the 家常 kind of 豆腐, so that’s two words; on the other hand, if you sample the many fancy tetragrammatic expressions typical for a traditional restaurant, you may wonder how much they really work as standing expressions, and how many dragons are used to cook a dragon’s claw soup. this i see as a distinctly pragmatic aspect of language: words, in this sense, are items stored in your memory together with usage patterns, some kind of semantic representation maybe, situations in the past where you used or misused the item, and so on. i give another culinary example: ever heard of mulligatawny soup? that first item means, in tamil, ‘pepper water’. now pepper is not even a mandatory ingredient of this soup, which far from being just hot water with pepper added, is often yellow, thick, and rather ‘soupy’. to most english speakers, ‘mulligatawny’ means nothing, except if they happen to know this particular kind of soup, they associate the soup with that word. in more careful english, “let’s start with a mulligatawny” is maybe not as correct as “let’s start with a mulligatawny soup”; so ‘mulligatawny’ may be more of a bound than a free form. so is ‘mulligatawny soup’ one word or two? it is difficult to decide.

    according to http://books.google.com/books?id=VXpR01oMiZsC&pg=PA20&lpg=PA20&dq=das+wort+im+chinesischen&source=bl&ots=cGfnYmM6vV&sig=CN8BfgNxkD65SUhA8mTP4B7ngYs&hl=en&ei=moBYTIuFAY7LOMC5qIsJ&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCkQ6AEwAw#v=onepage&q=das%20wort%20im%20chinesischen&f=false, page 49, the academy of sciences in peking at one point came up with a definition for the word in chinese which included, amongst others, three points: * two characters can be regarded a word if the second character is in the neutral tone; * they can also be regarded a single word if one of the characters is not used alone; * they are a grammatical structure if both characters are frequently used on their own. that’s certainly a starting point, but hardly exhaustive. there is, be it said, a whole branch of linguistic research that deals with this single problem.
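The three criteria summarized above read almost like a decision procedure, so here is a rough sketch of them as one. Everything in it is an assumption layered on the paraphrase: the function name `classify_pair`, the boolean inputs, and in particular the order in which the criteria are checked, which the summary does not specify.

```python
def classify_pair(second_is_neutral_tone: bool,
                  first_used_alone: bool,
                  second_used_alone: bool) -> str:
    """Classify a two-character sequence using the three paraphrased criteria."""
    # Criterion 1: a neutral-tone second character marks the pair as a word.
    if second_is_neutral_tone:
        return "word"
    # Criterion 2: if either character is not used alone, the pair is a word.
    if not first_used_alone or not second_used_alone:
        return "word"
    # Criterion 3: both characters are used on their own, so treat the pair
    # as a grammatical structure rather than a single word.
    return "grammatical structure"

# The answer for a pair like 中國 flips depending on whether 國 is judged free.
print(classify_pair(False, True, False))  # 國 treated as bound
print(classify_pair(False, True, True))   # 國 treated as free
```

The sketch mostly shows where the difficulty lives: the rules are trivial to apply, but their inputs (is this character "used alone"?) are exactly what the thread above could not agree on.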

    maybe there ‘are’ no words, after all, at least not in the sense we would naively want to have them, as clear-cut universal building-stones of language. 會不會 was mentioned earlier. i am not sure whether to call that one word, or two, or three words. it certainly forms a tight unit in my speech-conscience.

    even in languages like german, where everybody seems to know what a ‘word’ is, boundaries sometimes get redrawn, as in the most recent orthographic reform over here, when compounds like ‘zusammenkommen’ (get together) were re-spelled as ‘zusammen kommen’ (which can also mean ‘to come together’). this is very akin to english phrasal verbs.

  40. flow says:

    ah, before i forget… while in traditional chinese texts, punctuation was usually rather sparse or absent, this is not universal. there have been diacritics in use for centuries to mark alternative pronunciation of characters (by adding a small circle to one edge of the character, indicating its tone); variations of circles and dots to mark phrases, and an assortment of single, straight, double and twiggly lines to single out names of places, personal names, and place names. these greatly help in understanding texts, and also demonstrate that even to educated native speakers it was not always easy to parse unbroken strings of characters.

  41. flow says:

    oops: “… to single out names of places, personal names, and book titles”
