Scripts and banned words

A bit late to the party on this one, but a few days ago Danwei had a great translation from Hecaitou’s blog on the futility of blocking dirty words. Creative stuff:

Hecaitou originally wrote: “不 矢口 亻十 么 日寸 候 , 亻奄 口斤 言兑 矢豆 亻言 也有 辶寸 氵虑 敏 感 字 节 白勺 言兑 氵去 , 于 是 , 亻奄 学 会 了 扌斥 字 ……后 来, 亻奄 米青 礻申 分 歹刂 鸟~”

Danwei translation: “I don’t know when it was that I heard that mobile phones are also being filtered for sensitive words, therefore, I learnt to split characters… later on, I became schizophrenic”

For those still wondering what’s going on, Hecaitou takes characters that can be broken into parts that are themselves characters in their own right, and simply breaks them up. The result is visually clear but hard for an unsophisticated character/phrase-blocking program to understand. Compare:

Original: 不 矢口 亻十 么 日寸 候 (9 characters, meaningless gibberish)

Read as: 不知什么时候 (6 characters)
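To see why a plain phrase-matcher misses the split version, here is a minimal Python sketch. Everything in it is made up for illustration: the one-entry `BLOCKLIST` (using the harmless phrase above as a stand-in for a sensitive one), the `naive_filter` function, and the tiny hand-built `COMPONENTS` table. A real system would presumably need a full character-decomposition database rather than three hard-coded pairs.

```python
# Hypothetical blocklist; the harmless example phrase stands in for a sensitive one.
BLOCKLIST = ["不知什么时候"]

# Hand-made table mapping a pair of components back to the character they form.
# Illustrative only: it covers just the characters in the example above.
COMPONENTS = {
    ("矢", "口"): "知",
    ("亻", "十"): "什",
    ("日", "寸"): "时",
}

def naive_filter(text):
    """Return True if any blocklisted phrase appears verbatim in text."""
    return any(phrase in text for phrase in BLOCKLIST)

def recompose(text):
    """Drop whitespace, then greedily merge adjacent pairs found in COMPONENTS."""
    chars = [c for c in text if not c.isspace()]
    out = []
    i = 0
    while i < len(chars):
        pair = tuple(chars[i:i + 2])
        if pair in COMPONENTS:
            out.append(COMPONENTS[pair])
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

split_text = "不 矢口 亻十 么 日寸 候"
print(naive_filter(split_text))             # False: the filter sees 9 unrelated characters
print(recompose(split_text))                # 不知什么时候
print(naive_filter(recompose(split_text)))  # True: caught once the pieces are rejoined
```

The greedy merge is the weak point: a genuine 日 followed by a genuine 寸 would be fused too, so a filter built this way trades missed phrases for false alarms.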

Gotta say, a lot more fun than #### for naughty words on voicemail transcripts from Google, which was also news last week (NB: my brother tried to reproduce their results in a voicemail to me but only succeeded in getting it to write “box”).

But of course, this sort of script-play isn’t the exclusive domain of hanzi. Try searching this page for Τіbеt and Хіnјіаng and see what you come up with. Nothing? That’s right. If you see it, but you can’t search for it, is it there?
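Why the search comes up empty can be shown with a short Python sketch. The look-alike string below is built from explicit escapes, on the assumption that the masked words above use Greek and Cyrillic substitutions of this kind; the `scripts` helper is a hypothetical name and only a rough stand-in for real script detection.

```python
import unicodedata

plain = "Tibet"
# Look-alike spelled with escapes so the code points are unambiguous:
# Greek capital Tau, Cyrillic i, Latin b, Cyrillic ie, Latin t.
lookalike = "\u03a4\u0456b\u0435t"

# The two strings render almost identically but share only two code points.
print(plain == lookalike)  # False

def scripts(text):
    """First word of each character's Unicode name, a rough proxy for its script."""
    return [unicodedata.name(ch).split()[0] for ch in text]

print(scripts(plain))      # ['LATIN', 'LATIN', 'LATIN', 'LATIN', 'LATIN']
print(scripts(lookalike))  # ['GREEK', 'CYRILLIC', 'LATIN', 'CYRILLIC', 'LATIN']
```

A search box compares code points, not shapes, so the plain Latin spelling never matches. By the same token, a word that mixes three scripts is itself a giveaway to anyone who bothers to check.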

If you like the effect, you too can have it. Check out Kellen Parker’s “sensitive word masking for blogs in China” tool.
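The general idea behind this kind of masking fits in a few lines of Python. To be clear, this is not Kellen Parker’s tool, whose internals aren’t described here; the `HOMOGLYPHS` table and `mask` function are hypothetical names, and the handful of substitutions is just a sample.

```python
# Hypothetical mapping from Latin letters to visually similar Greek/Cyrillic ones.
HOMOGLYPHS = {
    "T": "\u03a4",  # Greek capital Tau
    "X": "\u0425",  # Cyrillic capital Ha
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "i": "\u0456",  # Cyrillic small Byelorussian-Ukrainian i
    "j": "\u0458",  # Cyrillic small je
}

def mask(word):
    """Swap each letter for its look-alike where one is listed; keep the rest."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word)

masked = mask("Tibet")
print(masked)                       # looks like the original on screen
print(masked == "Tibet")            # False: different code points
print(len(masked) == len("Tibet"))  # True: one look-alike per letter
```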

8 comments

  1. January 25, 2010, 11:39 pm

    I’ve heard this called Martian on a lot of blogs. In that case it tends to be an alternateen sort of thing. The other common method is to use characters that are homophonous but look quite different.

    I actually did this on my personal blog for my name since I didn’t really want to be so easily connected to the name in search engines.

  2. February 8, 2010, 7:26 pm

    The result is visually clear but hard for an unsophisticated character/phrase-blocking program to understand.

    I’m not sure if I buy this part… Isn’t it just another string of characters (the same as any new word added)? Not only that, but I think that with the character composition databases out there, the 拆字后 characters could even be automatically generated from a list of sensitive words.

    You can’t deny the creativity, though!

  3. February 9, 2010, 8:51 am

    Sure, just another string of characters, but… well, I guess it depends on your definition of “sophisticated”

  4. Alex
    February 13, 2010, 3:12 pm

    Can you PLEASE change your website’s font? I’ve increased the zoom on my browser but reading italicized Chinese characters is no fun any way you cut it. The site looks great aside from that. I’m going to enjoy reading it.

  5. February 13, 2010, 6:49 pm

    Alex:
    Italics have gone. I’ll up the base font size a point or two as well.

  6. Carmen
    February 14, 2010, 2:04 am

    Schizophrenic? or Schizophonic?

  7. February 14, 2010, 7:19 am

    @carmen: COL (chortling out loud). I’m going to borrow schizophonic some day

 





You are currently reading Scripts and banned words at Sinoglot.

Meta
Author: Syz (Steve Hansen)
Comments: 8 Comments
Categories: Mis-parsing Mandarin, Writing

