Conundrummery July 28 2009 3 comments
it’s occurred to me that the reason the things i want to be available aren’t available is the very set of problems that i’m now up against. to call the scale of this undertaking grand would be a rather severe understatement.
just how am i supposed to organise this? i have four characters. one is simplified and one is the standard traditional character. then i have two more that are z-variants of the traditional form. z-variants, if you don’t know, are different forms of the same character aside from the simplified/traditional forms. a simple example of this might be 户 hù, which can be 戶 or 戸 in traditional form.
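to make the problem concrete, here is a rough sketch in python (standard library only) of why a computer sees these forms as unrelated. it only uses the three forms of hù mentioned above; the helper name `describe` is made up for illustration.

```python
import unicodedata

# the three written forms of hù from the example above
forms = ["户", "戶", "戸"]

def describe(ch):
    """return the character followed by its unicode code point."""
    return f"{ch} U+{ord(ch):04X}"

for ch in forms:
    print(describe(ch))

# each form has its own code point, so a plain string comparison
# treats them as three different values
distinct = len({ord(ch) for ch in forms}) == len(forms)

# unicode normalisation (NFKC here) leaves each form unchanged,
# so it cannot be relied on to merge variants for us
unchanged = all(unicodedata.normalize("NFKC", ch) == ch for ch in forms)

print(distinct, unchanged)
```

so any link between the forms has to be recorded by hand somewhere; the text itself won't supply it.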
i’d include them all in one entry in the database but then i worry about creating duplicates of which i may be unaware. if i keep them separate then they may end up seeming totally unrelated. if i can find an easy way to link them while keeping them as separate entries i’ll go with that. otherwise i’ll put them all in one entry so long as i can easily search across all variations.
but then i have to ask: are 伍 and 五 the same name? what of two others that look rather different but have the same meaning and pronunciation and are not a s/t issue? i don’t have an example handy but i’ve seen it more than once.
Posted on Tuesday, July 28th, 2009 at 05:32.
3 Responses to “Conundrummery”
design & content ©2009-2006
Kellen Parker unless stated otherwise
July 28th, 2009 at 14:22
It seems to me that the technical solution to the first issue is pretty simple. Assuming you're using a relational database, I'd create one table to store the last name, and pick a variant that will uniformly represent last names — perhaps the standard simplified variant — for interface purposes. Then create another table for variants (should also contain a "type" field to identify what sort of variant it is) that is linked to the original last name table. When you're entering new names, the interface should check to see if that character already exists in the variant table.
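As a minimal sketch of that layout, assuming SQLite through Python's built-in `sqlite3` module: the table names, column names, and the helpers `open_db`, `find_surname` and `add_surname` are all hypothetical, and which form counts as which variant type in the usage lines is only a guess for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE surname (
    id        INTEGER PRIMARY KEY,
    canonical TEXT NOT NULL UNIQUE   -- the form chosen to represent the name
);
CREATE TABLE variant (
    form         TEXT PRIMARY KEY,   -- one row per written form, so no duplicates
    surname_id   INTEGER NOT NULL REFERENCES surname(id),
    variant_type TEXT NOT NULL       -- e.g. 'simplified', 'traditional', 'z-variant'
);
"""

def open_db(path=":memory:"):
    """Open a database and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def find_surname(conn, form):
    """Return the id of the surname a written form belongs to, or None."""
    row = conn.execute(
        "SELECT surname_id FROM variant WHERE form = ?", (form,)
    ).fetchone()
    return row[0] if row else None

def add_surname(conn, canonical, canonical_type, variants=()):
    """Add a name and its variants, refusing any form that is already stored.

    `variants` is an iterable of (form, variant_type) pairs. The canonical
    form is stored in the variant table too, so every lookup goes one way.
    """
    forms = [(canonical, canonical_type), *variants]
    for form, _ in forms:
        if find_surname(conn, form) is not None:
            raise ValueError(f"{form} is already in the variant table")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO surname (canonical) VALUES (?)", (canonical,)
        )
        conn.executemany(
            "INSERT INTO variant (form, surname_id, variant_type) VALUES (?, ?, ?)",
            [(form, cur.lastrowid, kind) for form, kind in forms],
        )
    return cur.lastrowid

# Usage: the variant types assigned below are assumptions, not settled facts.
conn = open_db()
hu = add_surname(conn, "户", "simplified",
                 [("戶", "traditional"), ("戸", "z-variant")])
print(find_surname(conn, "戸") == hu)
```

Because every written form lives in one table, searching on any variant leads back to the same entry, and the check before inserting is what catches a duplicate you didn't know about.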
To the second point, I would tend to argue, though, that 户, 戶, and 戸 are identical, whereas 伍 and 五 aren't. To me, the litmus test is whether or not a person would correct you if you wrote their name one way or another. I suspect a 户 would be fine with 戶, but a 伍 would object to 五. How to actually make that determination, though, is pretty tricky.
July 28th, 2009 at 21:38
a good point. i think i'm better off having the most modern traditional form be the standard and count simplified as a variant since some of these are obscure enough to not have a simplified form that's supported in unicode. though not a family name per se, 綪 is a good example. the simplified form is easy enough to see but it's just not covered on the computer.
as for 伍, that's reasonable enough. it seems i'm stuck making a large number of judgement calls. should 乕 be considered separate from 虎, for example, or should it be considered a variant? of course if you wrote 乕 today, just about anyone who saw it would object. that makes me think they should be separate, though the fact that the objection may be based solely on the obscurity of 乕 makes me think otherwise.
as you said, tricky. though ultimately as the whole thing is searchable, whether 乕 is 虎 or not doesn't really matter. i may be better off just going with modern usage to determine what's what. so if i can find someone currently using 伍 and someone else using 五 (just as an example) for whom the difference isn't just a PRC/ROC issue, then they get to be separate. otherwise i'll have to call them variants.
August 27th, 2009 at 00:38
Do you know about Han unification?
http://en.wikipedia.org/wiki/Han_unification