On the beauty of EXMARaLDA for “computer assisted transcription and annotation of spoken language”
[Warning: path ahead slippery with geek guano]
Let’s say you were thinking about putting your piddly little Mandarin transcripts and sound files into some more accessible format. Gosh, wouldn’t it be cool if you could milk the data a bit, since you pay good money for your staff to do such painstaking transcription (even if they do botch the job half the time)?
What if you could search all of those past transcriptions…
- for a word/phrase (Pinyin, hanzi, English) and then play back the associated sound file, right at the spot where the word/phrase occurs!
- by speaker type (e.g. male/female, Beijinger/not, age young/middle/old, Zhonglish/native speaker)
- by specific phonetic / grammatical feature, say, erhuayin or consonant elision, any of those BJSLBC™ features we know and love
Wow, that’d be sweet. But where to start? The BJS technical team is of no use, hardly able to complete a simple port from one blogging platform to another. Maybe you understand that all that transcription and linguistic stuff would need to go into a data structure of some sort. You’ve heard of XML, so you learn a bit about that, enough to realize it would do the trick, in theory. But naturally you’d need to mark everything you want to search for, the phrases, the elisions, the erhuayin and so on, so you need a tool for doing that. Feeling bold (foolhardy, really), you cobble together an XML data structure and the world’s nastiest Excel spreadsheet, and by pokes and jabs you’re able to produce some mildly useful XML, in small quantities.
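To make the idea concrete, here is a minimal sketch of the sort of hand-rolled XML described above, plus a Python search over it that covers the three wish-list items (by word, by speaker type, by feature). Every element and attribute name (`corpus`, `speaker`, `utt`, `feature`, `beijinger`, and so on) and the function `search` are invented for illustration; this is neither the actual BJS spreadsheet output nor EXMARaLDA's format. The `start`/`end` times are just numbers sitting in the file: they tell you where to seek, but nothing here actually plays audio.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-rolled schema (NOT EXMARaLDA, NOT the real BJS data):
# speakers carry the attributes we want to filter on; each utterance has
# its text in three forms, its time span in the audio, and feature marks.
SAMPLE = """
<corpus>
  <speaker id="S1" sex="f" beijinger="yes" age="middle"/>
  <speaker id="S2" sex="m" beijinger="no" age="young"/>
  <utt speaker="S1" start="3.2" end="5.9">
    <hanzi>你去哪儿</hanzi>
    <pinyin>ni qu nar</pinyin>
    <english>where are you going</english>
    <feature type="erhua"/>
  </utt>
  <utt speaker="S2" start="6.1" end="8.4">
    <hanzi>我去买菜</hanzi>
    <pinyin>wo qu mai cai</pinyin>
    <english>I'm going to buy groceries</english>
  </utt>
</corpus>
"""


def search(root, phrase=None, feature=None, **speaker_attrs):
    """Return (start_seconds, hanzi) for every utterance passing all filters."""
    speakers = {s.get("id"): s for s in root.iter("speaker")}
    hits = []
    for utt in root.iter("utt"):
        spk = speakers[utt.get("speaker")]
        # Speaker filter: each requested attribute must match exactly.
        if any(spk.get(k) != v for k, v in speaker_attrs.items()):
            continue
        # Feature filter: the utterance must carry a mark of that type.
        if feature and utt.find(f"feature[@type='{feature}']") is None:
            continue
        # Phrase filter: substring match on hanzi, pinyin, or English.
        if phrase:
            texts = [utt.findtext(tag, "") for tag in ("hanzi", "pinyin", "english")]
            if not any(phrase in t for t in texts):
                continue
        hits.append((float(utt.get("start")), utt.findtext("hanzi")))
    return hits


root = ET.fromstring(SAMPLE.strip())
print(search(root, phrase="qu"))               # by word (pinyin here)
print(search(root, feature="erhua", sex="f"))  # by feature and speaker type
print(search(root, beijinger="no"))            # by speaker type alone
```

Each hit comes back with the second at which its utterance starts, which is the number a player would need in order to jump "right to the spot". Getting an actual player to do that jumping is a separate problem.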
But it still doesn’t do anything with audio files, and these are the Beijing Sounds studios, for criminy’s sake.
You’re about to give up when, suddenly, like the fairy godmother for Cinderella, like magic beans for Jack, like Badger and his cudgel for Toad, EXMARaLDA comes to you as in a vision: “She walks. She talks. She sings lullabies!”
Undaunted by the colossal failures of the past, the Studios’ technical director has just begun exploring the capabilities of the Partitur Editor (that’s the main transcription tool in the EXMARaLDA family), and it is nothing short of astounding. Essentially, both the data structure it builds and the tool it provides for building it are vastly superior to anything syz had even dreamed of until now.
So at the risk of introducing the tool too early (when I can still barely use it myself and haven’t yet learned to make it do all those things in the bullet points above), let me offer just one example of how I put it to work for the silkworms post.
- Here’s the transcript of the Silkworms post with the audio linked to every utterance, so you can play back any old place and not just the whole audio file. [Caution: this probably doesn't work in all browsers, and in Firefox I generally have to reload the page once to get the player, at the top of the page, to display correctly. The page is also pretty slow to load, so just be patient. If you click on an asterisk at the left side of the table, it will play back that particular phrase, but the response is kind of slow. Once you get it working, though, you'll wonder, "Why didn't he put it into the post itself? Having phrase-by-phrase playback is ever so much more useful than playing the whole audio file!" I couldn't agree more, but so far I haven't figured out how to do that.]
- Here’s the Partitur Editor file (.exb) for that same page, which you can download and open in the free Partitur Editor (download is here) to see a messy example of how the transcript is created and how the XML is structured.
- Here’s the MP3 file that goes with the .exb file above.
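EXMARaLDA's approach, as I understand its basic-transcription (.exb) layout, is different from the hand-rolled kind: a shared timeline of points (`tli` elements, with times in seconds) plus per-speaker tiers whose `event`s point at those timeline ids. That indirection is what makes per-utterance playback possible. The sketch below parses a tiny hand-written file in that shape (it is not the Silkworms file) and pulls out each utterance with its start and end time. The element names follow my reading of the format, so check them against a real .exb before relying on this; the function name `utterances` and the sample content are my own.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written file in the shape I understand EXMARaLDA's
# basic-transcription (.exb) format to have; it is NOT the Silkworms file.
EXB = """<?xml version="1.0" encoding="UTF-8"?>
<basic-transcription>
  <head>
    <meta-information>
      <referenced-file url="silkworms.mp3"/>
    </meta-information>
    <speakertable>
      <speaker id="SPK0"><abbreviation>A</abbreviation><sex value="f"/></speaker>
      <speaker id="SPK1"><abbreviation>B</abbreviation><sex value="m"/></speaker>
    </speakertable>
  </head>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="2.75"/>
      <tli id="T2" time="5.1"/>
    </common-timeline>
    <tier id="TIE0" speaker="SPK0" category="v" type="t" display-name="A [v]">
      <event start="T0" end="T1">你好</event>
    </tier>
    <tier id="TIE1" speaker="SPK1" category="v" type="t" display-name="B [v]">
      <event start="T1" end="T2">你好你好</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def utterances(root):
    """Yield (speaker, start, end, text) for each event on a transcription tier."""
    # Timeline points map ids to seconds; unaligned points have no 'time'.
    times = {}
    for tli in root.iter("tli"):
        t = tli.get("time")
        times[tli.get("id")] = float(t) if t is not None else None
    speakers = {s.get("id"): s.findtext("abbreviation") for s in root.iter("speaker")}
    for tier in root.iter("tier"):
        if tier.get("type") != "t":  # assumed: "a"/"d" mark annotation/description tiers
            continue
        who = speakers.get(tier.get("speaker"))
        for ev in tier.iter("event"):
            yield who, times[ev.get("start")], times[ev.get("end")], ev.text or ""


root = ET.fromstring(EXB.encode("utf-8"))
audio = root.find("head/meta-information/referenced-file").get("url")
segments = list(utterances(root))
for who, start, end, text in segments:
    print(audio, start, end, who, text)
```

Timeline points without a `time` attribute come back as `None`, so a fuller script would have to interpolate or skip them before handing seek positions to a player.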
Special thanks to the tool’s author, Thomas Schmidt, for lots of offline help in getting things this far. If anyone is interested in more info or can help me work out the bugs, we can discuss here or offline (bjshengr <at> gmail etc.)

Comments 8
I’m impressed. Downloading it for myself now.
I hear a beep at around 50 second. That wouldn’t happen to be your phone that you’re using to record, would it? Sounds exactly like mine.
Posted 09 Jun 2009 at 3:55 pm
er, seconds.
Posted 09 Jun 2009 at 3:55 pm
It was my phone, but not what I was using to record. Yeah, there’s some sort of universality about electronic sounds these days.
Posted 09 Jun 2009 at 5:28 pm
Sweeeet! It worked for me in Firefox right off the bat.
Back when I first started learning Chinese I did a few transcriptions like this. I found this on the Wayback Machine. Unfortunately, the Wayback Machine didn’t archive the .wav files (or even my little image icons), and my current website is a shambles.
I remember that it was a lot of work! I think I did about three or four of these altogether. I also tried to start developing a kind of XML format for them, but never got very far. It’s really cool to see this so polished.
Posted 10 Jun 2009 at 10:17 am
Oh, and I meant to add, it seems to me it could be a really useful tool not only for the really high-end geeky language analysis you do (which I love, BTW) but also for producing study materials at all levels.
Posted 10 Jun 2009 at 10:19 am
Oh, oh, oh … and what about song lyrics?
Posted 10 Jun 2009 at 12:02 pm
No problems whatsoever in Firefox. A very awesome tool. It might even convince me to start using sound files more.
Posted 11 Jun 2009 at 6:49 am
Klortho, yeah the possibilities for even the basic use of synchronized audio with text are enormous. You would think (or at least I thought before I started on this mission) that there would be at least 17 free tools on the Internet for synchronizing text and audio. After all, the idea is pretty basic. But the only ones I was able to uncover were expensive, complicated tools whose documentation did not make it entirely clear, at least to me, that I would be able to do what I wanted to do.
And of course the additional sexiness of EXMARaLDA is that the synchronized text w/ audio is just a fringe benefit of the deep XML structure that the tool helps you create.
Posted 12 Jun 2009 at 6:42 am