We now have a script to build a German-to-IPA dictionary for automatic transcription! Find it on our GitHub page.
All you need is a text file with a German poem (here's a sample), another text file with an IPA transcription of that poem (here's a sample), a starter dictionary, and our dictionary scripts (GermanToIPA and DictionaryBuilder). Put them all in the same folder, and then run the DictionaryBuilder script from the command line (or open DictionaryBuilder.py in a developer-friendly editor like TextMate for Mac and press Command-R).
The script will double-check that the German and IPA files have the same number of words. If so, then it will take the IPA for every German word not already in the dictionary and add it as a new entry in the dictionary. On "Nacht und Träume," it runs in less than 1/10 of a second. Woohoo!
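For the curious, here is a minimal sketch of that check-then-add logic. It is not the actual DictionaryBuilder code: the function name `add_entries`, splitting words on whitespace, and lowercasing the German keys are all assumptions made for illustration.

```python
def add_entries(german_text, ipa_text, dictionary):
    """Add IPA for every German word not already in the dictionary.

    Hypothetical helper, not the real DictionaryBuilder script.
    Assumes words are separated by whitespace and that the two
    texts line up word for word.
    """
    german_words = german_text.split()
    ipa_words = ipa_text.split()

    # Double-check that both texts have the same number of words.
    if len(german_words) != len(ipa_words):
        raise ValueError(
            "Word counts differ: %d German vs. %d IPA"
            % (len(german_words), len(ipa_words))
        )

    added = 0
    for german, ipa in zip(german_words, ipa_words):
        key = german.lower()  # assumption: dictionary keys are lowercase
        if key not in dictionary:
            dictionary[key] = ipa
            added += 1
    return added


if __name__ == "__main__":
    starter = {"und": "ʊnt"}
    count = add_entries("Nacht und Träume", "naxt ʊnt ˈtrɔʏmə", starter)
    print(count, "new entries:", starter)
```

Existing entries are left alone, so running it twice on the same poem adds nothing the second time.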
For our project, we'll first build the dictionary from the transcriptions we've already done. Then we'll use that dictionary and the GermanToIPA script to pre-transcribe new poems. Finally, we'll transcribe by hand any words the script couldn't handle and run DictionaryBuilder on the finished poem to add those words to the dictionary.
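The pre-transcription step might look roughly like the sketch below. Again, this is an illustration rather than the real GermanToIPA script: the `pre_transcribe` name, the lowercase lookup, and the square-bracket marker for unknown words are all assumptions.

```python
def pre_transcribe(german_text, dictionary):
    """Replace each known German word with its IPA entry.

    Hypothetical helper, not the real GermanToIPA script.
    Unknown words are kept in [brackets] so they are easy to
    spot and transcribe by hand.
    """
    output = []
    unknown = []
    for word in german_text.split():
        ipa = dictionary.get(word.lower())  # assumption: lowercase keys
        if ipa is None:
            output.append("[" + word + "]")
            unknown.append(word)
        else:
            output.append(ipa)
    return " ".join(output), unknown


if __name__ == "__main__":
    known = {"nacht": "naxt", "und": "ʊnt"}
    text, todo = pre_transcribe("Nacht und Träume", known)
    print(text)
    print("Still to transcribe:", todo)
```

Returning the list of unknown words alongside the text makes it easy to see how much hand transcription is left for each new poem.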
Now every poem we transcribe will, theoretically, make future transcriptions a little bit faster. And once we get a big enough collection of poems done, transcribing new ones should be a breeze.