Hacking Harmony or The Demon Chipmunk Choir


At the inaugural Peabody hackathon, Hacking Harmony, I built my first attempt at choral music synthesis. The concept was simple: autotune Google text-to-speech. Using Python, I built a music parsing engine that reads in a (well-formed) MusicXML file and then performs the following steps:

  1. generate a list of all words sung in the piece
  2. download audio for each word from the Google text-to-speech API
  3. use the Montreal Forced Aligner to detect phoneme boundaries in each word
  4. send a recipe to Matlab containing each word, along with pitches and durations for constructing the song
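
As a rough illustration of step 1 and of the "recipe" in step 4, here is a minimal sketch that pulls lyrics, pitches, and durations out of a MusicXML part using Python's standard library. The element names (`note`, `pitch`, `step`, `alter`, `octave`, `duration`, `lyric`, `syllabic`, `text`) come from MusicXML itself; the function names, the recipe layout, and the tiny sample score are my own assumptions for the example, not the code from the hackathon. It also ignores ties, chords, and melismas.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural notes within an octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(pitch_elem):
    """Convert a MusicXML <pitch> element to a MIDI note number."""
    step = pitch_elem.findtext("step")
    octave = int(pitch_elem.findtext("octave"))
    alter = int(float(pitch_elem.findtext("alter", default="0")))
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter

def parse_part(part_elem):
    """Return [(word, [(midi, duration), ...]), ...] for one part (hypothetical layout)."""
    words = []
    syllables, notes = [], []
    for note in part_elem.iter("note"):
        lyric = note.find("lyric")
        pitch = note.find("pitch")
        if lyric is None or pitch is None:
            continue  # rests and notes without lyrics are skipped in this sketch
        syllables.append(lyric.findtext("text", default=""))
        notes.append((midi_number(pitch), float(note.findtext("duration"))))
        # "single" and "end" close a word; "begin" and "middle" continue it.
        if lyric.findtext("syllabic", default="single") in ("single", "end"):
            words.append(("".join(syllables), notes))
            syllables, notes = [], []
    return words

# A made-up four-note fragment, just enough to exercise the parser.
SAMPLE = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>2</duration>
        <lyric><syllabic>single</syllabic><text>Hark</text></lyric></note>
      <note><pitch><step>F</step><octave>4</octave></pitch><duration>2</duration>
        <lyric><syllabic>single</syllabic><text>the</text></lyric></note>
      <note><pitch><step>F</step><octave>4</octave></pitch><duration>3</duration>
        <lyric><syllabic>begin</syllabic><text>her</text></lyric></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration>
        <lyric><syllabic>end</syllabic><text>ald</text></lyric></note>
    </measure>
  </part>
</score-partwise>"""

root = ET.fromstring(SAMPLE)
recipe = parse_part(root.find("part"))

# Step 1: the unique words that need text-to-speech audio.
vocabulary = sorted({word.lower() for word, _ in recipe})
print(vocabulary)  # ['hark', 'herald', 'the']
```

Grouping syllables back into whole words matters here because the text-to-speech request is made per word, while the pitches and durations are attached per syllable.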

At this point, Matlab takes over and splices all of the words together according to the recipe, autotuning each word along the way. I did this part in Matlab because it had better tools for quickly writing an autotuner, including pitch detection, pitch shifting, and robust general-purpose tools for working with audio data.
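
The arithmetic at the heart of an autotuner is small enough to show. The sketch below is in Python rather than Matlab and is only an illustration under my own assumptions (equal temperament, A4 = 440 Hz, hypothetical function names): given a detected pitch and the target note from the recipe, it computes how far the audio has to be shifted.

```python
import math

def midi_to_hz(midi, a4=440.0):
    """Frequency of a MIDI note in equal temperament (MIDI 69 is A4)."""
    return a4 * 2.0 ** ((midi - 69) / 12.0)

def shift_in_semitones(detected_hz, target_midi):
    """Signed pitch shift that moves detected_hz onto the target note."""
    return 12.0 * math.log2(midi_to_hz(target_midi) / detected_hz)

# A spoken word detected at 220 Hz that should be sung on A4 (MIDI 69)
# needs to go up one octave.
print(round(shift_in_semitones(220.0, 69), 3))  # 12.0
```

Shifts of an octave or more are common when speech is pushed up to soprano range, which goes some way toward explaining the chipmunks.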

The output of this whole process sounds about how you'd expect autotuned text-to-speech to sound—my favorite reaction was that it sounded like a choir of demon chipmunks.

Examples


Hark the Herald Angels Sing - Felix Mendelssohn

Christmas Time is Here - arr. David Samson

Brother John - Folk Song

When David Heard - Eric Whitacre (intro only)

Sanctus (London) - Ola Gjeilo (intro only)

Upon the Hearth - Matthew Samson

As usual, accurate pitch detection was the biggest problem I had to deal with (the runner-up being phoneme segmentation). I'm always surprised by how difficult such a simple-sounding problem turns out to be. In this case, though, I think the out-of-tune-ness is quite in line with the quality of the rest of the result.
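
To give a feel for why pitch detection is harder than it sounds, here is a minimal autocorrelation pitch detector in plain Python. This is a textbook technique that I am using for illustration, not the Matlab routine used in the project, and the function name, search range, and test tone are all assumptions of the sketch.

```python
import math

def detect_pitch(samples, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove any DC offset
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself delayed by `lag`.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# A clean 220 Hz sine sampled at 8 kHz: the easiest possible input.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(1024)]
estimate = detect_pitch(tone, sample_rate)
print(estimate)
```

Even on a clean sine wave the estimate is only approximate, because the lag has to be a whole number of samples. Real speech adds noise, unvoiced consonants, and a pitch that glides within a single word, so errors like picking the wrong octave become easy to make.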
