Assembled Voices: Making ATC Talk
"Pacifica 208 would like ILS Runway 19 approach vectors to final."
What does it take to record, assemble, and play this simple phrase in Microsoft Flight Simulator? Much more than just recording air traffic controllers speaking into a microphone.
"Tot would like tot approach tot."
What? Would you spend hour upon hour in a recording studio talking like this? Ten intrepid Flight Simulator team members said, "Yes." Although they initially knew little about how much they were signing up for, all of them volunteered their time through two versions of Flight Simulator.
Flight Simulator ATC draws on hundreds of phrases that are concatenated at runtime, depending on the choices the user makes from the ATC menu and the controller's responses to events. The air traffic control system in Flight Simulator uses more than 45,000 .wav files in ten voices. Playing a single phrase in the simulator sometimes requires splicing many pieces together. For example, when a user chooses an IFR clearance from the ATC menu, the controller responds with a phrase that is structured like this:
[callsign] is cleared to [airport_destination] airport [route]. Fly runway heading, climb and maintain [altitude_new]. Departure frequency is [frequency_departure], squawk [squawk_new].
At runtime, all of the phrases in brackets (called tokens) are replaced by sub-phrases so that the top-level phrase makes sense. The token replacements in the example are based on the tail number, flight plan, airport information, and a random squawk code. Typically a phrase fills in like this:
Pacifica 483 is cleared to Miami airport as filed. Fly runway heading, climb and maintain 15,000. Departure frequency is 137.275, squawk 5422.
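The token replacement described above can be sketched in a few lines of Python. This is only an illustration on text, not the simulator's actual code: the function name fill_phrase and the values dictionary are hypothetical, and the real system substitutes audio rather than strings.

```python
import re

# The top-level phrase from the article, with tokens in square brackets.
TEMPLATE = (
    "[callsign] is cleared to [airport_destination] airport [route]. "
    "Fly runway heading, climb and maintain [altitude_new]. "
    "Departure frequency is [frequency_departure], squawk [squawk_new]."
)

# A token is a lowercase name with underscores, wrapped in brackets.
TOKEN = re.compile(r"\[([a-z_]+)\]")


def fill_phrase(template, values):
    """Replace each [token] with its value (hypothetical helper).

    Raises KeyError if a token has no value, so gaps are not silently skipped.
    """
    return TOKEN.sub(lambda match: values[match.group(1)], template)


# Illustrative values taken from the filled-in example in the article.
values = {
    "callsign": "Pacifica 483",
    "airport_destination": "Miami",
    "route": "as filed",
    "altitude_new": "15,000",
    "frequency_departure": "137.275",
    "squawk_new": "5422",
}

filled = fill_phrase(TEMPLATE, values)
print(filled)
```

With these values, the printed sentence is the same as the filled-in clearance quoted above.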
Complexity and flexibility are required to fit a wide variety of ATC situations. In many ATC phrases, the tokens can resolve to multiple sub-tokens. For example, the [callsign] token can resolve to [callsign_civ_long], [callsign_civ_short], or [callsign_commercial]. The [callsign_civ_long] token is used for initial calls to ATC by non-commercial aircraft, and [callsign_civ_short] is used in subsequent calls. Each of those tokens resolves to a series of letters and numbers to create the call sign.
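As a rough sketch of that resolution step, the Python below picks one of the three sub-tokens and then breaks a civilian tail number into individual characters. The function names, the is_commercial and first_contact flags, and the rule that the short form keeps only the last three characters are all assumptions made for illustration; the article does not say how the simulator builds the short form.

```python
def choose_callsign_token(is_commercial: bool, first_contact: bool) -> str:
    """Pick the sub-token that [callsign] resolves to (hypothetical helper)."""
    if is_commercial:
        return "callsign_commercial"
    # Non-commercial aircraft: long form on the initial call, short form afterwards.
    return "callsign_civ_long" if first_contact else "callsign_civ_short"


def civilian_callsign_pieces(token: str, tail_number: str) -> list[str]:
    """Resolve a civilian call sign token to a series of letters and numbers."""
    if token not in ("callsign_civ_long", "callsign_civ_short"):
        raise ValueError(f"not a civilian call sign token: {token}")
    chars = [c for c in tail_number.upper() if c.isalnum()]
    if token == "callsign_civ_short":
        # Assumption: the short form keeps only the last three characters.
        chars = chars[-3:]
    return chars


first = choose_callsign_token(is_commercial=False, first_contact=True)
later = choose_callsign_token(is_commercial=False, first_contact=False)
long_form = civilian_callsign_pieces(first, "N123AB")
short_form = civilian_callsign_pieces(later, "N123AB")
```

Each character in the returned list stands for one short recording that would be played in sequence.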
There were four main challenges in producing the audio for ATC: recruiting voice talent, producing natural-sounding ATC dialog, editing the raw audio, and managing the files in post-production.
In addition to the hundreds of phrases, more than 3,600 airport names were recorded. Each of the ten people spent about five hours in the studio just to record the airport names! To get the place name pronunciations as correct as possible, we had native speakers at Microsoft record hundreds of audio files that were played for the voice talent in the studio. (Try pronouncing Gorna Oryahovitsa if you're not from Bulgaria.)
Naturalistic sound
Because audio would later replace the tokens in the phrases, we couldn't have words slurring into one another ("cleared to Albuquerque" is normally vocalized as "cleared to-walbuquerque"). To solve that problem, the script generator automatically replaced the tokens in scripts with the word "tot." The word "tot" has a hard consonant at each end and therefore gives a clean sound between the surrounding words in the sentence. That avoids the problem of liaisons in concatenation.
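A script generator of this kind might look something like the following Python sketch. It is not the team's actual tool: the function name to_recording_script is hypothetical, and the only behavior taken from the article is that every token is swapped for the word "tot."

```python
import re

# A token is a lowercase name with underscores, wrapped in brackets.
TOKEN = re.compile(r"\[[a-z_]+\]")


def to_recording_script(template: str) -> str:
    """Turn a tokenized phrase into the text the voice talent reads aloud."""
    text = TOKEN.sub("tot", template)
    # Capitalize the first character so the line reads as a sentence.
    return text[:1].upper() + text[1:]


script_line = to_recording_script(
    "[callsign] is cleared to [airport_destination] airport [route]."
)
print(script_line)
```

Any extra punctuation in the studio scripts, such as a pause after a leading "Tot," would have to be added by a separate rule that this sketch leaves out.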
Editing the audio
Each recorded phrase that contained tokens had to be edited to remove the word "tot" and replace it with an appropriate marker. A marker in a .wav file identifies the spot where another piece of audio can be spliced into the phrase. The marker must be labeled with the name of the phrase that belongs at that marker. For example, the phrase:
[callsign] is cleared to [airport_destination] airport [route]. Fly runway heading, climb and maintain [altitude_new]. Departure frequency is [frequency_departure], squawk [squawk_new].
is scripted to read:
Tot, is cleared to tot airport tot. Fly runway heading, climb and maintain tot. Departure frequency is tot, squawk tot.
And then the audio is edited to remove "tot" and insert markers:
[callsign] is cleared to [airport_destination] airport [route]. Fly runway heading, climb and maintain [altitude_new]. Departure frequency is [frequency_departure], squawk [squawk_new].
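The splicing that the markers make possible can be sketched in Python as follows. The sketch works on in-memory lists, with words standing in for stretches of audio samples, and it assumes each marker is an (offset, label) pair; how the markers are actually stored inside the .wav files is not described here, and the function name splice_at_markers is hypothetical.

```python
def splice_at_markers(phrase, markers, clips):
    """Insert the clip named by each marker at that marker's offset.

    phrase  -- the edited recording with "tot" removed (a list of samples)
    markers -- (offset, label) pairs; offset is an index into phrase
    clips   -- mapping from label to the samples to splice in
    """
    assembled = []
    cursor = 0
    for offset, label in sorted(markers):
        assembled.extend(phrase[cursor:offset])  # audio up to the marker
        assembled.extend(clips[label])           # spliced-in sub-phrase
        cursor = offset
    assembled.extend(phrase[cursor:])            # audio after the last marker
    return assembled


# Words stand in for audio here so the result is easy to read.
phrase = ["is", "cleared", "to", "airport"]
markers = [(0, "callsign"), (3, "airport_destination"), (4, "route")]
clips = {
    "callsign": ["Pacifica", "483"],
    "airport_destination": ["Miami"],
    "route": ["as", "filed"],
}

assembled = splice_at_markers(phrase, markers, clips)
print(" ".join(assembled))
```

Because the markers carry labels rather than audio, the same edited recording can be reused with any call sign, airport, or route.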
File management
File management was a challenge for a number of reasons. As the production process evolved, phraseology was refined and changed. That meant each minor change cascaded through multiple files. It was crucial that changes be made to all of the appropriate files or else the simulator could not correctly read the phrase. The tools we used to verify consistency are the same tools available with the ATC Voicepack SDK.
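To give a feel for what such a consistency check involves, here is a small Python sketch. It is not the SDK tooling, whose interface is not described here; the function check_phrase and its inputs are hypothetical. It assumes a phrase is consistent when its marker labels match its tokens in order and every label names a phrase that has been recorded.

```python
import re

# A token is a lowercase name with underscores, wrapped in brackets.
TOKEN = re.compile(r"\[([a-z_]+)\]")


def check_phrase(template, marker_labels, recorded_phrases):
    """Return a list of problems; an empty list means the phrase is consistent."""
    problems = []
    tokens = TOKEN.findall(template)
    if tokens != list(marker_labels):
        problems.append(
            f"markers {list(marker_labels)} do not match tokens {tokens}"
        )
    for label in marker_labels:
        if label not in recorded_phrases:
            problems.append(f"no recorded phrase named '{label}'")
    return problems


template = "[callsign] is cleared to [airport_destination] airport [route]."
recorded = {"callsign", "airport_destination", "route"}

# Markers that match the script, and markers left stale after a script change.
ok = check_phrase(template, ["callsign", "airport_destination", "route"], recorded)
stale = check_phrase(template, ["callsign", "airport"], recorded)
```

Running a check like this over every phrase after each phraseology change is one way to catch a file that was missed when a change cascaded.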
This is the kind of challenge that makes Flight Simulator a fun product to work on. We were fortunate to find talent already on the team who could do the job. So next time you use ATC in Flight Simulator, tip your hat to the folks whose voices you hear and repeat after me a thousand times, "Tot, is cleared to tot airport tot…"
link:
http://www.microsoft.com/games/flightsimulator/fs2004_assembledvoices.asp