[Ref: Recording voice prompts with Audacity and SOX, Convert WAV to Raw format ]
Benoit Frigon’s Custom Voice Prompts is a very good guide to creating your own custom messages and voice prompts.
Requirements:
These notes mostly follow Benoit’s notes, and additionally highlight the areas where I did not follow them correctly and therefore got garbled/broken audio.
If you get garbled audio, please check Benoit’s notes or follow the notes again.
[Ref: Audacity]
Audacity is free, open source, cross-platform software for recording and editing sounds. Audacity is available for Windows, Mac, GNU/Linux and other operating systems.
We will be using Audacity for these instructions, and a key configuration item will be to set the audio sampling correctly.
[Ref: Best Practices in Designing Speech User Interfaces]
Benoit’s howto describes recording the messages using Audacity. Unfortunately I didn’t have that luxury (read: no microphone), which led to a few other complications.
If you’re like me, you are going to just ‘dive in’ and start recording voice prompts/messages.
Good approach, as you get an idea of how to use Voice Prompts/messages.
Bad approach if it’s been running for a while and you haven’t reviewed your organisation’s goals and how your new Asterisk IVR can better address them.
Get the first batch of voice prompts/messages however you can, but plan for iterations/updates.
Get your voice messages how you can. When we first did this, a friend recorded the messages on her iPhone (AAC) and emailed them to me.
The better the original quality, the less post-processing/work you have to do to make it sound good.
Record the prompts as a single stream of audio, not several short audio clips. It may seem that recording separate clips makes more sense (because that’s how the output will be), but the post-processing workflow ends up being a lot more convenient if you record everything as a single stream.
Create a new Audacity project:
[Ref: Changing Audio Sample Rate in Audacity]
Set the Project Rate to 16000 Hz.
The below image shows where you can set the Project Rate (Hz). Use your mouse and click on the drop-down box and select 16000.
[Ref: Importing Audio]
Import the audio/music/announcement using the File –> Import menu as in the below diagram.
Set the audio track to:
If the imported audio is stereo, convert the audio to mono with the menu action
Note: The left hand panel describes the Audio sampling in our example before the conversion, it is

After the conversion, the panel will show a different sampling, as in the image below. Most importantly, it needs to show ‘Mono’.
To convert the mono track rate to 16000 Hz, select the audio track and the menu:
If we’ve configured our system correctly, you should have at least the following configuration visible in your editor.
[Ref: Noise Reduction]
If you’re like me and your unprofessional recordings come through with a lot of ‘pops’ and other distracting noises, then you need to clean them up.
Select the Menus
Once you have your Noise Profile, you can select the sections of the audio track you wish to clean up and then select:
[Ref: Splitting a recording for exporting as separate tracks]
Your audio is clean (or you’re going to dive in and try again later) and the next stage is to export the audio so you can convert it to something Asterisk can use.
If you have multiple voice segments, then one method that works better for us is to cut and paste these audio segments into a single larger master audio track and label the track segments for later processing.
The following image indicates how to ’label’ audio track segments/selections.
The new master audio track simplifies a few things about your audio.
There will always be some differences in the way your voice artist has recorded the audio.
Sometimes, it works better to combine separate recordings to get the best complete version.
Use the master track to provide the ’natural’ gap between voice segments.
Take into consideration that you may wish to combine the voice segments in different ways.
You may wish to generate a single file for the prompt.
The following image shows what your edit screen might look like with judicious naming and audio ‘gaps’ between segments.
The label you nominate will be the default filename used when exporting to a file.
It simplifies the workflow if you use a naming convention that works for you when the file is exported.
For example, the above ‘slice/labels’ will generate filenames such as:
There is a ‘gap’ of silence between words when we speak. When we create prompts that combine separately spoken words, we need to consider what a ’natural’ gap of silence is in our prompts.
Of course, you don’t have to worry about this if you record every prompt separately. But, it is an interesting challenge that gives you a lot more flexibility on how to make use of the recordings you already have.
The gap you leave between different recordings depends on what sounds natural for the audio collection. Below are Benoit’s guidelines.
The true gap will have to be validated, and some prompts have different silence requirements.
To validate the audio silence of your choice, you can copy/paste the audio you want to combine into another audio track and mute all other tracks. You can then play and adjust the audio until you’ve found the silence spacing that fits your prompts.
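The same gap experiment can also be tried outside Audacity on segments that have already been exported. The following is only a minimal sketch, assuming the files are headerless raw audio (16-bit signed, mono, 16000 Hz), so that silence is simply a run of zero bytes and files can be joined with cat. The function names make_silence and join_with_gap are hypothetical, and the 250 ms in the usage note is a placeholder, not one of Benoit’s guideline values.

```shell
#!/bin/sh
# Assumption: raw files are 16-bit signed, mono, 16000 Hz.
# 16000 samples/s * 2 bytes/sample = 32 bytes per millisecond.

# make_silence MS OUTFILE: write MS milliseconds of digital silence
# (all-zero samples) to OUTFILE.
make_silence() {
    dd if=/dev/zero of="$2" bs=32 count="$1" 2>/dev/null
}

# join_with_gap MS OUT IN1 IN2: write IN1, then MS ms of silence,
# then IN2 into OUT. Headerless raw data can be joined with cat.
join_with_gap() {
    gap_file=$(mktemp) || return 1
    make_silence "$1" "$gap_file"
    cat "$3" "$gap_file" "$4" > "$2"
    rm -f "$gap_file"
}
```

For example, join_with_gap 250 you-have-three.raw you-have.raw three.raw (hypothetical file names) would produce a combined prompt with a quarter-second pause, which you can then listen to and adjust.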
[Ref: Export Multiple]
Once we’ve completed our edit process, we need to export the slices as separate audio files because that’s how we want to deal with it in Asterisk.
But before you make the conversion, save yourself anguish by making sure you have the track and project audio sampling configured correctly (as per the below pictures)
The conversion process is based on having the above/below sampling rates for both the track and project at 16000Hz.
We save the audio slices/segments to file by performing an Export. Start the Export by choosing the menu:
File –> Export multiple …
Because we wish to export by the ’labels’ we’ve created above, ensure that the labels option in the dialog is selected (its button is filled in black).
From the [Export Multiple] dialog select:
Inside the “Specify Uncompressed Options”, select:
Click: [OK] to return to the Export Multiple Dialog.
Click: [Export] in the “Export Multiple Dialog” to export/generate audio files from our labelled selections.
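A quick way to catch a wrong sampling setting before converting is to compare each exported file’s size with how long the prompt should be. This is a rough sketch only, assuming the export produced headerless raw files (16-bit signed, mono, 16000 Hz, which is what the conversion script in these notes expects as input). The helper name raw_duration_ms is hypothetical.

```shell
#!/bin/sh
# Assumption: 16000 samples/s * 2 bytes/sample * 1 channel = 32000 bytes/s.

# raw_duration_ms FILE: print the estimated duration of FILE in milliseconds.
raw_duration_ms() {
    bytes=$(( $(wc -c < "$1") ))
    echo $(( bytes * 1000 / 32000 ))
}

# Report every raw file in the current directory.
for f in *.raw; do
    [ -e "$f" ] || continue    # nothing exported here yet
    echo "$f: $(raw_duration_ms "$f") ms"
done
```

If a two-second prompt reports roughly four or six seconds, the track was most likely exported as stereo or at a higher rate than 16000 Hz.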
[Ref: sox]
SoX reads and writes audio files in most popular formats and can optionally apply effects to them. It can combine multiple input sources, synthesise audio, and, on many systems, act as a general purpose audio player or a multi-track audio recorder. It also has limited ability to split the input into multiple output files. All SoX functionality is available using just the sox command. To simplify playing and recording audio, if SoX is invoked as play, the output file is automatically set to be the default sound device, and if invoked as rec, the default sound device is used as an input source. Additionally, the soxi(1) command provides a convenient way to just query audio file header information.
Sox is available as a port package in OpenBSD, and the simplest thing is to install it.
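Before running any conversions it is worth confirming that sox is actually on the PATH. A small sketch follows; the helper name have_cmd is hypothetical, and the install hint assumes OpenBSD’s pkg_add with the package named sox.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd sox; then
    # Print the installed version so it can be noted alongside the results.
    sox --version
else
    echo "sox not found; on OpenBSD install it as root with: pkg_add sox" >&2
fi
```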
The below script is taken directly from: Benoit Frigon’s post
Please refer to the above for more details.
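#!/bin/sh
# Convert raw exports (16 kHz, 16-bit signed, mono) into formats Asterisk can play.
mkdir -p alaw
mkdir -p ulaw
mkdir -p gsm
mkdir -p wav
mkdir -p sln16
for file in *.raw; do
    # If there are no .raw files the pattern stays literal, so skip it.
    [ -e "$file" ] || continue
    file_out=$(basename "${file%.*}")
    echo "converting $file_out..."
    # sln16 is the same headerless 16 kHz signed linear audio, so a copy is enough.
    cp "$file" "sln16/$file_out.sln16"
    # The remaining formats are resampled to 8 kHz.
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e a-law -c 1 "alaw/$file_out.alaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e u-law -c 1 "ulaw/$file_out.ulaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t gsm -r 8k -c 1 "gsm/$file_out.gsm"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t wav -r 8k -c 1 "wav/$file_out.wav"
done
echo '---------------------------------------------------'
echo 'Done!'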
Asterisk 11.18.0
Asterisk*CLI> help file convert
Usage: file convert <file_in> <file_out>
    Convert from file_in to file_out. If an absolute path is not given, the default Asterisk sounds directory will be used.
    Example:
        file convert tt-weasels.gsm tt-weasels.ulaw
Make sure the audio results (from the above conversion) are what you need/expect.
The last thing you want is for your clients/users to listen to garbled messages.
Play the converted files back on your workstation.
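For workstation playback, SoX’s play command can read the converted files, but the headerless ones need their format spelled out. The sketch below is an assumption-laden helper, not part of Benoit’s script: play_args and preview are hypothetical names, the extensions match the directories the conversion script creates, and wav and gsm files are assumed to be recognised by SoX without extra options.

```shell
#!/bin/sh
# play_args FILE: print the SoX format options needed to read FILE,
# chosen by its extension. Prints nothing for self-describing formats.
play_args() {
    case "$1" in
        *.sln16) echo "-t raw -r 16k -e signed-integer -b 16 -c 1" ;;
        *.alaw)  echo "-t raw -r 8k -e a-law -b 8 -c 1" ;;
        *.ulaw)  echo "-t raw -r 8k -e u-law -b 8 -c 1" ;;
        *)       echo "" ;;
    esac
}

# preview FILE: play FILE on the default sound device (needs SoX installed).
preview() {
    # The substitution is left unquoted on purpose so it splits into options.
    play $(play_args "$1") "$1"
}
```

A call such as preview alaw/main-menu.alaw (hypothetical file name) should then sound the same as the original, only at telephone quality.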
Play the converted files in a test extension.