Voice Prompts

Voice Prompts - Creating your own

[Ref: Recording voice prompts with Audacity and SOX, Convert WAV to Raw format]


Benoit Frigon’s Custom Voice Prompts is a very good guide to creating your own custom messages and voice prompts.


These notes mostly follow Benoit’s, with extra emphasis on the areas I did not follow correctly the first time, which left me with garbled/broken audio.

If you get garbled audio, check Benoit’s notes, or work through these notes again carefully.


[Ref: Audacity]

Audacity is free, open source, cross-platform software for recording and editing sounds. Audacity is available for Windows, Mac, GNU/Linux and other operating systems.

These instructions use Audacity, and a key configuration item is setting the audio sample rate correctly for both the track and the project:

  • Track Rate: 16000 Hz
  • Project Rate: 16000 Hz

Desired Sampling
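If you already have a WAV recording, it can help to know what you are starting from before importing it. SoX, described further down, ships a soxi command whose -r, -c and -b options print the sample rate, channel count and bits per sample. The sketch below assumes SoX is installed; check_format is a hypothetical helper name, not part of Benoit’s notes.

```shell
#!/bin/sh
# check_format FILE: print a recording's sample rate, channels and bit depth
# (read with soxi), and return non-zero unless it is already 16000 Hz mono.
check_format() {
    rate=$(soxi -r "$1") || return 2
    chans=$(soxi -c "$1")
    bits=$(soxi -b "$1")
    echo "$1: ${rate} Hz, ${chans} channel(s), ${bits}-bit"
    [ "$rate" -eq 16000 ] && [ "$chans" -eq 1 ]
}
```

For example, `check_format announcement.wav || echo "needs converting"` flags a 44100 Hz stereo recording that still has to go through the mono and resample steps.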


[Ref: Best Practices in Designing Speech User Interfaces]

Benoit’s how-to describes recording the messages with Audacity; unfortunately I didn’t have that luxury (read: no microphone), which led to a few other complications.

If you’re like me, you are going to just ‘dive in’ and start recording voice prompts/messages.

That is a good approach, as you get a feel for how voice prompts/messages work.

It is a bad approach if those prompts end up running for a long time and you never review your organisation’s goals and how your new Asterisk IVR could help address them better.

Get the first batch of voice prompts/messages however you can, but plan for iterations/updates.

Getting the content

Get your voice messages however you can. When we first did this, a friend recorded the messages on her iPhone (AAC) and emailed them to me.

The better the original quality, the less post-processing/work you have to do to make it sound good.

Lesson Learned

  • Record as a single stream of audio

Record the prompts as a single stream of audio, not several short audio clips. Recording separate clips may seem to make more sense (because that is how the output will end up), but the post-processing workflow is a lot more convenient if you record everything as a single stream.


Create a new Audacity project:

  • Set the Project Audio Rate
  • Import your audio
  • Set the Track Audio Rate
  • Removing Noise
  • Slicing | Labelling
  • Generate Audio Files

Project Audio Rate

[Ref: Changing Audio Sample Rate in Audacity]

Set the Project Rate to 16000 Hz.

The image below shows where to set the Project Rate (Hz): click the drop-down box and select 16000.

Set Project Audio Rate

Import your audio

[Ref: Importing Audio]

Import the audio/music/announcement using the File –> Import menu, as in the diagram below.

Import Audio

Track Audio Rate

Set the audio track to:

  • Mono
  • 16000 Hz
Stereo Source

If the imported audio is stereo, convert it to mono with the menu action:

  • Tracks –> Stereo Track to Mono

Note: the left-hand panel shows the audio format of our example before the conversion:

  • Stereo;
  • 44100 Hz;
  • 32-bit float

![Set the Track, Audio Rate](/mmedia/bsd/voip/audacity-tracks-stereo%20track%20to%20mono.png)

After the conversion (see the image below), the panel shows a different format; most importantly, it must now say ‘Mono’.

Mono Source

To resample the mono track to 16000 Hz, select the audio track and use the menu:

  • Tracks | Resample …
  • Choose 16000 Hz

Set the Track, Audio Rate

If everything is configured correctly, you should see at least the following settings in your editor.

Desired Sampling
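If you would rather prepare a recording outside Audacity, SoX (introduced further down) can do the stereo-to-mono and resample steps in a single command. This is a sketch, not part of Benoit’s Audacity workflow: it assumes SoX is installed, to_sln16 is a hypothetical helper name, and the output is the same headerless, signed 16-bit, 16 kHz mono format that the Audacity export below produces.

```shell
#!/bin/sh
# to_sln16 IN OUT: convert any file SoX can read (e.g. a stereo 44.1 kHz WAV)
# to headerless 16 kHz, signed 16-bit, mono PCM.
# SoX mixes the channels down and resamples automatically because the
# requested output (-c 1, -r 16000) differs from the input.
to_sln16() {
    sox "$1" -t raw -r 16000 -e signed-integer -b 16 -c 1 "$2"
}
```

For example, `to_sln16 announcement.wav announcement.raw`. You still need Audacity (or similar) for the listening-based steps such as noise clean-up and slicing.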

Noise, Unwanted audio

[Ref: Noise Reduction]

If, like mine, your amateur recordings come through with a lot of ‘pops’ and other distracting noises, you need to clean them up.

  • Highlight a ‘noisy’ part of the audio track (background noise only, no speech) that we can use as a “Noise Profile”

Then select:

  • Effect | Noise Reduction
  • Get Noise Profile

Noise Reduction

Once you have your Noise Profile, you can select the sections of the audio track you wish to clean up and then select:

  • Effect | Noise Reduction
  • Noise: Reduce
  • OK
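SoX has an equivalent two-pass noise reduction: the noiseprof effect builds a profile from a noise-only stretch of audio, and noisered applies it to the whole recording. The sketch assumes SoX is installed and that, by default, the first half second of the recording is background noise only (pass your own offsets to match the ‘noisy’ section you would have highlighted in Audacity). clean_noise is a hypothetical helper name, and 0.3 is just a starting amount to tune by ear: the amount runs from 0 to 1, and higher values remove more noise but are more likely to damage the voice.

```shell
#!/bin/sh
# clean_noise IN OUT [NOISE_START] [NOISE_LEN] [AMOUNT]
#   NOISE_START, NOISE_LEN: seconds locating a noise-only section (default 0, 0.5)
#   AMOUNT: how much noise to remove, 0..1 (default 0.3 here; tune by ear)
clean_noise() {
    src=$1 dst=$2
    start=${3:-0} len=${4:-0.5} amount=${5:-0.3}
    profile=$(mktemp) || return 1
    # Pass 1: analyse the noise-only section; -n discards the audio output
    sox "$src" -n trim "$start" "$len" noiseprof "$profile" || { rm -f "$profile"; return 1; }
    # Pass 2: filter the whole recording using that profile
    sox "$src" "$dst" noisered "$profile" "$amount"
    status=$?
    rm -f "$profile"
    return $status
}
```

For example, `clean_noise raw-take.wav clean-take.wav 12.0 0.8` profiles 0.8 s of noise starting 12 s into the take.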

Slicing | Labelling

[Ref: Splitting a recording for exporting as separate tracks]

Your audio is clean (or you’re going to dive in and try again later), so the next stage is to export it and convert it to something Asterisk can use.

If you have multiple voice segments, one method that works well for us is to cut and paste them into a single, larger master audio track and label the segments for later processing.

The following image shows how to ‘label’ audio track segments/selections.

Slicing and Labelling
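The labels themselves can also be saved as a text file with Audacity’s Export Labels command (its menu location varies between versions), one `start<TAB>end<TAB>name` line per label, times in seconds. As an alternative to Audacity’s own export, a shell loop can then cut the master recording with SoX’s trim effect, which makes it easy to regenerate every slice after re-editing. This is a sketch under assumptions: SoX 14.4 or later (for the `=END` absolute-position form of trim), label names that are safe as filenames, and slice_by_labels is a hypothetical name.

```shell
#!/bin/sh
# slice_by_labels SOURCE LABELS OUTDIR
#   SOURCE: the master recording (any format SoX reads)
#   LABELS: Audacity label file, one "start<TAB>end<TAB>name" line per slice
# Writes OUTDIR/<name>.raw as headerless 16 kHz, signed 16-bit, mono PCM.
slice_by_labels() {
    src=$1 labels=$2 outdir=$3
    tab=$(printf '\t')
    mkdir -p "$outdir" || return 1
    while IFS=$tab read -r start end name; do
        [ -n "$name" ] || continue        # skip blank or unnamed lines
        echo "slicing $name ($start -> $end s)"
        # trim START =END: the "=" makes END an absolute position, not a length
        sox "$src" -t raw -r 16000 -e signed-integer -b 16 -c 1 \
            "$outdir/$name.raw" trim "$start" "=$end" || return 1
    done < "$labels"
}
```

For example, `slice_by_labels master.wav labels.txt slices` writes one .raw file per label into the slices directory.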

The new master audio track simplifies a few things about your audio.

Combine the best recordings.

There will always be some differences in the way your voice artist has recorded the audio.

Sometimes, it works better to combine separate recordings to get the best complete version.

Silence before and after the audio.

Use the master track to provide the ‘natural’ gap between voice segments.

Take into consideration that you may wish to combine the voice segments in different ways.

One file for the prompt

You may wish to generate a single file for the prompt.

The following image shows what your edit screen might look like with judicious naming and audio ‘gaps’ between segments.

Slicing and Labelling

Naming Convention

  • Label each segment with the filename you want (e.g. ivr-select-9-to-return-to-main-menu)

The label you nominate will be the default filename used when exporting to a file.

It simplifies the workflow if you use a naming convention that works for you when the file is exported.

For example, the above ‘slice/labels’ will generate filenames such as:

  • thank-you-for-calling.raw
  • company-name.raw
  • one-of-our-friendly-staff-will-be-with-you-in-just-a-moment.raw
  • company 2.raw
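As the last filename shows, a label containing a space ends up in the filename, and spaces are awkward in shell loops and dialplan arguments. A small clean-up pass after exporting can normalise the names. This is a sketch: normalise_names is a hypothetical helper, and it assumes hyphenated, lower-case names are the convention you want.

```shell
#!/bin/sh
# normalise_names DIR: rename the exported *.raw files in DIR so that spaces
# become hyphens and upper case becomes lower case,
# e.g. "company 2.raw" -> "company-2.raw".
normalise_names() {
    for f in "$1"/*.raw; do
        [ -e "$f" ] || continue               # glob matched nothing
        base=$(basename "$f")
        new=$(printf '%s\n' "$base" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
        [ "$base" = "$new" ] && continue      # name is already fine
        if [ -e "$1/$new" ]; then
            echo "skipping '$base': '$new' already exists" >&2
        else
            mv "$f" "$1/$new"
        fi
    done
}
```

Existing files are never overwritten; a clash is reported so you can rename that label in Audacity instead.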

Audio (silence) Gap

There is a ‘gap’ of silence between words when we speak. When we create prompts that combine separately spoken words, we need to consider what makes a ‘natural’ gap of silence in those prompts.

Of course, you don’t have to worry about this if you record every prompt separately. But, it is an interesting challenge that gives you a lot more flexibility on how to make use of the recordings you already have.

The gap you leave between recordings depends on what sounds natural for your audio collection; Benoit’s guidelines are:

  • 60 ms either side of prompt
  • 100 ms at end of a sentence
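Because the exported audio is headerless 16 kHz, 16-bit mono PCM, these gaps translate directly into samples and bytes, which is useful if you ever pad or trim files outside Audacity. With sample rate $f_s$, bits per sample $b$ and channel count $c$:

$$
\begin{aligned}
\text{bytes per second} &= f_s \times \frac{b}{8} \times c = 16000 \times 2 \times 1 = 32000 \\
60\ \text{ms} &= 0.060 \times 16000 = 960\ \text{samples} = 1920\ \text{bytes} \\
100\ \text{ms} &= 0.100 \times 16000 = 1600\ \text{samples} = 3200\ \text{bytes}
\end{aligned}
$$

In other words, each millisecond of this format is 32 bytes.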

The true gap will have to be validated, and some prompts have different silence requirements.

To validate your chosen silence, copy/paste the audio you want to combine into another audio track and mute all the other tracks. You can then play and adjust the audio until you find the silence spacing that fits your prompts.

  • Create a new audio track
  • Copy/Paste your voice prompt into this track
  • Mute other tracks
  • Play/Edit the prompt as necessary
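Once the slices are exported, candidate combinations can also be auditioned without going back into Audacity: headerless PCM files in the same format can simply be concatenated, with runs of zero bytes as silence (32 bytes per millisecond at 16 kHz, 16-bit mono). A sketch, assuming all parts are headerless 16 kHz, signed 16-bit, mono files; silence and join_prompts are hypothetical helper names, and dd is used rather than head -c because not every head supports -c.

```shell
#!/bin/sh
# silence RAWFILE MS: write MS milliseconds of digital silence in the
# headerless 16 kHz, signed 16-bit, mono format (32 zero bytes per ms).
silence() {
    dd if=/dev/zero of="$1" bs=32 count="$2" 2>/dev/null
}

# join_prompts OUT GAP_MS PART.raw...: concatenate the parts, inserting
# GAP_MS of silence between consecutive parts (not before the first).
join_prompts() {
    out=$1 gap_ms=$2
    shift 2
    gapfile=$(mktemp) || return 1
    silence "$gapfile" "$gap_ms"
    : > "$out"                         # start with an empty output file
    first=1
    for part in "$@"; do
        [ "$first" -eq 1 ] || cat "$gapfile" >> "$out"
        cat "$part" >> "$out"
        first=0
    done
    rm -f "$gapfile"
}
```

For example, `join_prompts candidate.raw 100 thank-you-for-calling.raw company-name.raw`, then listen with SoX’s play: `play -t raw -r 16k -e signed-integer -b 16 -c 1 candidate.raw`.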

Generate Audio Files

[Ref: Export Multiple]

Once we’ve completed our edit process, we need to export the slices as separate audio files because that’s how we want to deal with it in Asterisk.

But before you export, save yourself some anguish by making sure the track and project audio sampling are configured correctly (as in the picture below).

Desired Sampling

The conversion process assumes that both the track and the project sample rates are 16000 Hz.

We save the audio slices/segments to files by performing an Export. Start the Export by choosing the menu:

File –> Export multiple …

File | Export Multiple Dialog Box

Because we want to export by the ‘labels’ created above, make sure the Labels option (under “Split files based on”) is selected.

From the [Export Multiple] dialog select:

  • for the export format, select Other uncompressed files
  • beside “Other uncompressed files”, click Options… for the formatting options.

File | Export Multiple Dialog Box | Format Options

In the “Specify Uncompressed Options” dialog, select:

  • Header: RAW (header-less)
  • Encoding: Signed 16 bit PCM

Click: [OK] to return to the Export Multiple Dialog.

Click: [Export] in the “Export Multiple Dialog” to export/generate audio files from our labelled selections.
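A header-less file records nothing about its own format, so a wrong export setting usually only shows up later as garbled playback. A quick check on the exported files catches some mistakes early: each file should contain a whole number of 2-byte samples, and the duration implied by 32000 bytes per second should match what you hear in Audacity (a file that reports far too long a duration suggests the rate, channel or encoding settings were wrong). A sketch; check_raw is a hypothetical helper name.

```shell
#!/bin/sh
# check_raw FILE...: sanity-check headerless 16 kHz, signed 16-bit, mono
# exports. Each file must exist, be non-empty and hold a whole number of
# 2-byte samples; its duration is printed using 32000 bytes per second.
check_raw() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "$f: missing" >&2
            status=1
            continue
        fi
        bytes=$(wc -c < "$f" | tr -d ' ')
        if [ "$bytes" -eq 0 ]; then
            echo "$f: empty" >&2
            status=1
        elif [ $((bytes % 2)) -ne 0 ]; then
            echo "$f: odd byte count ($bytes), not 16-bit PCM?" >&2
            status=1
        else
            echo "$f: $((bytes / 32)) ms"   # 32 bytes per millisecond
        fi
    done
    return $status
}
```

For example, run `check_raw *.raw` in the export directory before converting anything.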


[Ref: sox]

SoX reads and writes audio files in most popular formats and can optionally apply effects to them. It can combine multiple input sources, synthesise audio, and, on many systems, act as a general purpose audio player or a multi-track audio recorder. It also has limited ability to split the input into multiple output files. All SoX functionality is available using just the sox command. To simplify playing and recording audio, if SoX is invoked as play, the output file is automatically set to be the default sound device, and if invoked as rec, the default sound device is used as an input source. Additionally, the soxi(1) command provides a convenient way to just query audio file header information.

SoX is available as a package on OpenBSD, so the simplest thing is to install it (as root: pkg_add sox).

The script below is based on Benoit Frigon’s post.

Please refer to it for more details.


#!/bin/sh
# Convert every headerless 16 kHz, signed 16-bit, mono .raw file in the
# current directory into the formats Asterisk can play.
mkdir -p alaw ulaw gsm wav sln16

for file in *.raw; do
    [ -e "$file" ] || continue              # no .raw files found
    file_out=$(basename "${file%.*}")
    echo "converting $file_out..."
    # 16 kHz signed linear: same data, Asterisk's .sln16 extension
    cp "$file" "sln16/$file_out.sln16"
    # 8 kHz a-law, u-law, GSM and WAV versions for narrowband channels
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e a-law -c 1 "alaw/$file_out.alaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t raw -r 8k -e u-law -c 1 "ulaw/$file_out.ulaw"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t gsm -r 8k -c 1 "gsm/$file_out.gsm"
    sox -t raw -r 16k -e signed-integer -b 16 -c 1 "$file" -t wav -r 8k -c 1 "wav/$file_out.wav"
done
echo '---------------------------------------------------'
echo 'Done!'

Alternate Tools

Asterisk 11.18.0
Asterisk*CLI> help file convert
Usage: file convert <file_in> <file_out>
       Convert from file_in to file_out. If an absolute path
       is not given, the default Asterisk sounds directory
       will be used.
          file convert tt-weasels.gsm tt-weasels.ulaw
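With many prompts, the same CLI command can be driven from a shell loop through `asterisk -rx`, which runs a single CLI command against a running Asterisk. A sketch under several assumptions: Asterisk is running, you may use its control socket (typically as root or the asterisk user), the format modules for both extensions are loaded, and filenames contain no spaces (the CLI splits its arguments on them). ast_convert_all is a hypothetical name; absolute paths are passed so that Asterisk does not look in its default sounds directory instead.

```shell
#!/bin/sh
# ast_convert_all DIR FROM_EXT TO_EXT: ask a running Asterisk to convert
# every DIR/*.FROM_EXT file to a file with the same name and TO_EXT,
# using the "file convert" CLI command with absolute paths.
ast_convert_all() {
    dir=$(cd "$1" && pwd) || return 1
    for f in "$dir"/*."$2"; do
        [ -e "$f" ] || continue             # no matching files
        asterisk -rx "file convert $f ${f%.*}.$3" || return 1
    done
}
```

For example, `ast_convert_all sln16 sln16 ulaw` converts the .sln16 files produced by the script above in place.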


Make sure the converted audio (from either of the conversions above) is what you need/expect.

The last thing you want is for your clients/users to listen to garbled messages.

Play the converted files back on your workstation.
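SoX’s play command can handle the workstation check, but the headerless files (.sln16, .alaw, .ulaw) carry no format information, so the options must be given explicitly; they mirror the ones the conversion script used to write them. A sketch, assuming SoX is installed with a working sound device; fmt_opts and listen are hypothetical helper names, and the .wav and .gsm files are left for SoX to identify from their extension.

```shell
#!/bin/sh
# fmt_opts FILE: print the SoX input options needed to read a converted file.
fmt_opts() {
    case "$1" in
        *.sln16|*.raw) echo "-t raw -r 16k -e signed-integer -b 16 -c 1" ;;
        *.alaw)        echo "-t raw -r 8k -e a-law -c 1" ;;
        *.ulaw)        echo "-t raw -r 8k -e u-law -c 1" ;;
        *)             echo "" ;;   # .wav, .gsm: SoX identifies these itself
    esac
}

# listen FILE...: play each converted prompt on the workstation's sound card
listen() {
    for f in "$@"; do
        echo "playing $f"
        # $(fmt_opts ...) is unquoted on purpose so it splits into options
        play $(fmt_opts "$f") "$f"
    done
}
```

For example, `listen ulaw/company-name.ulaw alaw/company-name.alaw` lets you compare the narrowband versions back to back.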

Play the converted files in a test extension.