The VoiceKey program for off-line voicekey measurements.


The VoiceKey program can perform off-line voicekey measurements on wav files, for example those produced by E-Prime in naming experiments.

Off-line software voicekeys have many advantages over electronic on-line voicekey devices. The VoiceKey program can be much more precise and allows the voicekey parameters to be fine-tuned to a degree that is not possible with electronic voicekey devices. The only reason to use an electronic on-line device is when the output of the voicekey must trigger a subsequent event in your experiment or provide some kind of feedback. When this is not necessary, it is vastly preferable to record the responses and use off-line voicekey software instead.

The VoiceKey program has no graphical user interface and must be called from a Windows command line. The output is written to the command line and can be redirected to a file using the standard Windows '>' operator, like this:

VoiceKey monkey.wav gorilla.wav > output.txt

The file 'output.txt' will be created, or overwritten if it already exists, with the filenames and voicekey times of all the files specified, 'monkey.wav' and 'gorilla.wav' in this case. Appending output to a file can be done by using '>>' instead.

The program can be called with a list of filenames, or with a wildcard description of the files to be processed. In these wildcards a '?' stands for any single character, and a '*' for zero or more instances of any character. So, '*.wav' will process all wav files in the current directory, while 'condition?.wav' will process files like 'condition1.wav' or 'condition2.wav', but not 'condition12.wav'.
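The wildcard rules can be tried out without running VoiceKey at all. The sketch below uses Python's standard fnmatch module as a stand-in for the matching described above; it is an illustration only, and the file names in the list are made up. How VoiceKey itself expands wildcards (for example with respect to upper and lower case) is not specified here.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, for illustration only.
names = ["condition1.wav", "condition2.wav", "condition12.wav", "notes.txt"]

def match(pattern, candidates):
    """Return the candidates matching a '?' / '*' wildcard pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]

# '?' stands for exactly one character, so 'condition12.wav' is left out.
single = match("condition?.wav", names)

# '*' stands for zero or more characters, so every wav file is selected.
all_wav = match("*.wav", names)

print(single)
print(all_wav)
```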

Getting results


The VoiceKey program works by first measuring a baseline RMS power level for the file. By default this is the mean power level of the first 100 ms of the file, but it can be set to any period of the file. Alternatively, a global baseline can be used, where a time window is moved over the whole file and the RMS power value of the 10th percentile window is used.
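The two baseline methods can be sketched in a few lines of Python. This is not VoiceKey's actual code: the function names are hypothetical, the samples are assumed to be a plain list of numbers for one channel, and two details are assumptions, namely that the global-baseline window advances one sample at a time and that the 10th percentile is taken by simple rank in the sorted list.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def fixed_baseline(samples, rate, start_ms=0, duration_ms=100):
    """RMS of one stretch of the file (defaults mirror -BS 0 and -B 100)."""
    start = int(rate * start_ms / 1000)
    stop = start + int(rate * duration_ms / 1000)
    return rms(samples[start:stop])

def global_baseline(samples, rate, window_ms=5, percentile=10):
    """RMS of the window that ranks at the given percentile.

    Assumption: the window moves one sample at a time and the
    percentile is picked by rank, without interpolation.
    """
    size = max(1, int(rate * window_ms / 1000))
    values = sorted(
        rms(samples[i:i + size]) for i in range(len(samples) - size + 1)
    )
    index = min(int(len(values) * percentile / 100), len(values) - 1)
    return values[index]
```

A global baseline is less sensitive to a noise that happens to fall in the first 100 ms, because it is taken from the quietest part of the whole recording rather than from a fixed stretch.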

The program will then move a time window over the whole of the sound file and calculate the mean RMS power for each window position. The first position at which the power exceeds the baseline by a set threshold factor will be reported. The default threshold is a factor of 2, but this can be set to any value greater than 1 and smaller than 1000.

By default the start of the first window that reaches the threshold is reported, but this can be changed to be the middle of the window. Window size is 5 ms by default, but can also be set to any value from 1 to 100 ms.
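As a rough illustration of the windowed search, the following Python sketch reports the first window whose RMS exceeds the baseline multiplied by the threshold factor. The function names are hypothetical and the one-sample step between window positions is an assumption; the program's real step size is not documented here.

```python
import math

def window_rms(samples, start, size):
    """RMS level of the window of `size` samples beginning at `start`."""
    chunk = samples[start:start + size]
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

def first_onset_ms(samples, rate, baseline, threshold=2.0,
                   window_ms=5, use_middle=False):
    """Time in ms of the first window above baseline * threshold, or None.

    Assumption: the window advances one sample at a time.
    """
    size = max(1, int(rate * window_ms / 1000))
    for start in range(len(samples) - size + 1):
        if window_rms(samples, start, size) > baseline * threshold:
            # Report the window start by default, its middle if requested.
            position = start + size / 2 if use_middle else start
            return 1000 * position / rate
    return None
```

With a 1000 Hz signal that jumps from level 1 to level 10 at sample 100, this sketch reports 96 ms for the window start and 98.5 ms for the window middle: the first 5 ms window that contains a loud sample already exceeds the threshold. This shows why the choice between start and middle of the window matters more as the window gets larger.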

In addition, a second threshold can be specified. The program will then only trigger a response when the second threshold value is reached, but will still report the last point at which the first threshold was reached. This prevents VoiceKey from responding to clicks and other noises, while still reporting the correct onset of speech when it is encountered. By default, the second threshold is not set. It is recommended to set the second threshold to a substantially higher value than the first one.

A duration threshold can also be set. This specifies the minimum time the signal needs to be above the first threshold value before a response is triggered. The first window that exceeded the threshold is still reported.

A second threshold can be used together with a first threshold duration. In that case a response will be triggered when either one or the other is met. The program can be set to only trigger a response when both are met, by using the -A option.
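The interplay of the first threshold, the second threshold and the duration can be summarised in a small Python sketch. It is only one possible reading of the rules above, not the program's implementation. The function name and its parameters are hypothetical, the input is assumed to be a list of window power values already divided by the baseline, and a run is assumed to end as soon as a window drops back to or below the first threshold.

```python
def detect(ratios, step_ms, t1=2.0, t2=None, min_duration_ms=0,
           require_both=False):
    """Return the reported onset in ms, or None if nothing triggers.

    ratios: window RMS divided by the baseline, one value per window
    position, with consecutive positions step_ms apart (assumption).
    """
    run_start = None   # first window of the current run above t1
    t2_seen = False    # has this run reached the second threshold?
    for i, ratio in enumerate(ratios):
        if ratio <= t1:
            # Back below the first threshold: forget the current run.
            run_start, t2_seen = None, False
            continue
        if run_start is None:
            run_start = i
        if t2 is not None and ratio > t2:
            t2_seen = True
        # Only criteria that were actually set take part in the decision.
        checks = []
        if t2 is not None:
            checks.append(t2_seen)
        if min_duration_ms > 0:
            checks.append((i - run_start + 1) * step_ms >= min_duration_ms)
        met = all(checks) if require_both else any(checks)
        if not checks or met:
            # The start of the run is reported, not the trigger point.
            return run_start * step_ms
    return None
```

For example, with ratios [1, 3, 3, 3, 1, 3, 9, 9] at 1 ms steps, a second threshold of 8 and a duration of 2 ms, the default OR rule reports 1 ms, because the early noise lasts long enough, while require_both=True (the behaviour described for -A) reports 5 ms, the start of the run that also reaches the second threshold.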

Cues


To help with manual inspection and correction, VoiceKey has a few features that insert cues into the sound files and read cues from them.

First, VoiceKey can produce a copy of the original wav file with a cue inserted at the position of the estimated response. This cue will be labelled 'VoiceKey'. The output file will have the same filename, but with '_v' attached just before the .wav extension. All such files will be excluded from processing, to prevent processing the output of a previous call.
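The naming rule is simple enough to reproduce in a script, for instance to find the output file that belongs to a given recording. The helper names below are hypothetical, and treating every file whose name ends in '_v' before the extension as an output file is an assumption based on the description above.

```python
from pathlib import PureWindowsPath

def cue_output_name(input_name):
    """Name of the copy with the cue: '_v' goes just before the extension."""
    path = PureWindowsPath(input_name)
    return str(path.with_name(path.stem + "_v" + path.suffix))

def is_cue_output(name):
    """True for names that look like output of an earlier call (assumption)."""
    return PureWindowsPath(name).stem.endswith("_v")

print(cue_output_name("monkey.wav"))
print(is_cue_output("monkey_v.wav"))
```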

The inserted cue will allow you to inspect the waveform in a sound editor and see the location of the voicekey response superimposed on it. This is one of the most precise and easy ways to verify the correct working of the VoiceKey program.

When a VoiceKey cue is judged to be in the wrong place, the user can add a manual cue at the correct location. All cues other than the 'VoiceKey' cue will be read from an existing output file and inserted in the new output file, together with the new VoiceKey cue. This prevents the loss of manual cues that have already been added.

When all settings are optimal and manual cues have been inserted for those files where VoiceKey wasn't able to get the right response, the final response times can be read from the output files using the -RC option. Note that when -RC is used the program will not do any analysis on the original input file, but will just report cues from output wav files created by earlier calls of VoiceKey. All settings that control how voicekey responses are generated are ignored.

When -RC is used with the -ML option, specifying the name used for manually inserted labels, the program will report cues with manual labels in preference to the 'VoiceKey' labels.

When -RC is used with the -AL option the program will show all cues present in the output file, including VoiceKey cues. This will allow a comparison between manually inserted cues and the cues inserted by the program.
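The selection rules for -RC can be pictured as follows. This Python sketch is only an interpretation of the three cases described above; the function name, the representation of cues as (label, time) pairs and the label 'manual' are all hypothetical, and the program's actual output format is not shown here.

```python
def reported_cues(cues, manual_label=None, all_labels=False):
    """Pick which cues to report from a list of (label, time_ms) pairs.

    manual_label plays the role of the name given with -ML,
    all_labels the role of -AL (both are assumptions of this sketch).
    """
    if all_labels:
        # Everything is shown, manual and VoiceKey cues alike.
        return list(cues)
    if manual_label is not None:
        manual = [cue for cue in cues if cue[0] == manual_label]
        if manual:
            # Manual cues take preference over the VoiceKey cue.
            return manual
    return [cue for cue in cues if cue[0] == "VoiceKey"]

# Made-up cue data for one output file.
cues = [("VoiceKey", 512), ("manual", 498)]
print(reported_cues(cues))
print(reported_cues(cues, manual_label="manual"))
print(reported_cues(cues, all_labels=True))
```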

Options reference


Options that require a value take that value as the next parameter on the command line. For example, to set a baseline duration of 50 ms, use

VoiceKey -B 50 *.wav


All options

  • -B  : Set Baseline duration, default 100 ms.
  • -BS : Set Baseline start, default 0 ms from file start.
  • -GB : Use Global Baseline, 10th percentile RMS value.
  • -W  : Set Window size, default 5 ms.
  • -T  : Set first Threshold, default 2.
  • -T2 : Set second Threshold, default none.
  • -D  : Minimum first threshold duration in ms, default 0.
  • -A  : Only trigger on both T2 AND Duration, default OR.
  • -M  : Use Middle of window, default is start.
  • -R  : Use Right channel, default is left.
  • -IC : Create new file with cue at response position.
  • -RC : Don't calculate but read cue from existing output file.
  • -ML : Label for manual cues, reported instead of voicekey. Use with -RC.
  • -AL : Report cues for all labels, manual and VoiceKey. Use with -RC.
  • -ST : Sort files by file time, default alphabetic.
  • -P  : Output full Path, default filename.

This options overview will also be shown by the VoiceKey program when called without parameters.

Download VoiceKey

In case of problems, questions or bugs to report, please email me at maarten.van-casteren@mrc-cbu.cam.ac.uk
