Automatic Speech Recognition with Phone-Level Models

This lab focuses on building a speaker-independent ASR system for recognizing strings of spoken numbers. As in the previous system, the task vocabulary is 11 words. The template for this system can be found in /export/zak/macrophone/lab/sys2. Again, the subdirectories ExptDir/setup and ExptDir/scripts contain the files needed to replicate or build upon the baseline system.

Copy a template to start your own experimentation:
ExptDir=/export/zak/macrophone/lab/sys2
cd YourExptDir
cp -r $ExptDir/setup .
cp -r $ExptDir/scripts .
TstDir=/export/zak/macrophone/lab/test
cd YourTestDir
cp $TstDir/* .

Note: If your shell is tcsh (echo $SHELL), then prefix all variable assignments with "set", as in:
set ExptDir=/export/zak/macrophone/lab/sys2



Corpus

The corpus is the same as in the previous system, described here.

Model Parameters

In this system, the acoustic models are at the phone level. Again, the design decisions include picking the number of states for each model. This is controlled through the HMM topology file, ExptDir/setup/hmmtop. Again, a left-to-right HMM topology is assumed. The file ExptDir/setup/hmmproto is a prototype of the model and contains information such as the type and length of the observation vector, the type of covariance, and the kind of feature vector to expect. The file ExptDir/setup/phnlist contains the list of phonemes used in the dictionary.

In addition, the system designer needs to decide which phone set to use to expand the words in the dictionary into phonemes. For an example, see ExptDir/setup/dict. One big advantage of phone-level models is that the recognition system can decode new words that are included in the test dictionary without having seen those specific words in the training data.
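
Because new words can be added to the test dictionary, it is worth checking that every phone they use has a trained model. The following is a minimal sketch of such a check, not part of the template. It assumes each dictionary line has the form "WORD phone1 phone2 ..." separated by whitespace, and that the phone list has one phone per line; the function name missing_phones is hypothetical. Verify these assumptions against your own setup/dict and setup/phnlist before relying on it.

```shell
# Sketch: print phones that appear in a dictionary but not in the phone list.
# Assumed formats (check your files): dict lines are "WORD phone1 phone2 ...",
# and the phone list has one phone per line.
# Usage: missing_phones DICT PHNLIST
missing_phones() {
    awk 'FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: phone list
         { for (i = 2; i <= NF; i++)                   # second file: dictionary
               if (!($i in known)) print $i }' \
        "$2" "$1" | sort -u
}

# Example (paths are placeholders):
# missing_phones YourExptDir/setup/dict YourExptDir/setup/phnlist
```

If the function prints nothing, every phone in the dictionary is covered by the phone list.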


Training the System

Here again, we follow the same recipe as in the previous system, increasing the complexity of the system gradually in steps, as in ExptDir/scripts/mkHMMs.sh.
Other variants of this recipe may produce better results than the one given in the template; any optimization of this procedure can only be carried out empirically. The main script, ExptDir/scripts/mkHMMs.sh, requires an input file, an example of which can be found in ExptDir/scripts/expt.in. Many of the parameters in the input file have been described above. In addition, the variables nProcs (number of processes) and qsubHdr (array resource request) define the parallel computing environment needed for training. The variable nEM sets the number of EM iterations carried out each time EMTrain is invoked.
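
Before launching a training run, it can help to confirm what the input file currently sets for these variables. The sketch below only searches for the variable names mentioned above (edir, nProcs, qsubHdr, nEM) and makes no assumption about the exact syntax of expt.in; the function name show_expt_params is hypothetical.

```shell
# Sketch: show the lines of an input file that mention the variables
# discussed in this lab, with line numbers.
# Usage: show_expt_params FILE
show_expt_params() {
    grep -nE '(^|[^A-Za-z0-9_])(edir|nProcs|qsubHdr|nEM)([^A-Za-z0-9_]|$)' "$1"
}

# Example (path is a placeholder):
# show_expt_params YourExptDir/scripts/expt.in
```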

cd YourExptDir/scripts
# Edit edir in expt.in to point to YourExptDir.
./mkHMMs.sh ./expt.in &> ./mkHMMs.log


Evaluating the System

As in the previous system, an open-loop, cost-less grammar is used for evaluating this system. The results are not very sensitive to the word insertion penalty. Check the *.res files to see the results. The template should give a word error rate in the range of 2-4%.

cd YourExptDir
mkdir results
d=YourExptDir/setup
dict=$d/dict
mlist=$d/wrdlist
mmf=YourExptDir/CI-6-Mix/hmm4/MMF
odir=YourExptDir/results
wip=-60
YourTestDir/test.sh $dict $mlist $mmf $wip $odir
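
To see for yourself how little the results depend on the word insertion penalty, you can repeat the evaluation for several penalty values. The following is a minimal sketch, assuming test.sh takes its arguments in the same order as in the command above and writes its output into the directory given as the last argument. The function name sweep_wip, the per-penalty subdirectory naming, and the penalty values in the example are illustrative choices, not part of the template.

```shell
# Sketch: run the test script once per word insertion penalty, each run
# writing into its own subdirectory of ODIR.
# Usage: sweep_wip TEST_SCRIPT DICT MLIST MMF ODIR WIP...
# The body runs in a subshell, so it does not overwrite your own variables.
sweep_wip() (
    tst=$1; dict=$2; mlist=$3; mmf=$4; odir=$5
    shift 5
    for wip in "$@"; do
        mkdir -p "$odir/wip$wip"
        "$tst" "$dict" "$mlist" "$mmf" "$wip" "$odir/wip$wip"
    done
)

# Example, reusing the variables set above (penalty values are arbitrary):
# sweep_wip YourTestDir/test.sh $dict $mlist $mmf $odir -80 -60 -40
```

Comparing the results across the subdirectories shows how the word error rate moves with the penalty.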