Automatic Speech Recognition with Word-Level Models

This lab focuses on building a speaker-independent ASR system that recognizes strings of spoken digits. The task vocabulary therefore has 11 words, including "oh" (zero pronounced as "oh") and "sil" (silence). The notes below refer to ExptDir, a template for your work, which can be found at /export/arnab/work/summer-school06/template. The subdirectories ExptDir/train/setup and ExptDir/test/setup contain the framework needed to replicate or build upon the baseline system.

Run all the programs below on an xYY machine, where YY is between 13 and 36:
ssh -X xYY

Copy the template to start your own experiments:
ExptDir=/export/arnab/work/summer-school06/template
cd YourExptDir
cp -r $ExptDir .

Note: if your shell is tcsh (check with echo $SHELL), prefix all variable assignments with "set", as in:
set ExptDir=/export/zak/macrophone/lab/sys1



Corpus

All experiments in this lab use the MacroPhone corpus from the LDC. For training, we use a subset of about 8k utterances from a large number of speakers, each contributing a few utterances. The utterances are digit strings spoken in various contexts. The test data comprise three sets with different string lengths: 7, 10 and 12 digits.

ExptDir=/export/arnab/work/summer-school06/
The scripts below will need the following inputs from the corpus.
ExptDir/setup/trn.utt.list - list of input feature vector files for the training utterances.
ExptDir/setup/htk.cfg - HTK configuration file for reading the above features.
ExptDir/setup/trn.word.mlf - word transcriptions of the above utterances.
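Before training, it can be worth checking that the feature list and the word MLF describe the same number of utterances. The shell function below is a minimal sketch of such a check; check_counts is a hypothetical helper, not one of the lab scripts. It assumes one feature file per non-empty line of the list and the usual HTK MLF layout, in which the file starts with #!MLF!#, each transcription begins with a quoted label-file name such as "*/utt1.lab", and each ends with a line containing a single period.

```shell
#!/bin/sh
# Hypothetical sanity check (not one of the lab scripts): compare the
# number of feature files in an HTK list with the number of
# transcriptions in a word MLF. Assumes one file per non-empty list line
# and that each MLF entry starts with a quoted name such as "*/utt1.lab".
check_counts() {
  list=$1
  mlf=$2
  n_list=$(grep -c . "$list")     # non-empty lines in the feature list
  n_mlf=$(grep -c '^"' "$mlf")    # quoted label-name lines in the MLF
  if [ "$n_list" -eq "$n_mlf" ]; then
    echo "OK: $n_list utterances"
  else
    echo "MISMATCH: $n_list files in $list, $n_mlf transcriptions in $mlf"
    return 1
  fi
}
```

With ExptDir set as above, check_counts "$ExptDir/setup/trn.utt.list" "$ExptDir/setup/trn.word.mlf" should report roughly 8k utterances if the two files agree.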

The test files are in /export/arnab/work/summer-school06/setup:
dev.7.list - list of input feature vector files for the 7-digit test set.
dev.10.list - list of input feature vector files for the 10-digit test set.
dev.12.list - list of input feature vector files for the 12-digit test set.
htk.cfg - HTK configuration file for reading the above features.
devtst.word.mlf - word transcriptions of the above utterances.
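Since the three test sets differ only in string length, a histogram of transcription lengths is a quick way to confirm what devtst.word.mlf contains. This is a sketch using a hypothetical helper, length_histogram; it assumes one word per line with no time stamps, which is typical of word-level reference MLFs, and it does not count sil labels.

```shell
#!/bin/sh
# Hypothetical helper: print "length count" for every transcription
# length found in a word MLF, ignoring any "sil" labels. Assumes one
# word per line and no time stamps. Lines outside a transcription,
# such as the MLF header, are ignored.
length_histogram() {
  awk '
    /^"/   { n = 0; inside = 1; next }                      # start of an entry
    /^\.$/ { if (inside) count[n]++; inside = 0; next }     # end-of-entry marker
    inside && NF > 0 && $1 != "sil" { n++ }                 # count one word
    END { for (len in count) print len, count[len] }
  ' "$1" | sort -n
}
```

For example, length_histogram /export/arnab/work/summer-school06/setup/devtst.word.mlf prints one line per string length; if the file covers all three sets, the counts should fall at 7, 10 and 12.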


Model Parameters

The most important parameter controlling the behavior of the ASR system is the number of states used for each model (here, each digit). This is set in the HMM topology file, ExptDir/setup/hmmtop. All experiments in this lab assume a left-to-right HMM topology. In principle, by modifying ExptDir/scripts/clonehmm.pl, it should also be possible to use more complex topologies that allow states to be skipped. ExptDir/setup/hmmproto is a prototype of the model; it specifies information such as the type and length of the observation vector, the type of covariance and the kind of feature vector to expect. ExptDir/setup/wrdlist contains the list of digits for which models are to be trained.
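In HTK, a model's topology is encoded in its transition matrix (<TransP>), which has a non-emitting entry state and exit state in addition to the emitting states; a zero entry forbids that transition. The sketch below prints such a matrix for a strict left-to-right model with N emitting states. It is a hypothetical helper, not a description of how clonehmm.pl works, and the self-loop probability of 0.6 is an assumed starting value that re-estimation would normally update.

```shell
#!/bin/sh
# Hypothetical helper (not the lab's clonehmm.pl): print an HTK <TransP>
# block for a strict left-to-right HMM with N emitting states. HTK models
# also have a non-emitting entry state (1) and exit state (N+2).
# The self-loop probability (default 0.6) is an assumed initial value.
left_to_right_transp() {
  awk -v n="$1" -v p="${2:-0.6}" 'BEGIN {
    total = n + 2
    printf "<TransP> %d\n", total
    for (i = 1; i <= total; i++) {
      row = ""
      for (j = 1; j <= total; j++) {
        v = 0
        if (i == 1 && j == 2) {
          v = 1                     # entry state always moves to state 2
        } else if (i > 1 && i < total && j == i) {
          v = p                     # stay in the current state
        } else if (i > 1 && i < total && j == i + 1) {
          v = 1 - p                 # advance to the next state, no skips
        }
        row = row sprintf(" %.3f", v)
      }
      print row
    }
  }'
}

left_to_right_transp 3
```

A topology that allows skips would add non-zero entries two columns to the right of the diagonal (from state i to state i+2).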

Training the System

ASR systems are usually trained by increasing the complexity of the model in steps. The main script, ExptDir/setup/mlTrainLocal.pl, performs the training using the following steps.
Note: the last two steps are commented out in the script. Many variants of this procedure are possible, for example more ViterbiAligns, fewer EMTrains, or a more gradual or faster mixture-splitting schedule. Such a procedure can only be optimized empirically.

cd YourExptDir/train
./setup/mlTrainLocal.pl YourExptDir/train
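Mixture splitting in HTK is done with the MU command of HHEd, which raises the number of Gaussian components in the listed states. As a sketch of how a different splitting schedule could be prepared by hand, the hypothetical helper below writes one HHEd edit script per stage. It is not taken from mlTrainLocal.pl, and the directory names in the commented example (hed, hmm4, hmm5) are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlTrainLocal.pl): write one HHEd edit
# script per mixture-splitting stage. Each script raises the number of
# Gaussians in every emitting state of every model to the given count.
# In HTK, a model with N emitting states numbers them 2..N+1.
write_mixup_scripts() {
  outdir=$1
  n_emit=$2
  shift 2
  last=$((n_emit + 1))
  mkdir -p "$outdir"
  for m in "$@"; do
    echo "MU $m {*.state[2-$last].mix}" > "$outdir/mix$m.hed"
  done
}

# Example: a doubling schedule 1 -> 2 -> 4 -> 8 for 3-state models.
#   write_mixup_scripts hed 3 2 4 8
# Each stage would then be applied with HHEd and followed by a few
# re-estimation passes, e.g. (placeholder directories):
#   HHEd -H hmm4/MMF -M hmm5 hed/mix2.hed setup/wrdlist
```

A more gradual schedule simply lists more intermediate counts (for example 2 3 4 6 8), at the cost of more re-estimation passes.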


Evaluating the System

To evaluate the system, you need to define a grammar: the space of all hypotheses, possibly with costs or probabilities attached to each hypothesis. In this task any digit can follow any digit, so the hypothesis space is defined by an open-loop grammar with no costs, /export/arnab/work/summer-school06/setup/wdnet. In addition to the parameters you have already used above, the decoder can optionally apply a word insertion penalty, which reduces the tendency of the ASR system to output spurious words.

cd YourExptDir/test
mkdir results
mmf=YourExptDir/train/CI-3/hmm4/MMF
odir=YourExptDir/test/results
wip=-60
./setup/test.sh $mmf $wip $odir
./setup/eval.sh $odir
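Because the best word insertion penalty has to be found empirically, one option is to sweep a few values with the same test.sh and eval.sh calls shown above, keeping each run in its own subdirectory. The sketch below relies only on the argument order used above (model file, penalty and output directory for test.sh; output directory for eval.sh). The function sweep_wip, the TEST_SH/EVAL_SH overrides and the penalty grid in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch of a word-insertion-penalty sweep built on the lab scripts:
# run test.sh and eval.sh once per penalty, each into its own results
# subdirectory (base/wipN). TEST_SH and EVAL_SH default to the lab
# scripts and can be overridden, e.g. for a dry run.
sweep_wip() {
  model=$1
  base=$2
  shift 2
  for pen in "$@"; do
    dir="$base/wip$pen"
    mkdir -p "$dir"
    "${TEST_SH:-./setup/test.sh}" "$model" "$pen" "$dir"
    "${EVAL_SH:-./setup/eval.sh}" "$dir"
  done
}

# Usage (from YourExptDir/test; the penalty grid is an assumption):
#   sweep_wip YourExptDir/train/CI-3/hmm4/MMF YourExptDir/test/results 0 -20 -40 -60 -80
```

Comparing the scores that eval.sh reports for each wipN directory then shows which penalty works best on these dev sets.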
To change the grammar, use HParse; see file:/export/ears/common/src/htk/HTKBook/htkbook/node156_mn.html for a description.
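HParse grammars use an EBNF-like notation: $name = ... ; defines a variable, | separates alternatives, [ ] marks an optional item, { } means zero or more repetitions and < > one or more. As one example of a changed grammar, the hypothetical helper below writes a grammar that accepts exactly N digits with optional silence around them, which would exploit the known string length of each test set. The lower-case word names are assumptions and must be edited to match the model names in setup/wrdlist.

```shell
#!/bin/sh
# Hypothetical helper: write an HParse grammar accepting exactly N digits,
# each optionally followed by silence. The word names are assumptions
# and must match the model names in setup/wrdlist.
write_digit_grammar() {
  n=$1
  out=$2
  {
    echo '$digit = one | two | three | four | five | six | seven | eight | nine | zero | oh ;'
    body=""
    i=0
    while [ "$i" -lt "$n" ]; do
      body="$body \$digit [ sil ]"   # literal $digit in the grammar text
      i=$((i + 1))
    done
    echo "( [ sil ]$body )"
  } > "$out"
}
```

For example, write_digit_grammar 7 gram.7 followed by HParse gram.7 wdnet.7 would compile a 7-digit word network; how test.sh picks up a different network depends on that script. In the same notation, an unconstrained digit loop could be written as ( [ sil ] < $digit [ sil ] > ).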