 CLSP RT-02 System
1. Discriminative Speaker Adaptive Training

Until recently, speaker adaptive training (SAT) techniques have been based on the maximum likelihood (ML) parameter estimation framework. During maximum likelihood estimation (MLE) training, model parameters are adjusted to increase the likelihood of the word strings corresponding to the training utterances, without taking into account the probability of other possible word strings. Maximum mutual information estimation (MMIE) training was proposed as an alternative to MLE; it maximises the mutual information between the training word sequences and the observation sequences.
The MMIE criterion increases the a posteriori probability of the model sequence corresponding to the training transcriptions, given the training data. Discriminative optimization criteria can be more effective than maximum likelihood estimation in reducing the word error rate, and hence are of interest.
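The contrast between the two criteria can be written out as follows. This is the usual textbook form, not notation taken from this page: O_r is the observation sequence of training utterance r, w_r its reference word string, M_w the composite HMM for word string w, P(w) the language model probability, and \lambda the model parameters; all of these symbols are assumptions introduced for illustration.

```latex
% MLE: raise the likelihood of the reference transcription only
F_{\mathrm{MLE}}(\lambda) = \sum_{r=1}^{R} \log p_{\lambda}(O_r \mid M_{w_r})

% MMIE: raise the posterior of the reference against all competing word strings
F_{\mathrm{MMIE}}(\lambda) = \sum_{r=1}^{R} \log
  \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
       {\sum_{w} p_{\lambda}(O_r \mid M_{w})\, P(w)}
```

The denominator sums over competing word strings, which is the term MLE ignores. In practice it is usually approximated with recognition lattices.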
Recent work by McDonough provides formulae for re-estimating the linear transforms using MLLR and the model parameters using MMIE. In this work, both the linear transforms and the model parameters are re-estimated under the MMIE criterion.
The SAT training routine used is as follows:
1. Start with the speaker independent model set.
2. Estimate a speaker dependent transform for each speaker using the MMIE framework.
3. Estimate the new speaker-independent (SI) model set given the current speaker-dependent transforms.
4. Go to step 2.
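The alternation in the steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the system's implementation: the function names (sat_train, ml_bias, ml_mean) are hypothetical, the "model" is a single 1-D Gaussian mean, each "transform" is a per-speaker bias, and both are estimated by maximum likelihood rather than by the MMIE re-estimation used in this work. The loop runs for a fixed number of iterations instead of looping indefinitely.

```python
from statistics import mean, pvariance


def sat_train(model, speaker_data, estimate_transform, estimate_model, num_iters):
    """Alternate per-speaker transform estimation and shared model re-estimation."""
    transforms = {}
    for _ in range(num_iters):
        # Step 2: one transform per speaker, given the current model.
        transforms = {spk: estimate_transform(model, obs)
                      for spk, obs in speaker_data.items()}
        # Step 3: re-estimate the shared model given the current transforms.
        model = estimate_model(speaker_data, transforms)
    return model, transforms


# Toy stand-ins (assumptions): ML estimates for a bias-only transform
# and a single Gaussian mean. The real system uses MMIE estimation.
def ml_bias(model_mean, obs):
    # Bias that moves this speaker's sample mean onto the model mean.
    return model_mean - mean(obs)


def ml_mean(speaker_data, transforms):
    # Mean of all observations after speaker normalisation.
    return mean(x + transforms[spk]
                for spk, obs in speaker_data.items() for x in obs)


speaker_data = {"spk_a": [1.0, 2.0, 3.0], "spk_b": [5.0, 6.0, 7.0]}

# Step 1: start from the speaker-independent estimate (pooled mean).
si_mean = mean(x for obs in speaker_data.values() for x in obs)

model, transforms = sat_train(si_mean, speaker_data, ml_bias, ml_mean, num_iters=3)

raw = [x for obs in speaker_data.values() for x in obs]
normalized = [x + transforms[spk]
              for spk, obs in speaker_data.items() for x in obs]
print(pvariance(raw), pvariance(normalized))
```

In this toy setting the variance of the speaker-normalised data is lower than that of the pooled raw data, which is the effect the speaker-dependent transforms are meant to achieve.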
Below we have the re-estimation formulas


The use of speaker-dependent transformations aims to reduce the speaker-specific variation in the speech signal, thus producing more accurate models. Three iterations of ML-SAT were performed on the models of HMM Set C, followed by four iterations of D-SAT. Linear transforms with two regression classes were estimated for each training speaker using CMLLR [14,15], which is an MMI estimation procedure for discriminative linear transforms. These speaker-dependent transforms were then applied in the estimation of the speaker-independent HMM Gaussian means under the MMIE criterion; the HMM Gaussian variances were not updated. The CMLLR [14] and MMI estimation was performed using the same training set lattices that were used to train HMM Set C.
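To illustrate what "linear transforms with two regression classes" means in practice, the sketch below applies one affine transform per regression class to a set of Gaussian means. It assumes the usual MLLR-style mean transform (adapted mean = A * mean + b); the page does not state the exact form, and all names and numbers here (apply_affine, adapt_means, the class assignment, the matrices) are hypothetical.

```python
def apply_affine(A, b, mu):
    """Return A @ mu + b for a matrix A (list of rows) and vectors b, mu."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]


def adapt_means(means, regression_class, transforms):
    """Adapt each Gaussian mean with the transform of its regression class."""
    adapted = {}
    for gauss_id, mu in means.items():
        A, b = transforms[regression_class[gauss_id]]
        adapted[gauss_id] = apply_affine(A, b, mu)
    return adapted


# Two Gaussians, each assigned to one of two regression classes (toy values).
means = {"g1": [1.0, 2.0], "g2": [0.0, 1.0]}
regression_class = {"g1": 0, "g2": 1}

# One (A, b) pair per regression class for a single speaker.
speaker_transforms = {
    0: ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),  # identity plus a bias
    1: ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),   # pure scaling
}

adapted = adapt_means(means, regression_class, speaker_transforms)
print(adapted)
```

With only two regression classes, each speaker needs just two (A, b) pairs, so the transforms can be estimated robustly from a limited amount of data per speaker.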
 
 
 
 

2. Performance of ML-SAT and D-SAT Systems
 

                                               Dev01
System                                      SWBD1    SWB2

ML SAT
  MLLR+MMIE                                  24.4    38.6
  MMIE +ML-SAT(iter 1)                       24.0    38.4
  MMIE +ML-SAT(iter 2)                       24.0    38.2
  MMIE +ML-SAT(iter 3)                       24.0    38.3

D-SAT from ML-SAT iteration 3
  MMIE +ML-SAT(iter 3)+D-SAT(iter 1)         23.8    38.2
  MMIE +ML-SAT(iter 3)+D-SAT(iter 2)         23.6    37.9
  MMIE +ML-SAT(iter 3)+D-SAT(iter 3)         23.6    37.8
  MMIE +ML-SAT(iter 3)+D-SAT(iter 4)         23.4    37.8
Speed measured on dual cpu 1.2 GHz Athlon processors with 1GB RAM
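As a reading aid, the gain from D-SAT over its ML-SAT starting point can be computed directly from the table. The sketch assumes the figures are word error rates in percent (the table does not state the unit); the variable and function names are hypothetical.

```python
# (SWBD1, SWB2) figures copied from the results table.
ml_sat_iter3 = (24.0, 38.3)   # MMIE +ML-SAT(iter 3)
d_sat_iter4 = (23.4, 37.8)    # MMIE +ML-SAT(iter 3)+D-SAT(iter 4)


def gains(before, after):
    """Return (absolute, relative-in-percent) reduction, rounded to one decimal."""
    absolute = round(before - after, 1)
    relative = round(100.0 * (before - after) / before, 1)
    return absolute, relative


swbd1_gain = gains(ml_sat_iter3[0], d_sat_iter4[0])
swb2_gain = gains(ml_sat_iter3[1], d_sat_iter4[1])
print("SWBD1:", swbd1_gain)
print("SWB2: ", swb2_gain)
```

Under that assumption, four iterations of D-SAT give an absolute reduction of 0.6 on SWBD1 and 0.5 on SWB2 relative to the third ML-SAT iteration.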

 
 
 
 
 

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
* Telephone: (410) 516-4237 * Fax: (410) 516-5050 * E-mail: clsp@clsp.jhu.edu