Two Topics in Automatic Speech Recognition – Mukund Padmanabhan (IBM TJ Watson Research Center)
Part I – Decision-tree based quantization of the feature space of a classifier
We describe the design and use of a decision tree to quantize the input feature space of a classifier. The decision tree asks questions about a multi-dimensional feature vector; the questions are designed at every stage of the tree-growing process rather than picked from a predetermined set. The quantization information provided by the decision tree is used to eliminate a number of classes from consideration, which simplifies the task of the classifier. We show that computation in a speech recognition system can be reduced by a factor of 20 with negligible degradation in classification accuracy by using such trees to preprocess the acoustic features.

Part II – Speaker clustering and transformation for adaptation in ASR
We describe a speaker adaptation strategy that is based on first finding a subset of training speakers who are acoustically close to the test speaker. A linear transformation is then computed for each selected training speaker to better map that speaker’s data to the test speaker’s acoustic space. The system parameters (Gaussian means) are then re-estimated for the test speaker using the transformed training data from only the selected training speakers. Experiments show that this scheme can provide a relative improvement of 18% in the error rate on a large-vocabulary task with as little as 3 sentences of adaptation data from the test speaker.
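
The three steps of the adaptation strategy (select acoustically close training speakers, map each one to the test speaker with a linear transformation, re-estimate the Gaussian means from the mapped data) can be illustrated with a small NumPy sketch. The abstract does not say how closeness is measured or how the transformation is estimated, so everything specific here is an assumption made for illustration: closeness is the Euclidean distance between speaker-level mean feature vectors, the transformation is an affine map fitted by ordinary least squares, and every frame is assumed to carry a hard Gaussian label. The function names (select_speakers, class_means, fit_affine, adapt_means) are hypothetical.

```python
import numpy as np

def select_speakers(train, test_frames, k):
    """Return the ids of the k training speakers closest to the test speaker.
    Assumption: closeness = Euclidean distance between mean feature vectors."""
    test_mean = test_frames.mean(axis=0)
    dist = {spk: np.linalg.norm(frames.mean(axis=0) - test_mean)
            for spk, (frames, _) in train.items()}
    return sorted(dist, key=dist.get)[:k]

def class_means(frames, labels, n_classes):
    """Per-Gaussian mean of the frames, plus a mask of Gaussians that were seen."""
    sums = np.zeros((n_classes, frames.shape[1]))
    counts = np.zeros(n_classes)
    np.add.at(sums, labels, frames)   # accumulate frames per Gaussian label
    np.add.at(counts, labels, 1)
    seen = counts > 0
    means = np.zeros_like(sums)
    means[seen] = sums[seen] / counts[seen, None]
    return means, seen

def fit_affine(src_frames, src_labels, tgt_means, tgt_seen):
    """Least-squares affine map sending each training frame towards the test
    speaker's mean for the same Gaussian (assumed estimation criterion)."""
    keep = tgt_seen[src_labels]       # only Gaussians observed in the test data
    X = np.hstack([src_frames[keep], np.ones((keep.sum(), 1))])
    Y = tgt_means[src_labels[keep]]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W                          # shape (D + 1, D): linear part and bias

def adapt_means(train, test_frames, test_labels, n_classes, k):
    """Re-estimate Gaussian means from the transformed data of the k selected
    training speakers only."""
    tgt_means, tgt_seen = class_means(test_frames, test_labels, n_classes)
    sums = np.zeros((n_classes, test_frames.shape[1]))
    counts = np.zeros(n_classes)
    for spk in select_speakers(train, test_frames, k):
        frames, labels = train[spk]
        W = fit_affine(frames, labels, tgt_means, tgt_seen)
        mapped = np.hstack([frames, np.ones((len(frames), 1))]) @ W
        np.add.at(sums, labels, mapped)
        np.add.at(counts, labels, 1)
    # Gaussians with no data from the selected speakers are left at zero.
    return sums / np.maximum(counts, 1)[:, None]
```

Here train is a dictionary mapping a speaker id to a pair (frames, labels), where frames has shape (N, D) and labels holds one integer Gaussian index per frame; test_frames and test_labels hold the few adaptation sentences from the test speaker. Because each transformation has only D × (D + 1) parameters, it can be fitted from very little test data, while the means themselves are re-estimated from the much larger pool of transformed training data.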