DIMITRA VERGYRI
RESEARCH STATEMENT
I am currently a doctoral candidate in Electrical and Computer Engineering
at The Johns Hopkins University and expect to complete my Ph.D. degree
in May 2000. For the past five years I have been working with the Center
for Language and Speech Processing. My thesis work, ``Integration of Multiple
Knowledge Sources in Speech Recognition using Minimum Error Training'', has
been supervised by Prof. Frederick Jelinek. My research interests span
acoustic and language modeling, speech recognition, statistical modeling,
information theory, and statistics.
My thesis research addressed the problem of optimally combining the available
models using discriminative objective functions.
In the standard formulation of the speech recognition problem, two models
are used to score the sentence hypotheses: the acoustic model and the language
model. These are developed independently and combined using a static parameter
that scales the scores of one model relative to the other.
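In the usual notation (the symbols are introduced here for illustration: A for the acoustic observations, W for a candidate word sequence, and \alpha for the static scale factor), the standard decision rule that this paragraph describes can be written as:

```latex
\hat{W} = \arg\max_{W} \left[ \log P(A \mid W) + \alpha \log P(W) \right]
```

A single value of \alpha is shared by every hypothesis and every word, which is exactly the restriction the log-linear formulation relaxes.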
My thesis presents a general formulation for combining several model
scores in a log-linear model that computes the hypothesis likelihood.
The model combination can either be performed in a static way, with constant
parameters, or in a dynamic way, where the parameters may vary for different
segments of a hypothesis. The aim is to optimize the parameters so as to
achieve the minimum word error rate. In the dynamic case, to make parameter
estimation robust, the parameters are defined to be piecewise constant on
classes that form a partition of the space of hypothesis segments.
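A minimal Python sketch of the static and dynamic log-linear scores follows. It assumes each hypothesis has already been split into segments, each carrying one log-score per model, and that a class label is given for every segment; the function names, the data layout, and the class labels are hypothetical and only illustrate the piecewise-constant weighting described above.

```python
def static_score(segment_scores, weights):
    """Static combination: one weight per model, shared by all segments.

    segment_scores: list of segments; each segment is a list of per-model
        log-scores [log p_1(s), ..., log p_M(s)] (hypothetical layout).
    weights: list of M weights [lambda_1, ..., lambda_M].
    Returns the unnormalized log-linear score; normalizing over competing
    hypotheses would not change which hypothesis ranks highest.
    """
    return sum(
        sum(w * s for w, s in zip(weights, seg))
        for seg in segment_scores
    )


def dynamic_score(segment_scores, segment_classes, class_weights):
    """Dynamic combination: weights are piecewise constant over classes.

    segment_classes: class label c(s) for each segment; the labels define
        the partition of the segment space (hypothetical labels).
    class_weights: dict mapping class label -> list of M weights, so every
        segment in the same class is scored with the same weights.
    """
    total = 0.0
    for seg, cls in zip(segment_scores, segment_classes):
        weights = class_weights[cls]
        total += sum(w * s for w, s in zip(weights, seg))
    return total
```

For example, with two models (acoustic, language) and a two-segment hypothesis `[[-10.0, -2.0], [-8.0, -3.0]]`, `static_score(..., [1.0, 12.0])` gives -78.0, while `dynamic_score(..., ["vowel", "consonant"], {"vowel": [1.0, 10.0], "consonant": [0.5, 15.0]})` gives -79.0. When every class is given the same weights, the dynamic score reduces to the static one.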
The approach is used in three different applications:
- The first concerns the combination of available acoustic models in order
to obtain a recognition system for a language with sparse acoustic training
data. The acoustic models provided were trained on speech from other
languages for which training data is abundant, and were then adapted to
the target language using a small amount of data. These models are combined,
in both a static and a dynamic way, to improve recognition accuracy on a
held-out set from the target language. The partition for the dynamic
combination is defined using phonological knowledge about the segments
that correspond to hypothesized phones.
- The second application is the dynamic combination of the baseline acoustic
and language models. Different ways were explored for defining a partition
on the hypothesized tokens, on which the parameters are defined. For that
purpose, features commonly used to predict confidence (correctness)
were utilized. This is a simple approach towards Acoustic Sensitive
Language Modeling, since it combines the two models with the goal of
relying more heavily on the language model when the acoustic model is
unreliable, and vice versa. Even though the two models are still developed
independently, each can take the other into account when the hypothesis
score is computed.
- The third application integrates into the model the scores that become
available as side information after a first recognition pass. Most of these
scores were used as confidence features in the previous application. The
model, defined as a static combination of these scores, is used to rescore
the hypotheses obtained in the first pass.
Different objective functions, aimed at discriminating between the best
available hypothesis and the confusable ones, were employed for training
the parameters of the model.
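The simplest way to tie the parameters directly to the error count is a search: rescore each N-best list with candidate parameter values and keep the value whose top-ranked hypotheses make the fewest word errors on held-out data. The Python sketch below does this for a single language-model scale. It is a swapped-in illustration of minimum-error parameter selection, not the discriminative objective functions used in the thesis, and the hypothesis fields `am`, `lm`, and `words`, as well as all function names, are hypothetical.

```python
def word_errors(hyp, ref):
    """Levenshtein distance between two word lists (sub + ins + del)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion)
                cur[j - 1] + 1,          # missed reference word (deletion)
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def total_errors(nbest_lists, refs, scale):
    """Word errors of the top hypothesis per utterance under one LM scale."""
    errors = 0
    for nbest, ref in zip(nbest_lists, refs):
        # Hypothetical fields: acoustic log-score, LM log-score, word list.
        best = max(nbest, key=lambda h: h["am"] + scale * h["lm"])
        errors += word_errors(best["words"], ref)
    return errors


def min_error_scale(nbest_lists, refs, candidates):
    """Grid search: return the first candidate scale with the fewest errors."""
    return min(candidates, key=lambda s: total_errors(nbest_lists, refs, s))
```

With one utterance whose reference is `["a", "b", "c"]` and two hypotheses, `{"words": ["a", "b", "c"], "am": -100.0, "lm": -10.0}` and `{"words": ["a", "b", "d"], "am": -98.0, "lm": -12.0}`, a scale of 0.5 selects the second hypothesis (one error), while 1.5 selects the first (no errors), so `min_error_scale` over `[0.5, 1.5, 2.0]` returns 1.5. Because the error count is piecewise constant in the parameters, a grid search like this does not scale to many weights, which is one motivation for smoother discriminative objectives.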
Training models with an objective function matched to the goal of
minimizing the number of errors remains very much among my research
interests, whether such a function is used from the outset to train the
parameters of the models, or is used in a Discriminative Model Combination
framework to combine a set of models trained using Maximum Likelihood
techniques.
Within the same framework, I am still interested in pursuing the idea
of training (or dynamically modifying) the language model in order to disambiguate
among acoustically confusable words. This problem was addressed in the
application examined in my thesis, as well as in similar approaches in the
literature, but it is still far from solved.
I would also be interested in working on other applications in the general
area of statistical pattern recognition. Primarily, I am looking for a
position that will allow me to do research in this area at an industrial
research laboratory. I am also interested in the idea of joining a product
development group that employs statistical methods for solving ``real world''
problems.
Dimitra Vergyri
Wed Apr 26 21:42:05 EDT 2000