Novel Speech Recognition Models for Arabic

Previous research on large-vocabulary automatic speech recognition (ASR) has mainly concentrated on European and Asian languages. Other language groups have been explored to a lesser extent, for instance Semitic languages like Hebrew and Arabic. These languages possess certain characteristics which present problems for standard ASR systems. For example, their written representation does not contain most of the vowels present in the spoken form, which makes it difficult to utilize textual training data. Furthermore, they have a complex morphological structure, which is characterized not only by a high degree of affixation but also by the interleaving of vowel and consonant patterns (so-called “non-concatenative morphology”). This leads to a large number of possible word forms, which complicates the robust estimation of statistical language models.

In this workshop group we aim to develop new modeling approaches to address these and related problems, and to apply them to the task of conversational Arabic speech recognition. We will develop and evaluate a multi-linear language model, which decomposes the task of predicting a given word form into predicting more basic morphological patterns and roots. Such a language model can be combined with a similarly decomposed acoustic model, which necessitates new decoding techniques based on modeling statistical dependencies between loosely coupled information streams. Since a pervasive issue in language processing is the tradeoff between language-specific and language-independent methods, we will also pursue an alternative, control approach that relies on the capabilities of existing, language-independent recognition technology. Under this approach no morphological analysis will be performed, and all word forms will be treated as basic vocabulary units. Furthermore, acoustic model topologies will be used that specify short vowels as optional rather than obligatory elements, in order to facilitate the use of text documents as language model training data. Finally, we will investigate the possibility of using large, generally available text and audio sources to improve the accuracy of conversational Arabic speech recognition.
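The multi-linear decomposition described above can be illustrated with a minimal sketch: each word form is factored into a consonantal root and a vocalic pattern, and bigram statistics are estimated over the two streams separately, with the word probability approximated by their product. All words, roots, and patterns below are invented toy examples, and the unsmoothed product model is a simplification, not the workshop's actual formulation.

```python
from collections import defaultdict

# Toy factorization table: word form -> (root, pattern).
# These labels are illustrative only; a real system would obtain
# them from a morphological analyzer.
FACTORS = {
    "kataba": ("ktb", "CaCaCa"),
    "kutiba": ("ktb", "CuCiCa"),
    "darasa": ("drs", "CaCaCa"),
}

class FactoredBigramLM:
    """Sketch of a factored LM: independent bigram streams over
    roots and patterns, combined by multiplication."""

    def __init__(self):
        self.root_bigrams = defaultdict(int)
        self.root_context = defaultdict(int)
        self.pat_bigrams = defaultdict(int)
        self.pat_context = defaultdict(int)

    def train(self, corpus):
        prev_root, prev_pat = "<s>", "<s>"
        for word in corpus:
            root, pat = FACTORS[word]
            self.root_bigrams[(prev_root, root)] += 1
            self.root_context[prev_root] += 1
            self.pat_bigrams[(prev_pat, pat)] += 1
            self.pat_context[prev_pat] += 1
            prev_root, prev_pat = root, pat

    def prob(self, word, prev_word):
        # P(w | h) is approximated by the product of the two stream
        # probabilities; a real model would smooth the estimates and
        # couple the streams rather than assume independence.
        root, pat = FACTORS[word]
        prev_root, prev_pat = FACTORS[prev_word]
        p_root = self.root_bigrams[(prev_root, root)] / max(self.root_context[prev_root], 1)
        p_pat = self.pat_bigrams[(prev_pat, pat)] / max(self.pat_context[prev_pat], 1)
        return p_root * p_pat

lm = FactoredBigramLM()
lm.train(["kataba", "kutiba", "darasa"])
```

The point of the factorization is data sharing: "kutiba" may be unseen as a full word form, yet its root "ktb" and its pattern may each be well attested, so the factored estimate degrades more gracefully than a whole-word n-gram.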

Final Report [PDF]


Team Members 
Senior Members
Jeff Bilmes, University of Washington
John Henderson, MITRE
Katrin Kirchhoff, University of Washington
Pat Schone, DoD
Rich Schwartz, BBN Technologies
Graduate Students
Sourin Das, JHU
Gang Ji, University of Washington
Mohamed Noamany, BBN Technologies
Undergraduate Students
Melissa Egan, Pomona College
Feng He, Swarthmore College

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680