Novel Speech Recognition Models for Arabic

Research Group of the 2002 Summer Workshop

Previous research on large-vocabulary automatic speech recognition (ASR) has mainly concentrated on European and Asian languages. Other language groups have been explored to a lesser extent, for instance Semitic languages like Hebrew and Arabic. These languages possess certain characteristics which present problems for standard ASR systems. For example, their written representation does not contain most of the vowels present in the spoken form, which makes it difficult to utilize textual training data. Furthermore, they have a complex morphological structure, which is characterized not only by a high degree of affixation but also by the interleaving of vowel and consonant patterns (so-called “non-concatenative morphology”). This leads to a large number of possible word forms, which complicates the robust estimation of statistical language models.

In this workshop group we aim to develop new modeling approaches to address these and related problems, and to apply them to the task of conversational Arabic speech recognition. We will develop and evaluate a multi-linear language model, which decomposes the task of predicting a given word form into predicting more basic morphological patterns and roots. Such a language model can be combined with a similarly decomposed acoustic model, which necessitates new decoding techniques based on modeling statistical dependencies between loosely coupled information streams. Since one pervading issue in language processing is the tradeoff between language-specific and language-independent methods, we will also pursue an alternative control approach which relies on the capabilities of existing, language-independent recognition technology. Under this approach, no morphological analysis will be performed and all word forms will be treated as basic vocabulary units. Furthermore, acoustic model topologies will be used which specify short vowels as optional rather than obligatory elements, in order to facilitate the use of text documents as language model training data. Finally, we will investigate the possibility of using large, generally available text and audio sources to improve the accuracy of conversational Arabic speech recognition.
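
As a rough illustration of what treating short vowels as optional could mean, the following minimal Python sketch enumerates the pronunciations that an unvowelized written form would admit if at most one short vowel may follow each consonant. It is not the workshop's implementation: the romanized three-vowel inventory, the function name `optional_vowel_variants`, and the flat enumeration (rather than an acoustic model topology with skip transitions) are all assumptions made for this example.

```python
from itertools import product

# Assumed romanized short-vowel inventory (hypothetical, for illustration only).
SHORT_VOWELS = ("a", "i", "u")


def optional_vowel_variants(skeleton):
    """Return every string obtained by optionally inserting one short
    vowel after each consonant of an unvowelized skeleton."""
    # Each consonant is followed by a slot holding no vowel or one short vowel.
    slots = [("",) + SHORT_VOWELS for _ in skeleton]
    variants = []
    for choice in product(*slots):
        variants.append("".join(c + v for c, v in zip(skeleton, choice)))
    return variants


variants = optional_vowel_variants("ktb")
print(len(variants))            # 4 choices per slot, 3 slots: 64 variants
print("kataba" in variants)     # fully vowelized form is covered
print("ktb" in variants)        # so is the bare written form
```

The number of variants grows exponentially with the length of the written form, which is one reason to encode the optionality in the model topology rather than listing every variant in a pronunciation dictionary.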

Final Report [PDF]

Team Members
Senior Members
Jeff Bilmes	University of Washington
John Henderson	MITRE
Katrin Kirchhoff	University of Washington
Pat Schone	DoD
Rich Schwartz	BBN Technologies
Graduate Students
Sourin Das	JHU
Gang Ji	University of Washington
Mohamed Noamany	BBN Technologies
Undergraduate Students
Melissa Egan	Pomona College
Feng He	Swarthmore College

Center for Language and Speech Processing