Pronunciation Modelling

Research Group of the 1997 Summer Workshop

Our goal is to model the extensive pronunciation variation found in the Switchboard corpus, which is likely an important factor in the difficulty current ASR systems have with this conversational speech task. In contrast to previous efforts, we will use the recently created ICSI hand-labeled phonetic transcriptions of Switchboard as the target data of our modeling. This new corpus potentially contains a wealth of information about pronunciation in conversational speech.

We will use relevant phonological, prosodic, syntactic, and discourse information as the source data of our modeling, including the baseform pronunciations of words, lexical stress, pitch accent, and segmental durations. We will map from source to target using various stochastic and rule-based methods, including statistical decision trees, rewrite rules, and MMI (maximum mutual information) estimation.

The initial measure of performance will be the reduction in the conditional entropy of the target ICSI transcriptions given the source linguistic information. Next, these mappings will be used in a speech recognizer to create alternative pronunciations in context, and the word error rate will be measured. As time permits, the pronunciation models created above will be used to automatically transcribe a portion of the speech corpus, and the acoustic models will then be re-estimated on these transcriptions. We will also explore generating constrained automatic alignments of all of the data as an alternative to the ICSI data.
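The entropy-based performance measure can be illustrated with a small sketch. Assuming aligned (source, target) pairs, where the source is a baseform phone and the target is the phone actually transcribed, the empirical conditional entropy is H(T|S) = -Σ p(s,t) log2 p(t|s); the lower H(T|S) is relative to H(T), the better the source information predicts the transcription. The pairs below are invented toy data in ARPAbet-style labels (dx denotes a flap), not ICSI data, and the function names are hypothetical. In practice the source could be any hashable tuple of features, such as the phone together with its lexical stress and neighboring phones.

```python
from collections import Counter
from math import log2


def entropy(targets):
    """Empirical entropy H(T), in bits, of a sequence of target labels."""
    n = len(targets)
    return -sum((c / n) * log2(c / n) for c in Counter(targets).values())


def conditional_entropy(pairs):
    """Empirical conditional entropy H(T|S), in bits, from (source, target) pairs.

    H(T|S) = -sum over (s, t) of p(s, t) * log2 p(t | s),
    with p(s, t) = count(s, t) / n and p(t | s) = count(s, t) / count(s).
    """
    n = len(pairs)
    joint = Counter(pairs)
    source_counts = Counter(s for s, _ in pairs)
    return -sum((c / n) * log2(c / source_counts[s]) for (s, _), c in joint.items())


# Hypothetical toy data: (baseform phone, hand-labeled surface phone).
# Baseform /t/ is realized as [t] or as a flap [dx]; baseform /d/ is always [d].
pairs = [("t", "t"), ("t", "dx"), ("t", "t"), ("t", "dx"),
         ("d", "d"), ("d", "d"), ("d", "d"), ("d", "d")]

h_t = entropy([t for _, t in pairs])
h_t_given_s = conditional_entropy(pairs)
print(f"H(T) = {h_t:.2f} bits, H(T|S) = {h_t_given_s:.2f} bits, "
      f"reduction = {h_t - h_t_given_s:.2f} bits")
```

On this toy data H(T) is 1.5 bits and H(T|S) is 0.5 bits, so knowing the baseform phone removes 1 bit of uncertainty about the transcribed phone; richer source features would be judged by how much further they lower H(T|S).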

Team Members
Senior Members
Sanjeev Khudanpur	CLSP
Bill Byrne	CLSP/JHU
Michael Riley	AT&T Labs
Chuck Wooters	DoD
George Zavaliagkos	BBN
Graduate Students
Murat Saraçlar	CLSP
Michael Fink	CMU
Harriet Nock	Cambridge

Center for Language and Speech Processing