Pronunciation Modeling of Mandarin Casual Speech

Research Group of the 2000 Summer Workshop

When people speak casually in daily life, they are not consistent in their pronunciation. In listening to such casual speech, it is quite common to find many different pronunciations of individual words. Current automatic speech recognition systems can reach word accuracies above 90% when evaluated on carefully produced standard speech, but in recognizing casual, unplanned speech, performance drops to 75% or even lower. There are many reasons for this. In casual speech, one phoneme can shift to another. In Mandarin, for example, the initial /sh/ in “wo shi (I am)” is often pronounced weakly and shifts into an /r/. In other cases, sounds are dropped. In Mandarin, phonemes such as /b/, /p/, /d/, /t/, and /k/ are often reduced and, as a result, are often recognized as silence. These problems are especially severe in Mandarin casual speech since most Chinese are non-native Mandarin speakers. Chinese languages such as Cantonese are as different from standard Mandarin as French is from English. As a result, there is even larger pronunciation variation due to the influence of speakers’ native languages.

We propose to study and model such pronunciation differences in casual speech using interviews found in Mandarin news broadcasts. We hope to include experienced researchers from both China and the US in the areas of pronunciation modeling, Mandarin speech recognition, and Chinese phonology.

Final Presentation Video

Team Members
Senior Members
William Byrne	CLSP/JHU
Pascale Fung	HKUST
Terri Kamm	Department of Defense
Tom Zheng	Tsinghua University
Graduate Students
Zhanjiang Song	Tsinghua University
Veera Venkatramani	CLSP/JHU
Liu Yi	HKUST
Undergraduate Students
Umar Ruhi	University of Toronto

Center for Language and Speech Processing