Dialectal Chinese Speech Recognition

Research Group of the 2004 Summer Workshop

In addition to Mandarin (Northern China), China has eight major dialectal regions: Wu (Southern Jiangsu, Zhejiang, and Shanghai), Yue (Guangdong, Hong Kong, Nanning Guangxi), Min (Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan), Hakka (Meixian Guangdong, Hsin-chu Taiwan), Xiang (Hunan), Gan (Jiangxi), Hui (Anhui), and Jin (Shanxi). These dialects can be further divided into more than 40 sub-categories. Although the Chinese dialects share a written language and standard Chinese (Putonghua) is widely spoken in most regions, speech is still strongly influenced by speakers' native dialects. This linguistic diversity poses problems for automatic speech and language technology. Automatic speech recognition relies to a great extent on consistent pronunciation and usage of words within a language. In Chinese, word usage, pronunciation, syntax, and grammar vary with the speaker’s dialect. As a result, speech recognition systems built for standard Chinese (Putonghua) perform poorly for the great majority of the population.

The goal of our summer project is to develop a general framework for modeling phonetic, lexical, and pronunciation variability in dialectal Chinese automatic speech recognition (ASR) tasks. The baseline system is a standard Chinese recognizer. We aim to find methods that use dialect-related knowledge and relatively small quantities of training data to adapt the baseline system into a recognizer for the specific dialect of interest. For practical reasons, during the summer we will focus on one specific dialect, for example the Wu dialect or the Chuan dialect. However, the techniques we intend to develop should be broadly applicable.

Our project will build on established ASR tools and systems developed for standard Chinese. In particular, our previous studies in pronunciation modeling have established baseline Mandarin ASR systems along with their component lexicons and language model collections. However, little previous work and few resources are available to support research on Chinese dialect variation for ASR. Our pre-workshop effort will therefore focus on further infrastructure development:

Dialectal Lexicon Construction. We will establish an electronic dialect dictionary for the chosen dialect. The lexicon will be constructed to represent both standard and dialectal pronunciations.
Dialectal Chinese Database Collection. We will set up a dialectal Chinese speech database with transcriptions at both the canonical pinyin level and the dialectal pinyin level. The database could contain two parts: read speech and spontaneous speech. For the spontaneous speech part, a generalized initial/final (GIF) level transcription should also be included.
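
As an illustration of what a lexicon entry holding both standard and dialectal pronunciations might look like, the following is a minimal Python sketch. It is not the project's actual lexicon format: the `LexiconEntry` class, the `build_entry` helper, and the initial-substitution table are all hypothetical, and the substitution rules (retroflex initials replaced by their non-retroflex counterparts) are an assumed example of dialect influence, not rules taken from the workshop data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical initial-substitution rules, for illustration only.
# Real rules would have to come from dialect knowledge or transcribed data.
ILLUSTRATIVE_RULES: Dict[str, str] = {"zh": "z", "ch": "c", "sh": "s"}


@dataclass
class LexiconEntry:
    word: str
    canonical: List[str]  # canonical (standard Chinese) pinyin syllables
    dialectal: List[List[str]] = field(default_factory=list)  # variant pronunciations


def apply_rules(syllable: str, rules: Dict[str, str]) -> str:
    """Rewrite the syllable's initial if a rule matches; otherwise return it unchanged."""
    # Try longer initials first so that "zh" is matched before any shorter rule.
    for source in sorted(rules, key=len, reverse=True):
        if syllable.startswith(source):
            return rules[source] + syllable[len(source):]
    return syllable


def build_entry(word: str, canonical: List[str], rules: Dict[str, str]) -> LexiconEntry:
    """Create an entry and attach a dialectal variant only when it differs from the canonical form."""
    variant = [apply_rules(s, rules) for s in canonical]
    dialectal = [variant] if variant != canonical else []
    return LexiconEntry(word=word, canonical=list(canonical), dialectal=dialectal)


entry = build_entry("上海", ["shang4", "hai3"], ILLUSTRATIVE_RULES)
print(entry.canonical, entry.dialectal)
```

Keeping the canonical form and the variants side by side in one entry mirrors the two transcription levels described above; a real lexicon would likely also attach a probability or count to each variant, which this sketch omits.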

Our effort at the workshop will be to employ these materials to develop ASR system components that can be adapted from standard Chinese to the chosen dialect. Emphasis will be placed on developing techniques that work robustly with relatively small amounts of dialect data (or even none). Research will focus primarily on acoustic phenomena; syntactic and grammatical variation is something we intend to pursue after establishing baseline ASR experiments.

Opening Day Presentation (July 6, 2004)

Progress Report (July 28, 2004)

Final Presentation (August 15, 2004)

Final Report

Team Members
Senior Members
Liang Gu	IBM
Dan Jurafsky	Stanford University
Izhak Shafran	Johns Hopkins University
Richard Sproat	University of Illinois
Feng (Thomas) Zhang	Tsinghua University
Graduate Students
Jing Li	Tsinghua University
Yi Su	Johns Hopkins University
Stavros Tsakalidis	Johns Hopkins University
Yanli Zhang	University of Illinois
Haolang Zhou	Johns Hopkins University
Undergraduate Students
Philip Bramsen	MIT
David Kirsch	Lehigh University

Center for Language and Speech Processing