Dialectal Chinese Speech Recognition

In addition to Mandarin (spoken in northern China), there are eight major dialectal regions in China: Wu (southern Jiangsu, Zhejiang, and Shanghai), Yue (Guangdong, Hong Kong, and Nanning in Guangxi), Min (Fujian, Shantou in Guangdong, Haikou in Hainan, and Taipei in Taiwan), Hakka (Meixian in Guangdong and Hsin-chu in Taiwan), Xiang (Hunan), Gan (Jiangxi), Hui (Anhui), and Jin (Shanxi). These dialects can be further divided into more than forty sub-categories. Although the Chinese dialects share a written language and standard Chinese (Putonghua) is widely spoken in most regions, everyday speech is still strongly influenced by the native dialects. This linguistic diversity poses problems for automatic speech and language technology: automatic speech recognition relies to a great extent on consistent pronunciation and word usage within a language, but in Chinese, pronunciation, word usage, and syntax all vary with the speaker's dialect. As a result, speech recognition systems built to process standard Chinese (Putonghua) perform poorly for the great majority of the population.

The goal of our summer project is to develop a general framework for modeling phonetic, lexical, and pronunciation variability in dialectal Chinese automatic speech recognition. The baseline system is a standard Chinese recognizer; we aim to find methods that use dialect-related knowledge and relatively small amounts of training data to modify this baseline into a recognizer for the specific dialect of interest. For practical reasons, during the summer we will focus on one dialect, for example Wu or Chuan. However, the techniques we intend to develop should be broadly applicable.

Our project will build on established ASR tools and systems developed for standard Chinese. In particular, our previous studies in pronunciation modeling have produced baseline Mandarin ASR systems along with their component lexicons and language-model collections. However, little prior work and few resources are available to support research in Chinese dialect variation for ASR. Our pre-workshop effort will therefore focus on further infrastructure development:

  • Dialectal Lexicon Construction. We will build an electronic dialect dictionary for the chosen dialect, constructed to represent both standard and dialectal pronunciations (one possible lexicon layout is sketched after this list).
  • Dialectal Chinese Database Collection. We will assemble a dialectal Chinese speech database with transcriptions at both the canonical-pinyin and dialectal-pinyin levels. The database will contain two parts, read speech and spontaneous speech; for the spontaneous speech, a generalized initial/final (GIF) level transcription will also be included.
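To make the lexicon item above concrete, the sketch below shows one plausible way to store and query such a multi-pronunciation dictionary: each word maps to its canonical pinyin plus zero or more dialect-specific variants. The tab-separated file layout, field names, and the load_dialect_lexicon / pronunciations helpers are assumptions for illustration only, not the project's actual format.

```python
# Illustrative sketch of a multi-pronunciation dialect lexicon, assuming a
# simple tab-separated format (an assumption, not the project's format):
#   word <TAB> canonical_pinyin <TAB> dialectal_pinyin [<TAB> dialectal_pinyin ...]
from collections import defaultdict

def load_dialect_lexicon(path):
    """Map each word to its canonical pinyin and any dialectal variants."""
    lexicon = defaultdict(lambda: {"canonical": None, "dialectal": []})
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, canonical, *dialectal = line.rstrip("\n").split("\t")
            entry = lexicon[word]
            entry["canonical"] = canonical.split()          # e.g. ["shang4", "hai3"]
            for variant in dialectal:
                entry["dialectal"].append(variant.split())  # dialect-specific syllables
    return lexicon

def pronunciations(lexicon, word):
    """Expose every pronunciation as an alternative path for the same word,
    which is how a recognizer dictionary would use these entries."""
    entry = lexicon[word]
    return [entry["canonical"]] + entry["dialectal"]
```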

At the workshop we will use these materials to develop ASR system components that can be adapted from standard Chinese to the chosen dialect. Emphasis will be placed on techniques that work robustly with relatively small amounts of dialect data, or even none (one standard adaptation technique of this kind is sketched below). Research will focus primarily on acoustic phenomena rather than syntactic or grammatical variation, which we intend to pursue after establishing baseline ASR experiments.
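As an illustration of adaptation with limited data, the sketch below shows MAP (maximum a posteriori) adaptation of Gaussian mean vectors, a standard way of nudging standard-Chinese acoustic models toward a dialect. It is a generic sketch under assumed inputs (per-frame features and per-frame Gaussian posteriors), not the project's chosen recipe.

```python
# Minimal numpy sketch of MAP adaptation of Gaussian means, shown only as an
# example of the kind of small-data adaptation technique the project targets.
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """
    prior_means: (G, D) means of the standard-Chinese Gaussians
    frames:      (T, D) dialect adaptation feature vectors
    posteriors:  (T, G) per-frame Gaussian occupation probabilities
    tau:         prior weight; larger values trust the standard models more
    """
    occ = posteriors.sum(axis=0)                      # (G,) soft occupation counts
    weighted_sum = posteriors.T @ frames              # (G, D) first-order statistics
    # ML mean estimates from the dialect data, guarded against empty Gaussians
    ml_means = weighted_sum / np.maximum(occ, 1e-8)[:, None]
    # Per-Gaussian interpolation between the prior and the data-driven estimate
    alpha = (occ / (occ + tau))[:, None]
    return alpha * ml_means + (1.0 - alpha) * prior_means
```

Gaussians that see little dialect data stay close to the standard-Chinese prior, while well-observed ones move toward the dialect, which is what makes this family of methods robust when adaptation data is scarce.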

Opening Day Presentation (July 6, 2004)

Progress Report (July 28, 2004)

Final Presentation (August 15, 2004)

Final Report


Team Members
Senior Members
Liang Gu, IBM
Dan Jurafsky, Stanford University
Izhak Shafran, Johns Hopkins University
Richard Sproat, University of Illinois
Feng (Thomas) Zhang, Tsinghua University
Graduate Students
Jing Li, Tsinghua University
Yi Su, Johns Hopkins University
Stavros Tsakalidis, Johns Hopkins University
Yanli Zhang, University of Illinois
Haolang Zhou, Johns Hopkins University
Undergraduate Students
Philip Bramsen, MIT
David Kirsch, Lehigh University

Center for Language and Speech Processing