In addition to Mandarin (Northern China), China has eight major dialectal regions: Wu (Southern Jiangsu, Zhejiang, and Shanghai), Yue (Guangdong, Hong Kong, Nanning Guangxi), Min (Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan), Hakka (Meixian Guangdong, Hsin-chu Taiwan), Xiang (Hunan), Gan (Jiangxi), Hui (Anhui), and Jin (Shanxi). These dialects can be further divided into more than 40 sub-categories. Although the Chinese dialects share a written language and standard Chinese (Putonghua) is widely spoken in most regions, speech is still strongly influenced by speakers’ native dialects. This linguistic diversity poses problems for automatic speech and language technology. Automatic speech recognition relies to a great extent on consistent pronunciation and usage of words within a language. In Chinese, word usage, pronunciation, syntax, and grammar all vary with the speaker’s dialect. As a result, speech recognition systems built to process standard Chinese (Putonghua) perform poorly for the great majority of the population.
The goal of our summer project is to develop a general framework to model phonetic, lexical, and pronunciation variability in dialectal Chinese automatic speech recognition tasks. The baseline system is a standard Chinese recognizer. Our research aims to find suitable methods that use dialect-related knowledge and relatively small quantities of training data to adapt the baseline system into a recognizer for the specific dialect of interest. For practical reasons, we will focus during the summer on one specific dialect, for example the Wu dialect or the Chuan dialect. However, the techniques we intend to develop should be broadly applicable.
Our project will build on established ASR tools and systems developed for standard Chinese. In particular, our previous studies in pronunciation modeling have established baseline Mandarin ASR systems along with their component lexicons and language model collections. However, little previous work and few resources are available to support research on Chinese dialect variation for ASR. Our pre-workshop effort will therefore focus on further infrastructure development.
Our effort at the workshop will be to employ these materials to develop ASR system components that can be adapted from standard Chinese to the chosen dialect. Emphasis will be placed on developing techniques that work robustly with relatively small amounts of dialect data, or even none. Research will focus primarily on acoustic phenomena rather than on syntactic or grammatical variation, which we intend to pursue after establishing baseline ASR experiments.
Opening Day Presentation (July 6, 2004)
Progress Report (July 28, 2004)
Final Presentation (August 15, 2004)
Team Members | |
---|---|
Senior Members | |
Liang Gu | IBM |
Dan Jurafsky | Stanford University |
Izhak Shafran | Johns Hopkins University |
Richard Sproat | University of Illinois |
Feng (Thomas) Zhang | Tsinghua University |
Graduate Students | |
Jing Li | Tsinghua University |
Yi Su | Johns Hopkins University |
Stavros Tsakalidis | Johns Hopkins University |
Yanli Zhang | University of Illinois |
Haolang Zhou | Johns Hopkins University |
Undergraduate Students | |
Philip Bramsen | MIT |
David Kirsch | Lehigh University |