Mandarin-English Information

Our globally interconnected world increasingly demands technologies that support on-demand retrieval of relevant information in any medium and in any language. If we search the web in English for, say, the loss of life in an earthquake in Turkey, the most relevant stories are likely to be in Turkish or even Greek. Furthermore, the latest information may be in the form of audio files of the evening’s news. One would like first to find such information and then to translate it into English. Finding such information is beyond the capabilities of most commercially available search engines; good automatic translation is even harder. In this project, we will extend the state of the art in searching audio and on-line text in one language for a user who speaks another language.

A very large corpus of concurrent Mandarin and English textual and spoken news stories is available for conducting such research. These textual and spoken documents in both languages will be automatically indexed; in the case of spoken documents, this will involve automatic speech recognition. Given a query in either language, we will then investigate systems that retrieve relevant documents in both languages for the user. Such cross-lingual and cross-media (CLCM) information retrieval is a novel problem with many technical challenges. Several schemes for recognizing the audio, indexing the text, and estimating translation models that match queries in one language with documents in another will be investigated in the summer. Applications of this research include audio and video browsing, spoken document retrieval, automated routing of information, and automatic alerting of the user when special events occur.
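To make the retrieval setting concrete, the following is a minimal sketch of one possible approach: translating the query with a bilingual lexicon and scoring documents by term overlap. It is an illustration only, not the method developed in this project. The lexicon, the document collection, and the function names (translate_query, build_index, retrieve) are all hypothetical, and spoken documents are assumed to have already been converted to tokens by a speech recognizer.

```python
from collections import Counter

# Hypothetical toy lexicon (English term -> Mandarin translations).
# A real system would estimate a translation model from data.
TOY_LEXICON = {
    "earthquake": ["地震"],
    "turkey": ["土耳其"],
}

# Hypothetical collection. Each document is already a list of tokens;
# "zh_audio_1" stands in for the output of automatic speech recognition.
DOCS = {
    "en_text_1": ["earthquake", "strikes", "turkey"],
    "zh_audio_1": ["土耳其", "地震", "伤亡"],
    "zh_text_1": ["体育", "新闻"],
}


def translate_query(query_terms, lexicon):
    """Keep each original term and append its known translations."""
    expanded = []
    for term in query_terms:
        expanded.append(term)
        expanded.extend(lexicon.get(term, []))
    return expanded


def build_index(docs):
    """Map each document id to a bag of term counts."""
    return {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}


def retrieve(query_terms, index, lexicon):
    """Rank documents in either language by overlap with the expanded query."""
    expanded = translate_query(query_terms, lexicon)
    scores = {}
    for doc_id, counts in index.items():
        score = sum(counts[term] for term in expanded)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(retrieve(["earthquake", "turkey"], index, TOY_LEXICON))
```

In this sketch an English query reaches both the English text story and the Mandarin spoken story because the query is expanded before matching. Translating the documents instead of the query, or weighting translations by probability, are alternative designs that the sketch does not cover.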

 

Team Members
Senior Members
Sanjeev Khudanpur, CLSP
Erika Grams, Advanced Analytic Tools
Gina-Anne Levow, University of Maryland
Helen Meng, CUHK
Douglas Oard, University of Maryland
Patrick Schone, Department of Defense
Hsin-Min Wang, Academia Sinica, Taiwan
Graduate Students
Berlin Chen, Academia Sinica, Taiwan
Wai-Kit Lo, CUHK
Jianqiang Wang, University of Maryland
Undergraduate Students
Karen Tang, Princeton University

Center for Language and Speech Processing