Mandarin-English Information

Research Group of the 2000 Summer Workshop

Our globally interconnected world increasingly demands technologies that support on-demand retrieval of relevant information in any medium and in any language. If we search the web in English for, say, the loss of life in an earthquake in Turkey, the most relevant stories are likely to be in Turkish or even Greek. Furthermore, the latest information may exist only as audio files of the evening’s news. One would like first to find such information and then to translate it into English. Finding it is beyond the capabilities of most commercially available search engines; good automatic translation is even harder. In this project, we will extend the state of the art in searching audio and online text in one language for a user who speaks another language.

A very large corpus of concurrent Mandarin and English textual and spoken news stories is available for conducting such research. The textual and spoken documents in both languages will be indexed automatically; for spoken documents, this will involve automatic speech recognition. Given a query in either language, we will then investigate systems that retrieve relevant documents in both languages for the user. Such cross-lingual and cross-media (CLCM) information retrieval is a novel problem with many technical challenges. During the summer, we will investigate several schemes for recognizing the audio, indexing the text, and estimating translation models that match queries in one language with documents in another. Applications of this research include audio and video browsing, spoken document retrieval, automated routing of information, and automatic alerting of the user when special events occur.
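To make the idea of matching a query in one language against documents in another more concrete, here is a minimal sketch of dictionary-based query translation followed by term-overlap scoring. It is an illustration under stated assumptions, not the workshop's method: the translation table, its probabilities, the document identifiers, and the function names are all hypothetical, and the documents are assumed to be already segmented into terms (for spoken documents, this would be the output of a speech recognizer).

```python
from collections import Counter

# Hypothetical toy translation table: English term -> weighted Mandarin candidates.
# The entries and weights are illustrative only, not estimated from any corpus.
TRANSLATIONS = {
    "earthquake": {"地震": 1.0},
    "news": {"新闻": 0.7, "消息": 0.3},
}

# Hypothetical indexed documents: id -> list of terms (text or recognizer output).
DOCS = {
    "text-001": ["地震", "新闻", "地震"],
    "audio-007": ["消息", "天气"],
    "text-002": ["体育"],
}


def translate_query(query):
    """Map an English query to a weighted bag of Mandarin terms."""
    weights = Counter()
    for term in query.lower().split():
        # Terms missing from the table contribute nothing.
        for target, prob in TRANSLATIONS.get(term, {}).items():
            weights[target] += prob
    return weights


def score(weights, doc_terms):
    """Sum translation weight times term frequency over the query terms."""
    counts = Counter(doc_terms)
    return sum(w * counts[t] for t, w in weights.items())


def retrieve(query, docs):
    """Return ids of documents with a positive score, best first."""
    weights = translate_query(query)
    scored = [(score(weights, terms), doc_id) for doc_id, terms in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]


if __name__ == "__main__":
    print(retrieve("earthquake news", DOCS))
```

A real system would replace the hand-written table with translation models estimated from data and would have to cope with recognition errors in the audio index; the sketch only shows where those components plug in.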

Final Presentation Video

Team Members
Senior Members
Sanjeev Khudanpur	CLSP
Erika Grams	Advanced Analytic Tools
Gina-Anne Levow	University of Maryland
Helen Meng	CUHK
Douglas Oard	University of Maryland
Patrick Schone	Department of Defense
Hsin-Min Wang	Academia Sinica, Taiwan
Graduate Students
Berlin Chen	Academia Sinica, Taiwan
Wai-Kit Lo	CUHK
Jianqiang Wang	University of Maryland
Undergraduate Students
Karen Tang	Princeton University

Center for Language and Speech Processing