Translingual Information Processing

Salim Roukos, IBM TJ Watson Research

March 29, 2005


Abstract

Searching unstructured information, in the form of (largely) text with increasing image, audio, and video content, is fast becoming a daily activity for many people. Increasingly, the content is becoming multilingual (e.g., non-English speakers became the majority of online users in the summer of 2001 and have continued to increase their share, reaching two-thirds today). To help users find answers to their information needs regardless of the original language of the relevant content, we at IBM Research have a number of projects that handle multilingual content, ranging from machine translation and information extraction to topic detection and tracking. In this talk, we will present an overview of our work on statistical machine translation and demonstrate a cross-lingual search engine that searches Arabic content using English queries.