Translingual Information Processing – Salim Roukos (IBM TJ Watson Research)
Searching unstructured information, largely text but with a growing share of image, audio, and video content, is fast becoming a daily activity for many people. This content is also increasingly multilingual: non-English speakers became the majority of online users in the summer of 2001, and their share has continued to grow, reaching two-thirds today. To help users find answers to their information needs regardless of the original language of the relevant content, we at IBM Research have a number of projects that handle multilingual content, ranging from machine translation and information extraction to topic detection and tracking. In this talk, we will present an overview of our work on statistical machine translation and demonstrate a cross-lingual search engine that searches Arabic content using English queries.