Embracing Language Diversity: Unsupervised Multilingual Learning – Regina Barzilay (MIT)

September 22, 2009

For centuries, the deep connection between human languages has fascinated scholars and driven many important discoveries in linguistics and anthropology. In this talk, I will show that this connection can empower unsupervised methods for language analysis. The key insight is that joint learning from several languages reduces uncertainty about the linguistic structure of each individual language. I will present multilingual generative unsupervised models for morphological segmentation, part-of-speech tagging, and parsing. In all of these instances, we model the multilingual data as arising through a combination of language-independent and language-specific probabilistic processes. This design allows the model to identify and learn from recurring cross-lingual patterns to improve prediction accuracy in each language. I will also discuss ongoing work on unsupervised decoding of ancient Ugaritic tablets using data from related Semitic languages. This is joint work with Benjamin Snyder, Tahira Naseem, and Jacob Eisenstein.
Regina Barzilay is an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory. Her research interests are in natural language processing. She is a recipient of the NSF CAREER Award and the Microsoft Faculty Fellowship, and has been named one of the “Top 35 Innovators Under 35” by Technology Review magazine. She received her Ph.D. in Computer Science from Columbia University in 2003 and spent a year as a postdoc at Cornell University. Regina received her M.S. in 1998 and B.A. in 1992, both from Ben-Gurion University, Israel.

Center for Language and Speech Processing