Domain Adaptation in Natural Language Processing – Hal Daume (University of Utah)

March 3, 2009 (all day)

Supervised learning technology has led to systems for part-of-speech tagging, parsing, and named entity recognition with accuracies in the high 90%s. Unfortunately, the performance of these systems degrades drastically when they are applied to text outside their training domain (typically, newswire). Machine translation systems work fantastically for translating Parliamentary proceedings, but fall down when applied to other domains. I’ll discuss research that aims to understand what goes wrong when models are applied outside their domain, and some (partial) solutions to this problem. I’ll focus on named entity recognition and machine translation tasks, where we’ll see a range of different sources of error (some of which are quite counter-intuitive!).
Hal Daume is an assistant professor in the School of Computing at the University of Utah. His primary research interests are in Bayesian learning, structured prediction, and domain adaptation (with a focus on problems in language and biology). He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn’t like to use C (instead he uses O’Caml or Haskell). He doesn’t like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido, and rock climbing.

Center for Language and Speech Processing