Domain Adaptation in NLP – Hal Daume (University of Maryland)
The Wall Street Journal doesn’t look like medical texts, which in turn don’t look like tweets. We shouldn’t expect statistical models trained on news to do well on other domains, and indeed they don’t. The problem of moving a statistical model from one training domain to a different (set of) test domain(s) is the task of domain adaptation. I will discuss two algorithms for domain adaptation: one that works in a batch fashion, and one that works online. The online algorithm naturally adapts to an active setting wherein you can periodically query a human for the labels of data points in the new domains. In both cases I will present some theoretical results that quantify the amount of data necessary to learn (with high probability). This is joint work with Avishek Saha, Abhishek Kumar and Piyush Rai.
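The abstract does not specify which batch algorithm is meant, but a well-known batch technique from Daume's earlier work is "frustratingly easy" feature augmentation, which illustrates the general idea. A minimal sketch (the function name and the use of NumPy are my own choices for illustration):

```python
import numpy as np

def augment(X, domain):
    """Feature augmentation for domain adaptation.

    Each d-dimensional input x is mapped into a 3d-dimensional space of
    [shared, source-only, target-only] blocks:
      source examples -> [x, x, 0]
      target examples -> [x, 0, x]
    A single linear model trained on the augmented representation can then
    learn shared weights plus per-domain corrections.
    """
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# One 2-dimensional example from each domain:
print(augment(np.array([[1.0, 2.0]]), "source"))  # [[1. 2. 1. 2. 0. 0.]]
print(augment(np.array([[3.0, 4.0]]), "target"))  # [[3. 4. 0. 0. 3. 4.]]
```

After augmentation, any off-the-shelf classifier can be trained on the pooled source and target data; features useful in both domains get weight in the shared block, while domain-specific quirks are absorbed by the per-domain blocks.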
Hal Daume III is an assistant professor of Computer Science at the University of Maryland, College Park. He previously held a position in the School of Computing at the University of Utah. His primary research interest is understanding how to get human knowledge into a machine learning system in the most efficient way possible. In practice, he works primarily in the areas of Bayesian learning (particularly non-parametric methods), structured prediction and domain adaptation (with a focus on problems in language and biology). He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn’t like to use C (instead he uses OCaml or Haskell). He doesn’t like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido and rock climbing.