Neural Paraphrase Models – Kevin Gimpel (Toyota Technological Institute at Chicago)

April 21, 2015

I will discuss the problem of automatically determining whether two short phrases are paraphrastic. Several neural models will be presented, all of which are trained on examples drawn from the Paraphrase Database (PPDB; Ganitkevitch et al., 2013), an extensive resource consisting of a list of paraphrastic phrase pairs with confidence estimates. To evaluate short-phrase paraphrase models, we propose two newly annotated datasets and report initial baseline results with a variety of standard approaches and neural architectures. One of these datasets is a subset of PPDB and can be used to assess the quality of its confidence estimates. We find that pairs from PPDB can be used to train models that score paraphrase pairs more accurately than PPDB's own confidence estimates, and that these models also achieve state-of-the-art results on standard word and bigram similarity datasets.
This is joint work with John Wieting (UIUC), Mohit Bansal (TTIC), and Karen Livescu (TTIC).
Kevin Gimpel is a research assistant professor at the Toyota Technological Institute at Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus. He received his PhD in 2012 from the Language Technologies Institute at Carnegie Mellon University, where he was advised by Noah Smith. His research focuses on natural language processing, including applications like machine translation, speech recognition, and social media analysis. He also works on machine learning motivated by NLP, especially learning criteria for supervised and unsupervised structured prediction. His PhD research was partially supported by an Achievement Rewards for College Scientists Scholarship and a Sandia National Laboratories Excellence in Science and Technology Fellowship, and he received a five-year retrospective best paper award for a paper at WMT 2008.

Center for Language and Speech Processing