Vector-based Models of Semantic Composition – Mirella Lapata (University of Edinburgh)

November 3, 2009 all-day

Vector-based models of word meaning have become increasingly popular in natural language processing and cognitive science. The appeal of these models lies in their ability to represent meaning simply by using distributional information, under the assumption that words occurring within similar contexts are semantically similar. Despite their widespread use, vector-based models are typically directed at representing words in isolation, and methods for constructing representations for phrases or sentences have received little attention in the literature.

In this talk, we propose a framework for representing the meaning of word combinations in vector space. Central to our approach is vector composition, which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a phrase similarity task. We also propose a novel statistical language model that is based on vector composition and can capture long-range semantic dependencies.

Joint work with Jeff Mitchell.
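To make the two composition functions named in the abstract concrete, here is a minimal sketch (not the authors' implementation): given distributional vectors u and v for two words, the additive model forms p = u + v and the multiplicative model forms the elementwise product p = u ⊙ v; composed phrase vectors can then be compared with cosine similarity. The function names and the toy vectors below are illustrative assumptions.

```python
import math

def additive(u, v):
    """Additive composition: p_i = u_i + v_i."""
    return [a + b for a, b in zip(u, v)]

def multiplicative(u, v):
    """Multiplicative composition: p_i = u_i * v_i."""
    return [a * b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity, a standard way to compare composed phrase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy co-occurrence vectors (made-up counts, for illustration only)
horse = [4.0, 1.0, 0.0, 2.0]
run = [2.0, 3.0, 1.0, 0.0]

phrase_add = additive(horse, run)        # [6.0, 4.0, 1.0, 2.0]
phrase_mul = multiplicative(horse, run)  # [8.0, 3.0, 0.0, 0.0]
```

Note that the multiplicative model zeroes out any dimension where either word has a zero count, so it acts as a kind of feature intersection, while the additive model unions the contexts of the two words.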
Mirella Lapata is a reader (US equivalent to associate professor) in the School of Informatics at the University of Edinburgh. Her research interests are in natural language processing, focusing on semantic interpretation and generation. She obtained a PhD in Informatics from the University of Edinburgh in 2001 and spent two years as a faculty member in the Department of Computer Science at the University of Sheffield. She received a B.A. degree in computer science from the University of Athens in 1994 and an MSc degree from Carnegie Mellon University in 1998.

Center for Language and Speech Processing