Exploiting Lexical & Encyclopedic Resources For Entity Disambiguation

Research Group of the 2007 Summer Workshop

Entity disambiguation is the problem of determining whether two mentions of entities refer to the same object: e.g., trying to decide whether the entity called “Jim Clark” in one document is the same as the entity called “Jim Clark” in another document. To do this accurately, it is necessary to extract from these documents descriptions of these entities that are as exhaustive and accurate as possible. This in turn requires ‘tracking’ these entities in each document – identifying all or most of their mentions – and collecting their properties, particularly those that help the most to discriminate between individuals.

The goal of the workshop is to further the state of the art in entity disambiguation by developing better techniques for tracking entities and for extracting their properties. A particular focus will be improving entity tracking by using lexical and encyclopedic knowledge extracted both from structured lexical databases and from semi-structured repositories such as Wikipedia. Lack of such knowledge is one of the main problems with current entity tracking methods, which typically cannot detect that ‘the Packwood proposal’ and ‘the Packwood plan’ in the following example refer to the same entity.

[The Packwood proposal] would reduce the tax depending on how long an asset was held. It also would create a new IRA that would shield from taxation the appreciation on investments made for a wide variety of purposes, including retirement, medical expenses, first-home purchases and tuition.
A White House spokesman said President Bush is “generally supportive” of [the Packwood plan].

Methods to be used include text mining techniques (supervised and unsupervised) to extract object properties; better machine learning techniques to improve entity tracking (e.g., using tree kernels); methods for extracting knowledge from WordNet, semantic role labellers, and Wikipedia; and clustering methods for entity disambiguation.
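The last step in this list, clustering for entity disambiguation, can be illustrated with a minimal sketch. It assumes that each document has already been reduced to a set of properties for its “Jim Clark” mention; the property sets, the Jaccard similarity measure, the single-link merging strategy, and the 0.3 threshold are all illustrative assumptions, not the methods developed at the workshop.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two property sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(profiles, threshold=0.3):
    """Single-link clustering: merge any two documents whose property
    sets overlap by at least `threshold`, using a small union-find."""
    parent = {doc: doc for doc in profiles}

    def find(doc):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for d1, d2 in combinations(profiles, 2):
        if jaccard(profiles[d1], profiles[d2]) >= threshold:
            parent[find(d1)] = find(d2)

    clusters = {}
    for doc in profiles:
        clusters.setdefault(find(doc), set()).add(doc)
    # Sort clusters by their members so the output order is stable.
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical property sets for four documents that mention "Jim Clark".
profiles = {
    "doc1": {"netscape", "founder", "silicon graphics"},
    "doc2": {"netscape", "founder", "entrepreneur"},
    "doc3": {"formula one", "driver", "lotus"},
    "doc4": {"formula one", "driver", "scotland"},
}

for cluster in cluster_mentions(profiles):
    print(sorted(cluster))
```

With these made-up profiles, doc1 and doc2 share two of four distinct properties (similarity 0.5) and are merged, as are doc3 and doc4, while the two groups share nothing and stay apart. The sketch also shows why property extraction matters: the more discriminating the properties, the cleaner the separation between individuals.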

ELERFED CDC Overview
Entity Disambiguation Scoring Metrics
SVMs and Kernels
Wiki
Full CDED
Slides
Introduction
Slides
Versley System – PDF

Team Members
Senior Members
Ron Artstein	University of Essex
David Day	MITRE
Jason Duncan	Department of Defense
Alessandro Moschitti	University of Trento
Massimo Poesio	University of Essex and University of Trento
Xiaofeng Yang	Institute for Infocomm Research, Singapore
Graduate Students
Jason Smith	CLSP
Robert Hall	University of Massachusetts
Simone Ponzetto	EML Research
Yannick Versley	University of Tübingen
Michael Wick	University of Massachusetts
Undergraduate Students
Vladimir Eidelman	Columbia University
Alan Jern	University of California Los Angeles
Brett Shwom	New York University
Affiliate Members
Walter Daelemans	University of Antwerp
Claudio Giuliano	FBK-IRST
Janet Hitzeman	MITRE
Veronique Hoste	University of Antwerp
Emily Jamison	Ohio
Mijail Kabadjov	Edinburgh University
Gideon Mann	University of Massachusetts
Sameer Pradhan	BBN
Michael Strube	EML Research

Center for Language and Speech Processing