Semantic Analysis Over Sparse Data

Research Group of the 2003 Summer Workshop

The aim of the task is to verify the feasibility of a machine learning-based semantic approach to the data sparseness problem that is encountered in many areas of natural language processing such as language modeling, text classification, question answering and information extraction.
The suggested approach takes advantage of several technologies for supervised and unsupervised sense disambiguation that have been developed in the last decade and of several resources that have been made available.

The task is motivated by the fact that current language processing models are considerably affected by sparseness of training data, and current solutions, like class-based approaches, do not elicit appropriate information: the semantic nature and linguistic expressiveness of automatically derived word classes is unclear. Many of these limitations originate from the fact that fine-grained automatic sense disambiguation is not applicable on a large scale.

The workshop will develop a weakly supervised method for sense modeling (i.e. reduction of possible word senses in corpora according to their genre) and apply it to a huge corpus in order to coarsely sense-disambiguate it. This can be viewed as an incremental step towards fine-grained sense disambiguation. The created semantic repository as well as the developed techniques will be made available as resources for future work on language modeling, semantic acquisition for text extraction, question answering, summarization, and most other natural language processing tasks.

Team Members
Senior Members
Roberto Basili	University of Rome
Kalina Bontcheva	University of Sheffield
Hamish Cunnignham	University of Sheffield
Louise Guthrie	University of Sheffield
Fabio Zanzotto	University of Rome
Graduate Students
Jia Cui	JHU
David Guthrie	University of Sheffield
Jerry Liu	Columbia
Klaus Macherey	University of Aachen
Undergraduate Students
Kristiyan Haralambiev	University of Sofia
Cassia Martin	Harvard
Affiliate Members
Marco Cammisa	University of Rome
Martin Holub	Charles University

Semantic Analysis Over Sparse Data

Upcoming Seminars

Center for Language and Speech Processing