Semantic Analysis Over Sparse Data

This task aims to verify the feasibility of a machine learning-based semantic approach to the data sparseness problem encountered in many areas of natural language processing, such as language modeling, text classification, question answering, and information extraction.
The suggested approach takes advantage of several technologies for supervised and unsupervised sense disambiguation developed in the last decade, and of several resources that have been made available.

The task is motivated by the fact that current language processing models are considerably affected by the sparseness of training data, and that current solutions, such as class-based approaches, do not elicit appropriate information: the semantic nature and linguistic expressiveness of automatically derived word classes are unclear. Many of these limitations arise because fine-grained automatic sense disambiguation is not applicable on a large scale.

The workshop will develop a weakly supervised method for sense modeling (i.e., the reduction of possible word senses in corpora according to their genre) and apply it to a very large corpus in order to coarsely sense-disambiguate it. This can be viewed as an incremental step towards fine-grained sense disambiguation. The resulting semantic repository and the techniques developed will be made available as resources for future work on language modeling, semantic acquisition for text extraction, question answering, summarization, and most other natural language processing tasks.

Team Members
Senior Members
Roberto Basili, University of Rome
Kalina Bontcheva, University of Sheffield
Hamish Cunningham, University of Sheffield
Louise Guthrie, University of Sheffield
Fabio Zanzotto, University of Rome
Graduate Students
Jia Cui, JHU
David Guthrie, University of Sheffield
Jerry Liu, Columbia
Klaus Macherey, University of Aachen
Undergraduate Students
Kristiyan Haralambiev, University of Sofia
Cassia Martin, Harvard
Affiliate Members
Marco Cammisa, University of Rome
Martin Holub, Charles University

Center for Language and Speech Processing