The aim of the task is to verify the feasibility of a machine learning-based semantic approach to the data sparseness problem that is encountered in many areas of natural language processing such as language modeling, text classification, question answering and information extraction.
The suggested approach takes advantage of several technologies for supervised and unsupervised sense disambiguation that have been developed in the last decade and of several resources that have been made available.
The task is motivated by the fact that current language processing models are considerably affected by sparseness of training data, and current solutions, like class-based approaches, do not elicit appropriate information: the semantic nature and linguistic expressiveness of automatically derived word classes is unclear. Many of these limitations originate from the fact that fine-grained automatic sense disambiguation is not applicable on a large scale.
The workshop will develop a weakly supervised method for sense modeling (i.e. reduction of possible word senses in corpora according to their genre) and apply it to a huge corpus in order to coarsely sense-disambiguate it. This can be viewed as an incremental step towards fine-grained sense disambiguation. The created semantic repository as well as the developed techniques will be made available as resources for future work on language modeling, semantic acquisition for text extraction, question answering, summarization, and most other natural language processing tasks.
|Roberto Basili||University of Rome|
|Kalina Bontcheva||University of Sheffield|
|Hamish Cunnignham||University of Sheffield|
|Louise Guthrie||University of Sheffield|
|Fabio Zanzotto||University of Rome|
|David Guthrie||University of Sheffield|
|Klaus Macherey||University of Aachen|
|Kristiyan Haralambiev||University of Sofia|
|Marco Cammisa||University of Rome|
|Martin Holub||Charles University|