Sense Tagging the Penn TreeBank
Martha Stone Palmer, University of Pennsylvania
April 18, 2000
As our national interests become increasingly global, timely access to information in other languages becomes more and more necessary. This can only be provided efficiently through automated or semi-automated information processing technology. Computational lexical semantics plays a critical role in multilingual information processing, especially in machine translation, where it is essential for accurate lexical choice. This talk will give examples of difficult translation choices and demonstrate how cross-linguistic semantic components based on lexical generalizations can lead to accurate predictions. It will introduce Levin classes and a refinement of them, Intersective Levin classes, and show how the closely coupled syntactic frames and semantic components they supply provide a methodology for defining regular sense extensions. The regular sense extensions exemplified by syntactic adjunctions supply concrete criteria for sense distinctions, which in turn provide a basis for VerbNet, a public domain lexical resource that is being used for semantic annotation of on-line corpora, including the Penn TreeBank. The talk will also describe the approach and methodology being used to associate WordNet and VerbNet sense tags, as well as predicate-argument structures, with the Penn TreeBank.
Professor Martha Palmer is in the Computer and Information Sciences Department of the University of Pennsylvania, as well as the Institute for Research in Cognitive Science. She has been actively involved in research in Natural Language Processing and Knowledge Representation for over twenty years, beginning with her graduate work at the University of Edinburgh on the use of Lexical Conceptual Structures as predicate-argument structures for driving the semantic interpretation process. She continued this lexically based approach to semantic interpretation during a postdoc at the University of Pennsylvania, and it formed the basis of Pundit, the successful DARPA-funded text processing system she built at Unisys during the 1980s. Pundit integrated semantic and pragmatic processing in innovative ways that enabled sophisticated reference resolution and temporal analysis. During her three-year visit to the National University of Singapore she began applying these same techniques to English-to-Chinese Machine Translation, and she has continued this research since returning to Philadelphia and the University of Pennsylvania in 1993. She is now broadening her interest in lexical semantics to include cross-linguistic verb classifications, and is involved in building a Chinese TreeBank for the Department of Defense and a Korean/English Machine Translation system for the US Army. She is currently a member of the Advisory Committee for the DARPA TIDES program (Trans-lingual Information Detection, Extraction and Summarization), the coordinator for American involvement in EAGLES, the international Expert Advisory Group on Language Engineering Standards, and Chair of SIGLEX, the Special Interest Group on the Lexicon. She previously served on the Executive Committee of the Association for Machine Translation in the Americas and the Executive Committee of the Association for Computational Linguistics, and was Co-Program Chair of ACL-96.