Morphological Disambiguation and Tagging By Voting Constraints – Kemal Oflazer (Bilkent University)

March 31, 1997

This talk presents a constraint-based approach to morphological disambiguation and tagging in which individual constraints vote on matching morphological parses, or sequences of parses. Disambiguation of all the tokens in a sentence is performed at the very end, by selecting the parses or sequences of parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the rule developer from worrying about the potentially conflicting rule orderings found in other systems.

We have applied our approach to both Turkish and English. Turkish is a language with complex agglutinative word structures and displays types of morphological ambiguity not found in languages like English. For Turkish we have used parse voting; with about 500 constraint rules and some additional simple statistics, we have attained a recall of 95-96% and a precision of 94-95%, with about 1.01 parses per token. We have recently applied path voting to tagging English, where constraints efficiently vote on all possible matching sequences of tags, and have obtained quite similar results.

Our current implementations are prototypes, and we outline an efficient implementation technique using finite state transducers and transducer composition.
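The parse-voting idea described in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: the tag names, the two toy constraints, their vote weights, and the `disambiguate` function are all hypothetical and chosen only to show the mechanism. Every constraint casts votes against the original, unmodified set of candidate parses, and selection happens only after all votes are in, which is why the order of the constraints does not affect the result.

```python
from typing import Callable, Dict, List

# A sentence is a list of tokens; each token is its list of candidate parses.
Sentence = List[List[str]]
# A constraint looks at one candidate parse in context and returns its vote
# (0 when the constraint does not match). Hypothetical interface.
Constraint = Callable[[Sentence, int, str], int]


def prefer_noun_after_det(sent: Sentence, i: int, parse: str) -> int:
    """Toy constraint: vote for a noun reading right after a determiner."""
    if parse == "NOUN" and i > 0 and "DET" in sent[i - 1]:
        return 2
    return 0


def penalize_verb_after_det(sent: Sentence, i: int, parse: str) -> int:
    """Toy constraint: vote against a verb reading right after a determiner."""
    if parse == "VERB" and i > 0 and "DET" in sent[i - 1]:
        return -1
    return 0


def disambiguate(sent: Sentence, constraints: List[Constraint]) -> Sentence:
    # One vote tally per token, one entry per candidate parse.
    votes: List[Dict[str, int]] = [{p: 0 for p in parses} for parses in sent]

    # Voting phase: constraints only read the original candidates and add
    # votes; nothing is removed here, so rule order cannot matter.
    for constraint in constraints:
        for i, parses in enumerate(sent):
            for p in parses:
                votes[i][p] += constraint(sent, i, p)

    # Selection phase, performed at the very end: keep the top-voted parses.
    # Ties are kept, so some tokens may remain ambiguous.
    result: Sentence = []
    for tally in votes:
        best = max(tally.values())
        result.append([p for p, v in tally.items() if v == best])
    return result


sentence = [["DET"], ["NOUN", "VERB"], ["VERB", "NOUN"]]
rules = [prefer_noun_after_det, penalize_verb_after_det]
print(disambiguate(sentence, rules))
```

In this toy sentence the second token is resolved to its noun reading, while the third token, which no constraint matches, keeps both of its parses. Passing the rules in reverse order gives the same output.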

Center for Language and Speech Processing