Translingual Fine-grained Morphosyntactic Analysis
and its Application to Machine Translation

Abstract:

English and a small set of other languages have a wealth of linguistic knowledge resources and annotated language data, but the great majority of the world's languages have little or none. This dissertation describes work that leverages the detailed and accurate morphosyntactic analyses available for English to improve analytical capabilities for a diverse set of other languages. This work includes the targeted enrichment of English morphosyntactic analysis, the translingual projection of that analysis to bootstrap analyses of other languages, and the exploitation of the resulting richer feature space for improved machine translation and bitext word alignment. The emphasis is on combining multiple sources of information, including both explicitly expressed human linguistic knowledge and patterns observed in monolingual and bilingual corpora. It also focuses on language pairs where advanced analysis capabilities are available for one language but not for the other.

Selected contributions to science described in this dissertation include: