Towards a Universal Framework for Tree Transduction – Stuart Shieber (Harvard)

November 30, 2004

The typical natural-language pipeline can be thought of as proceeding by successive transformation of various data structures, especially strings and trees. For instance, low-level speech processing can be viewed as transduction of strings of speech samples into phoneme strings, then into triphone strings, and finally into word strings. Morphological processes can similarly be modeled as character string transductions. For this reason, weighted finite-state transducers (WFSTs), a general formalism for string-to-string transduction, can serve as a kind of universal formalism for representing low-level natural-language processes. Higher-level natural-language processes can also be thought of as transductions, but on more highly structured representations, in particular trees. Semantic interpretation can be viewed as a transduction from a syntactic parse tree to a tree of semantic operations, whose simplification to logical form can be viewed as a further transduction. Machine translation systems have been viewed as tree transductions of various sorts as well. This raises the question of whether there is a universal formalism for natural-language tree transduction that can play the same role there that WFSTs play for string transduction. In this talk, we explore this question, proposing that the characterization of classical tree transducers in terms of bimorphisms, little known outside the formal language theory community, can be used as a unifying framework for a wide variety of tree transduction formalisms, including, for instance, several previously proposed for statistical machine translation and the back-end formalism for Dragon's speech command and control system. The framework also places so-called synchronous grammar formalisms into the tree transducer family for the first time.

Stuart Shieber is Harvard College Professor and James O. Welch, Jr. and Virginia B. Welch Professor of Computer Science in the Division of Engineering and Applied Sciences at Harvard University. Professor Shieber was awarded a Presidential Young Investigator award in 1991, and was named a Presidential Faculty Fellow in 1993, one of only thirty in the country in all areas of science and engineering. At Harvard, he has been awarded two honorary chairs: the John L. Loeb Associate Professorship in Natural Sciences in 1993 and the Harvard College Professorship in 2001. He was elected a Fellow of the American Association for Artificial Intelligence in 2004. He is the author or editor of five books and numerous articles in computer science. Professor Shieber holds eight patents, and is co-founder of Cartesian Products, Inc., a high-technology research and development company based in Cambridge, Massachusetts, providing advanced software technology to improve worldwide communication and information access. He is also the founder of Microtome Publishing, a company dedicated to publishing services in support of open access to the scholarly literature.

Center for Language and Speech Processing