Structural Alignment for Finite-State Syntactic Processing
Brian Roark, OGI School of Science and Engineering at OHSU
September 27, 2007
In this talk we will present some preliminary experiments on using multi-sequence alignment (MSA) techniques to induce monolingual finite-state tagging models that capture some global sequence information. Such MSA techniques are popular in bio-sequence processing, where key information about long-distance dependencies and three-dimensional structures of protein or nucleotide sequences can be captured without resorting to polynomial-complexity context-free models. In the NLP community, such techniques have been used very little, most notably for aligning paraphrases (Barzilay and Lee, 2003), and not at all for monolingual syntactic processing. We discuss key issues in pursuing this approach: syntactic functional alignment; inducing multi-sequence alignments; and using such alignments in tagging. Experiments are preliminary but promising.
Brian Roark is a faculty member in the Center for Spoken Language Understanding (CSLU) and the Department of Computer Science and Electrical Engineering (CSEE) of the OGI School of Science and Engineering at OHSU. He was in the Speech Algorithms Department at AT&T Labs from 2001 to 2004. He finished his Ph.D. in the Department of Cognitive and Linguistic Sciences at Brown University in 2001. At Brown he was part of the Brown Laboratory for Linguistic Information Processing.