Machine Translation = Automata Theory + Probability + Linguistics
Kevin Knight, USC/Information Sciences Institute
May 8, 2007
Machine translation (MT) systems have been getting steadily more accurate. One reason is that machines now gather translation knowledge automatically, combing through the large amounts of human-translated material available on the web. Most of these MT systems learn finite-state Markov models: target strings are substituted for source strings, followed by local word reordering. This kind of model supports only very weak linguistic transformations, and the trained models do not yet yield reliably high-quality MT. Over the past several years, many new probabilistic tree-based models (as opposed to string-based models) have been designed and tested on natural language applications, including MT. Such models frequently turn out to be instances of tree transducers, a formal automaton model first described by W. Rounds and J. Thatcher in the 1960s and 70s. Tree automata open up new opportunities to marry deeper linguistic representations, mathematical theory, and machine learning. This talk covers novel algorithms and open problems for tree automata, together with experiments in machine translation.
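To make the string-based approach concrete, here is a minimal sketch of the substitute-then-reorder idea. The word table, the example sentence, and the single adjective-swap rule are invented for illustration; real systems learn weighted phrase tables and reordering models from data, and search over many hypotheses rather than making one greedy pass.

```python
# Toy illustration of a string-based MT step: substitute target words for
# source words, then apply a local reordering rule. All data is invented.

table = {"la": "the", "maison": "house", "bleue": "blue"}

def translate(source, table):
    """Word-for-word substitution; unknown words pass through unchanged."""
    return [table.get(word, word) for word in source]

def local_reorder(words, adjectives=("blue",)):
    """Toy local rule: swap a noun with a following adjective
    (French noun-adjective order -> English adjective-noun order)."""
    out = list(words)
    for i in range(len(out) - 1):
        if out[i + 1] in adjectives:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

# "la maison bleue" -> substitute -> "the house blue" -> reorder -> "the blue house"
result = local_reorder(translate("la maison bleue".split(), table))
```

Note that the reordering here can only move words a short, fixed distance; that locality is exactly the weakness of string-based models that the abstract points to.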
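By contrast, a tree transducer rule can reorder entire subtrees at once. The following simplified sketch (tree encoding, rule format, and example sentence are all invented for illustration, and rules are applied in a single bottom-up pass rather than with the full Rounds/Thatcher machinery) rewrites a verb-final VP into verb-initial order, a transformation no fixed-window string reordering can express for arbitrarily long objects.

```python
# Toy tree-rewriting sketch in the spirit of tree transducers.
# A tree is either a leaf string or a (label, [children]) pair.

def transduce(tree, rules):
    """Recursively rewrite children, then apply any rule for this node's label."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    new_children = [transduce(child, rules) for child in children]
    rewrite = rules.get(label)
    if rewrite is not None:
        new_children = rewrite(new_children)
    return (label, new_children)

# One rule: under VP, reverse the children, turning SOV-style
# [object, verb] order into SVO-style [verb, object] order.
rules = {"VP": lambda kids: kids[::-1]}

# "watashi hon-o yomu" (I book read) -> "I read book" word order.
sov = ("S", [("NP", ["watashi"]),
             ("VP", [("NP", ["hon-o"]), ("V", ["yomu"])])])
svo = transduce(sov, rules)
```

The rule moves the whole object NP, however large, in one step; this is the kind of global, syntax-aware transformation that motivates tree automata for MT.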
Kevin Knight is a Senior Research Scientist and Fellow at USC's Information Sciences Institute, a Research Associate Professor in the Computer Science Department at USC, and co-founder of Language Weaver, Inc. He received his Ph.D. from Carnegie Mellon University in 1991 and his B.A. from Harvard University in 1986. He is co-author (with Elaine Rich) of the textbook Artificial Intelligence (McGraw-Hill, 1991). His research interests are in statistical natural language processing, machine translation, natural language generation, and decipherment.