Automated Grammatical Error Correction for Language Learners – Joel Tetreault (Yahoo! Labs)

December 2, 2014 all-day

A fast-growing area in Natural Language Processing is the use of automated tools for identifying and correcting grammatical errors made by language learners. This growth has been fueled in part by the needs of the large number of people worldwide who are learning and using a second or foreign language. For example, it is estimated that there are currently over one billion non-native speakers of English. These numbers drive the demand for accurate tools that can help learners write and speak proficiently in another language. Such demand also makes this an exciting time for those in the NLP community who are developing automated methods for grammatical error correction (GEC). In the last five years alone, the field has grown tremendously, from a few conference and workshop papers to four shared tasks (two of which were co-located with CoNLL), papers at conferences such as ACL and EMNLP, and two Morgan & Claypool Synthesis Series books. While there have been many exciting developments in GEC over the last few years, there is still considerable room for improvement, as state-of-the-art performance in detecting and correcting several important error types remains inadequate for many real-world applications. In this talk, I will provide an overview of the field of automated grammatical error correction, including its history, leading methodologies, and particular set of challenges. Although applications of GEC are often geared toward the classroom, its methods are more generally applicable to a wide variety of NLP problems, especially those where systems must contend with noisy data, such as MT evaluation and correction, analysis of microblogs and other user-generated content, and disfluency detection in speech.

Joel Tetreault is a Senior Research Scientist at Yahoo! Labs in New York City. His research focus is Natural Language Processing, with specific interests in anaphora, dialogue and discourse processing, machine learning, and applying these techniques to the analysis of English language learning and automated essay scoring. Previously, he was Principal Manager of the Core Natural Language group at Nuance Communications, Inc., where he worked on the research and development of NLP tools and components for the next generation of intelligent dialogue systems. Prior to Nuance, he worked at Educational Testing Service for six years as a Managing Senior Research Scientist, where he researched automated methods for detecting grammatical errors by non-native speakers, plagiarism detection, and content scoring. Tetreault received his B.A. in Computer Science from Harvard University (1998) and his M.S. and Ph.D. in Computer Science from the University of Rochester (2004). He was also a postdoctoral research scientist at the University of Pittsburgh’s Learning Research and Development Center (2004-2007), where he worked on developing spoken dialogue tutoring systems. In addition, he has co-organized the Building Educational Applications workshop series for seven years and the CoNLL 2013 Shared Task on Grammatical Error Correction, and he is currently NAACL Treasurer.

Center for Language and Speech Processing