Information Extraction: What has Worked, What hasn't, and What has Promise for the Future
Ralph Weischedel, BBN Technologies
November 7, 2000
During the past 10 years, one of the dominant application areas for natural language processing research has been the automatic extraction of information from text or speech in order to update databases of names, descriptions, relations, and/or events.
During those same 10 years, natural language processing technology has undergone a paradigm shift: formerly dominated by handwritten rules, it is now strongly influenced by learning approaches.
This talk will review the underlying challenges, state benchmark results on standard test sets, point towards new directions, and suggest a vision for how technologies may be combined into useful systems.
Among the competing approaches surveyed, our primary focus will be a recent approach to automatic information extraction based on statistical algorithms that learn to extract information from text or speech. The goal is to replace the manual writing of patterns with the annotation of examples of the information to be extracted. We have evaluated this approach on news data.
Ralph Weischedel is a Principal Scientist at BBN with twenty-five years of experience in written language processing, artificial intelligence, and knowledge representation. He leads a group of 10 full-time equivalents engaged in research, development, and application of natural language processing technology, including information extraction from text and probabilistic language understanding. He is a past president of the Association for Computational Linguistics. He joined BBN in 1984 from the University of Delaware, where he was an Associate Professor. He received his Ph.D. in Computer & Information Sciences from the University of Pennsylvania in 1975.