We KnowItAll: Lessons from a Quarter Century of Web Extraction Research – Oren Etzioni (University of Washington)
For the last quarter century (measured in person-years), the KnowItAll project has investigated information extraction at Web scale. If successful, this effort will begin to address the long-standing “Knowledge Acquisition Bottleneck” in Artificial Intelligence, and will enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. To date, we have generalized information extraction methods to process arbitrary Web text, to handle unanticipated concepts, and to leverage the redundancy inherent in the Web corpus, but many challenges remain. One of the most formidable is moving from extracting isolated nuggets of information to capturing a coherent body of knowledge that can support automatic inference. My talk will describe the lessons we have learned and identify directions for future work.
Oren Etzioni is the Washington Research Foundation Entrepreneurship Professor at the University of Washington’s Computer Science Department. He received his bachelor’s degree in Computer Science from Harvard University in June 1986, where he was the first Harvard student to “major” in Computer Science. Etzioni received his Ph.D. from Carnegie Mellon University in January 1991 and joined the University of Washington’s faculty in February 1991, where he is now a Professor of Computer Science. Etzioni received a National Young Investigator Award in 1993 and was selected as a AAAI Fellow a decade later. He is the founder and director of the University of Washington’s Turing Center.

Etzioni is also a Venture Partner at Madrona Venture Group, where he chairs the Technology Advisory Board. He was the founder of Farecast, a company that uses data mining techniques to anticipate airfare fluctuations; Microsoft acquired Farecast in 2008. He was a co-founder of Clearforest, a text-mining startup that was acquired by Reuters in 2007. He was the Chief Technology Officer and a board member of Go2net, which was acquired by Infospace in 2000. He also co-founded Netbot, acquired by Excite in 1997, where he helped to conceive of and design the web’s first major comparison-shopping agent. In 1995, Etzioni and his student Erik Selberg developed MetaCrawler, the web’s premier meta-search engine for several years, now run by Infospace. Finally, he has served on the board of Performant (acquired by Mercury Interactive in 2003) and has been a consultant or advisor to Askjeeves, Excite, Infospace, Google, Microsoft, Northern Telecom, SAIC, Vivisimo, Zillow, and others.