Learning to Read the Web – Tom Mitchell (Carnegie Mellon University)

February 17, 2012 all-day

We describe our efforts to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web and integrates them into its growing knowledge base of beliefs. Each day NELL also learns to read better than the day before, enabling it to go back to the text it read yesterday and extract more facts, more accurately.

NELL has now been running 24 hours/day for over two years. The result so far is a collection of 15 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)) that NELL is considering at different levels of confidence, along with hundreds of thousands of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web.

The approach implemented by NELL is based on three key ideas:

- Coupling the semi-supervised training of thousands of different functions that extract different types of information from different web sources
- Automatically discovering new constraints that more tightly couple the training of these functions over time
- A curriculum, or sequence of increasingly difficult learning tasks

Track NELL's progress at http://rtw.ml.cmu.edu.
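The coupling idea above can be illustrated with a minimal, hypothetical sketch (not NELL's actual system): two independent extraction "views" each propose candidate instances of a category from toy sentences, and a candidate is promoted to a belief only when both views agree, so each view's individual errors are filtered out.

```python
import re

def view1(corpus):
    """Candidates from the textual pattern 'X is a baked good'."""
    return {m.group(1) for s in corpus
            for m in [re.search(r"(\w+) is a baked good", s)] if m}

def view2(corpus):
    """Candidates from the textual pattern 'baked goods such as X'."""
    return {m.group(1) for s in corpus
            for m in [re.search(r"baked goods such as (\w+)", s)] if m}

# Toy corpus, invented for illustration only.
corpus = [
    "scone is a baked good",
    "baked goods such as scone are popular",
    "coffee is a baked good",        # noisy match for view 1 only
    "baked goods such as espresso",  # noisy match for view 2 only
]

# Coupling: promote a belief only when both independent views agree.
beliefs = view1(corpus) & view2(corpus)
print(beliefs)  # only 'scone' survives the coupling
```

In the full system the promoted beliefs would be fed back as seeds for the next round of extraction, which is what lets the learner read better each day.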

Center for Language and Speech Processing