Not Just for Kids: Enriching Information Retrieval with Reading Level Metadata – Kevyn Collins-Thompson (Microsoft Research)

April 24, 2012 all-day

A document isn’t relevant – at least, not immediately – if you can’t understand it, yet search engines have traditionally ignored the problem of finding content at the right level of difficulty as an aspect of relevance. Moreover, little is currently known about the nature of the Web, its users, and how users interact with content when seen through the lens of reading difficulty. I’ll present our recent research progress in combining reading difficulty prediction with information retrieval, including models, algorithms and large-scale data analysis. Our results show how the availability of reading level metadata – especially in combination with topic metadata – opens up new and sometimes surprising possibilities for enriching search systems, from personalizing Web search results by reading level to predicting user and site expertise, improving result caption quality, and estimating searcher motivation. This talk includes joint work with Paul N. Bennett, Ryen White, Susan Dumais, Jin Young Kim, Sebastian de la Chica, and David Sontag.
Kevyn Collins-Thompson is a Researcher in the Context, Learning and User Experience for Search (CLUES) group at Microsoft Research (Redmond). His research combines information retrieval, machine learning, and computational linguistics, and focuses on models, algorithms, and evaluation methods for making search technology more reliable and effective. His recent work has explored algorithms and Web search applications for reading level prediction; optimization strategies that reduce the risk of applying retrieval algorithms such as personalization and automatic query rewriting; and educational applications of IR such as intelligent tutoring systems. Kevyn received his Ph.D. and M.Sc. from the Language Technologies Institute at Carnegie Mellon University and his B.Math. from the University of Waterloo.

Center for Language and Speech Processing