Large Scale Supervised Embedding for Text and Images

Jason Weston, Google

July 27, 2011



In this talk I will present two related pieces of research, on text retrieval and image annotation, that both use supervised embedding algorithms over large datasets.

Part 1: The first part of the talk presents a class of models that are discriminatively trained to map directly from the word content of a query-document or document-document pair to a ranking score. Like latent semantic indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we beat several baselines. We also describe extensions to the nonlinear case and to dealing with huge dictionary sizes. (Joint work with Bing Bai, David Grangier and Ronan Collobert.)

Part 2: Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a well-performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method outperforms several baseline methods and is also faster and consumes less memory than they do. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even in other languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the "sibling" precision metric, where our method also obtains good results. (Joint work with Samy Bengio and Nicolas Usunier.)
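
Both parts of the abstract describe an embedding that is trained directly on a ranking objective. The following is a minimal sketch of that general idea, not the method presented in the talk: the abstract does not give the model form or the loss, so the linear maps, the dot-product score, the pairwise hinge loss, the toy data, and every name below (`V`, `W`, `sgd_step`, `precision_at_k`, and so on) are assumptions made for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed(V, x):
    """Map a feature vector x into the joint space (V is dim x n_features)."""
    return [dot(row, x) for row in V]


def score(V, W, x, label):
    """Assumed scoring rule: dot product of the embedded input and label."""
    return dot(embed(V, x), W[label])


def rank_labels(V, W, x):
    """Return all label indices, best-scoring first."""
    return sorted(range(len(W)), key=lambda y: -score(V, W, x, y))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked labels that are relevant."""
    return sum(1 for y in ranked[:k] if y in relevant) / k


def sgd_step(V, W, x, pos, neg, lr=0.1, margin=1.0):
    """One SGD step on the hinge loss max(0, margin - f(x,pos) + f(x,neg)).

    Updates V and W in place and returns the loss measured before the update.
    """
    z = embed(V, x)
    loss = margin - dot(z, W[pos]) + dot(z, W[neg])
    if loss <= 0:
        return 0.0
    # Computed before W changes: direction that separates pos from neg.
    diff = [wp - wn for wp, wn in zip(W[pos], W[neg])]
    for i in range(len(V)):
        for j in range(len(x)):
            V[i][j] += lr * diff[i] * x[j]
    for i in range(len(z)):
        W[pos][i] += lr * z[i]
        W[neg][i] -= lr * z[i]
    return loss


# Toy data (hypothetical): three inputs with one correct label each.
random.seed(0)
n_features, n_labels, dim = 3, 3, 2
V = [[random.gauss(0, 0.1) for _ in range(n_features)] for _ in range(dim)]
W = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_labels)]
data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]

for _ in range(500):
    x, pos = random.choice(data)
    neg = random.choice([y for y in range(n_labels) if y != pos])
    sgd_step(V, W, x, pos, neg)

for x, pos in data:
    print("true label:", pos, "ranking:", rank_labels(V, W, x))
```

Because inputs and labels share one low-dimensional space in this sketch, memory grows with `dim * (n_features + n_labels)` rather than `n_features * n_labels`, which is one plausible reason such a model can scale to large label sets.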


Jason Weston has been a Research Scientist at Google NY since July 2009. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisor: Vladimir Vapnik) in 2000. From 2000 to 2002, he was a Researcher at Biowulf Technologies, New York. From 2002 to 2003, he was a Research Scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to June 2009, he was a Research Staff Member at NEC Labs America, Princeton. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 80 papers, including papers that won best paper awards at ICML and ECML.