Sameer Singh (University of Washington) – “Interactive Training of Relation Embeddings Using High-Level Supervision”

January 26, 2016 @ 12:00 pm – 1:15 pm
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218


An important challenge in extracting useful structured information from text collections is relation extraction, i.e., identifying the types of relations between entities that are expressed in text. Because relations can be expressed in language in so many different ways, labeling data for relation extraction is notoriously time-consuming and expensive. Recently proposed embedding-based extractors, which use unlabeled data and treat noisy KB alignments as “distant labels,” partially address this concern. However, these models are inaccurate for relations that lack large KB coverage, and they cannot be improved without annotating additional data. Purely rule-based systems, on the other hand, offer an attractive alternative because they allow users to inject symbolic domain knowledge directly, but they require a large number of formulae to achieve reasonable generalization.

In this talk, I introduce an interactive training paradigm that combines embedding-based models of relation extraction with symbolic domain knowledge. I first describe how symbolic domain knowledge, if provided by the user as first-order logic statements, can be injected into the embeddings to improve the predictions. In the second part of the talk, I present an approach to “explain” the embedding-based model predictions using a symbolic representation, which the user can annotate directly for more effective supervision. I present experiments that demonstrate the potential of symbolic knowledge as supervision in reducing annotation effort and in quickly training accurate relation extraction systems.

This work is a collaboration with Tim Rocktäschel, Sebastian Riedel, Luke Zettlemoyer, and Carlos Guestrin.


Sameer Singh is a Postdoctoral Research Associate at the University of Washington, working with Carlos Guestrin, Luke Zettlemoyer, and Dan Weld on large-scale and interactive machine learning applied to information extraction and natural language processing. He received his PhD from the University of Massachusetts, Amherst in 2014, where he worked with Andrew McCallum on scalable inference for large graphical models and probabilistic programming. He was recently selected as a DARPA Riser, won the grand prize in the 2015 Yelp dataset challenge, has been awarded the Yahoo! KSC fellowship and the UMass Graduate School fellowship, and was a finalist for the Facebook PhD fellowship. Sameer’s internships at Microsoft Research, Google Research, and Yahoo! Labs involved designing machine learning algorithms for massive datasets. He is one of the founding organizers of the popular NIPS Big Learning and ICML Inferning workshops, and has organized the Automated Knowledge-Base Construction (AKBC) workshops in 2013, 2014, and 2016.

Center for Language and Speech Processing