Richard Socher (MetaMind) “Multimodal Question Answering for Language and Vision” @ Hackerman Hall B17
Feb 16 @ 12:00 pm – 1:15 pm


Deep learning has enabled tremendous breakthroughs in visual understanding and speech recognition. Ostensibly, the same is not true of natural language processing (NLP) and higher-level reasoning.
However, it only appears that way because there are so many different tasks in NLP, and no single one of them, by itself, captures the complexity of language understanding. In this talk, I introduce dynamic memory networks, our attempt to solve a large variety of NLP and vision problems through the lens of question answering.


Richard Socher is the CEO and founder of MetaMind, a startup that seeks to improve artificial intelligence and make it widely accessible. He obtained his PhD from Stanford, where he worked on deep learning with Chris Manning and Andrew Ng and won the Stanford CS best PhD thesis award. He is interested in developing new AI models that perform well across multiple different tasks in natural language processing and computer vision.

He was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research PhD Fellowship in 2012, a 2013 “Magic Grant” from the Brown Institute for Media Innovation, and the 2014 GigaOM Structure Award.

Mohit Bansal (UNC Chapel Hill) “Multi-Task and Reinforcement Learning for Entailment-Based Natural Language Generation” @ Hackerman Hall B17
May 5 @ 12:00 pm – 1:15 pm


In this talk, I will discuss my group’s recent work on using logically-implied textual entailment knowledge to improve a variety of downstream natural language generation tasks. First, we employ a multi-task learning setup to combine a directed premise-to-entailment generation task with a given downstream generation task, such as multimodal video captioning (where the caption entails the video) and automatic document summarization (where the summary entails the document), achieving significant improvements over the state of the art on multiple datasets and metrics. Next, we optimize entailment classification scores as sentence-level metric rewards in a reinforcement-learning setup (via annealed policy gradient methods). Our novel reward function corrects the standard phrase-matching metric rewards to allow only logically-implied partial matches and avoid contradictions, substantially improving the generation results.
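The entailment-corrected reward idea above can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual implementation: `phrase_match_score` and `entailment_prob` are hypothetical stand-ins for a metric such as ROUGE/CIDEr and a trained entailment classifier, respectively.

```python
def phrase_match_score(candidate, reference):
    """Toy phrase-matching metric: unigram F1 (stand-in for ROUGE/CIDEr)."""
    cand, ref = candidate.split(), reference.split()
    overlap = len(set(cand) & set(ref))
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def entailment_prob(premise, hypothesis):
    """Stand-in for an entailment classifier returning P(premise => hypothesis).
    Crude proxy here: fraction of hypothesis tokens supported by the premise."""
    hyp = hypothesis.split()
    return sum(tok in premise.split() for tok in hyp) / max(len(hyp), 1)

def corrected_reward(candidate, reference, alpha=0.5):
    """Sentence-level reward for policy-gradient training: the phrase-match
    reward is gated by the entailment score, so partial matches only score
    highly when they are logically implied by the reference."""
    match = phrase_match_score(candidate, reference)
    entail = entailment_prob(reference, candidate)
    # Unsupported or contradictory candidates are penalized even if they
    # share phrases with the reference.
    return alpha * match + (1 - alpha) * match * entail
```

Under this gating, a short candidate fully entailed by the reference can outscore a longer candidate that shares more words but adds unsupported content.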

Dr. Mohit Bansal is an assistant professor in the Computer Science department at the University of North Carolina (UNC) Chapel Hill. Prior to this, he was a research assistant professor (a 3-year endowed position) at TTI-Chicago. He received his PhD from UC Berkeley in 2013 (where he was advised by Dan Klein) and his BTech from IIT Kanpur in 2008. His research interests are in statistical natural language processing and machine learning, with a particular interest in multimodal, grounded, and embodied semantics (i.e., language with vision and speech, for robotics), human-like language generation and Q&A/dialogue, and interpretable and structured deep learning. He is a recipient of the 2016 and 2014 Google Faculty Research Awards, the 2016 Bloomberg Data Science Award, the 2014 IBM Faculty Award, and a 2014 ACL Best Paper Award Honorable Mention. Webpage:

Center for Language and Speech Processing