Localizing Objects and Actions in Videos with the Help of Accompanying Text

Research Group of the 2010 Summer Workshop

Multimedia content is a growing focus of search and retrieval, personalization, categorization, and information extraction. Video analysis can find both objects and actions in video, but recognizing a large variety of categories remains very challenging. Text accompanying the video, however, often describes objects and actions at a semantic level and outlines the salient information present in the video. Such textual descriptions are often available as closed captions, transcripts, or program notes. In this interdisciplinary project, we will combine natural language processing, computer vision, and machine learning to investigate how the semantic information contained in textual sources can be leveraged to improve the detection of objects and complex actions in video. We will parse the text to obtain verb-object dependencies, use lexical knowledge bases to identify words that describe these objects and actions, use web-scale image databases to obtain exemplars of the objects and actions, and build models that localize the objects and actions in the video.
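
As a rough illustration of the first step in that pipeline, the sketch below collects verb-object pairs from dependency triples. It is not the project's implementation: the `Dep` type, the `PARSE` list, and the function names are hypothetical, and the triples are written by hand to stand in for the output of a real dependency parser run on a caption such as "She cuts the bread and spreads the butter." The `dobj` label for direct objects is likewise an assumption about the parser's label set.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Dep(NamedTuple):
    """One dependency arc: head word, relation label, dependent word."""
    head: str
    relation: str
    dependent: str


# Hand-written stand-in for parser output (an assumption, not real data).
PARSE = [
    Dep("cut", "nsubj", "she"),
    Dep("cut", "dobj", "bread"),
    Dep("bread", "det", "the"),
    Dep("spread", "dobj", "butter"),
    Dep("butter", "det", "the"),
]


def verb_object_pairs(deps: List[Dep], object_relation: str = "dobj") -> List[Tuple[str, str]]:
    """Keep only arcs that link a verb to its direct object."""
    return [(d.head, d.dependent) for d in deps if d.relation == object_relation]


def candidate_labels(deps: List[Dep]) -> Tuple[Counter, Counter]:
    """Count how often each action (verb) and object (noun) is mentioned."""
    pairs = verb_object_pairs(deps)
    actions = Counter(verb for verb, _ in pairs)
    objects = Counter(obj for _, obj in pairs)
    return actions, objects


if __name__ == "__main__":
    for verb, obj in verb_object_pairs(PARSE):
        print(f"action={verb} object={obj}")
```

The resulting action and object words would then serve as the candidate categories that later stages look up in a lexical knowledge base and try to localize in the video.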

Abstract
Final Report
Final Presentation
Final Presentation Video

Team Members
Senior Members
Cornelia Fermueller	University of Maryland
Jana Kosecka	George Mason University
Jan Neumann	StreamSage/Comcast
Evelyne Tzoukermann	StreamSage
Graduate Students
Rizwan Chaudhry	Johns Hopkins University
Yi Li	University of Maryland
Ben Sapp	University of Pennsylvania
Gautam Singh	George Mason University
Ching Lik Teo	University of Maryland
Xiaodong Yu	University of Maryland
Undergraduate Students
Francis Ferraro	University of Rochester
He He	Hong Kong Polytechnic University
Ian Perera	University of Pennsylvania
Affiliate Members
Yiannis Aloimonos	University of Maryland
Greg Hager	Johns Hopkins University
Rene Vidal	Johns Hopkins University

Center for Language and Speech Processing