Energy-Based Models and Deep Learning – Yann LeCun (Computational and Biological Learning Lab, Courant Institute of Mathematical Sciences, New York University)

July 16, 2008 all-day

A long-term goal of Machine Learning research is to solve highly complex “intelligent” tasks, such as visual perception, auditory perception, and language understanding. To reach that goal, the ML community must solve two problems: the Partition Function Problem and the Deep Learning Problem. The Partition Function Problem (also called the normalization problem) is the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and Natural Language communities have devoted considerable effort to circumventing this problem by developing “un-normalized” learning models for tasks in which the output is highly structured (e.g. English sentences). This class of models was in fact originally developed in the early 1990s in the speech and handwriting recognition communities, and it resulted in highly successful commercial systems for automatically reading bank checks and other documents. The Deep Learning Problem is the issue of training all the levels of a recognition system (e.g. segmentation, feature extraction, recognition) in an integrated fashion. We first consider “traditional” methods for deep supervised learning, such as multi-layer neural networks and convolutional networks (a learning architecture for image recognition loosely modeled after the architecture of the visual cortex). Several practical applications of convolutional nets will be demonstrated with videos and live demos, including a handwriting recognition system, a real-time human face detector that also estimates the pose of the face, a real-time system that can detect and recognize objects such as airplanes, cars, animals and people in images, and a vision-based navigation system for off-road mobile robots that trains itself on-line to avoid obstacles. Although these methods produce excellent performance, they require many training samples. The next challenge is to devise unsupervised learning methods for deep networks.
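To make the Partition Function Problem concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not code from the talk. It assumes hypothetical outputs that are length-4 sequences over a 3-symbol vocabulary, scored by a made-up additive energy table `SCORES` (lower energy means a better output). The normalized negative log-likelihood needs log Z, a sum over every one of the V**N possible outputs. A margin (hinge) loss, one common un-normalized choice, uses only the energy of the correct output and that of the most offending incorrect output.

```python
import itertools
import math

# Hypothetical toy setup (not from the talk): outputs are sequences of
# length N over a vocabulary of size V, and the energy is a sum of
# made-up per-position scores. Lower energy = more compatible output.
V, N = 3, 4
SCORES = [[(i * 7 + v * 3) % 5 * 0.5 for v in range(V)] for i in range(N)]


def energy(y):
    """E(y): sum of the per-position energies of output sequence y."""
    return sum(SCORES[i][v] for i, v in enumerate(y))


def all_outputs():
    """Enumerate every candidate output: V**N sequences in total."""
    return itertools.product(range(V), repeat=N)


def nll_loss(y_true):
    """Normalized (probabilistic) loss: E(y_true) + log Z.

    Z = sum over ALL outputs of exp(-E(y)), so this touches V**N terms.
    """
    log_z = math.log(sum(math.exp(-energy(y)) for y in all_outputs()))
    return energy(y_true) + log_z


def margin_loss(y_true, margin=1.0):
    """Un-normalized hinge loss: push E(y_true) below the energy of the
    most offending incorrect output by at least `margin`.

    Only two energies enter the loss. The brute-force search below is a
    stand-in for whatever structured search (e.g. dynamic programming)
    a real system would use to find the most offending output.
    """
    y_bad = min((y for y in all_outputs() if y != y_true), key=energy)
    return max(0.0, margin + energy(y_true) - energy(y_bad))


y_true = (0, 1, 2, 0)
print("outputs summed over for log Z:", V ** N)
print("NLL loss:", round(nll_loss(y_true), 4))
print("margin loss:", round(margin_loss(y_true), 4))
```

In this toy, `y_true` already has the lowest energy (0.5), yet the margin loss is 0.5 because the runner-up output is only 0.5 energy units higher. The log Z sum has just 81 terms here. For a structured output such as an English sentence, the number of terms grows exponentially with length, and that growth is what un-normalized models avoid having to sum over.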
Inspired by some recent work by Hinton on “deep belief networks”, we devised energy-based unsupervised algorithms that can learn deep hierarchies of invariant features for image recognition. We show how such algorithms can dramatically reduce the required number of training samples, particularly for tasks such as the recognition of everyday objects at the category level.
Yann LeCun received an Electrical Engineer Diploma from the École Supérieure d’Ingénieurs en Électronique et Électrotechnique (ESIEE), Paris, in 1983, and a PhD in Computer Science from Université Pierre-et-Marie-Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, NJ, in 1988. Following AT&T’s split with Lucent Technologies in 1996, he joined AT&T Labs-Research as head of the Image Processing Research Department. In 2002 he became a Fellow at the NEC Research Institute in Princeton. He has been a professor of computer science at NYU’s Courant Institute of Mathematical Sciences since 2003. Yann’s research interests include computational and biological models of learning and perception, computer vision, mobile robotics, data compression, digital libraries, and the physical basis of computation. He has published over 130 papers in these areas. His image compression technology, called DjVu, is used by numerous digital libraries and publishers to distribute scanned documents on-line, and his handwriting recognition technology is used to process a large percentage of bank checks in the US. He has been general chair of the annual “Learning at Snowbird” workshop since 1997, and was program chair of CVPR 2006.

Center for Language and Speech Processing