The Unreasonable Effectiveness of Deep Learning – Yann LeCun (Facebook)
The emergence of large datasets, parallel computers, and new machine learning methods has enabled the deployment of highly accurate computer perception systems and is opening the door to a wide deployment of AI systems.
A key component in AI systems is a module, sometimes called a feature extractor, that turns raw inputs into suitable internal representations. But designing and building such a module requires considerable engineering effort and domain expertise.
Deep learning methods have provided a way to automatically learn good representations of data from labeled or unlabeled samples. Deep architectures are composed of successive stages in which data representations are increasingly global, abstract, and invariant to irrelevant transformations of the input. Deep learning enables end-to-end training of these architectures, from raw inputs to ultimate outputs.
The convolutional network model (ConvNet) is a particular type of deep architecture, somewhat inspired by biology, that consists of multiple stages of filter banks interspersed with non-linear operators and spatial pooling. ConvNets have become the record holder on a wide variety of benchmarks, including object detection, localization, and recognition in images, semantic segmentation and labeling, face recognition, acoustic modeling for speech recognition, drug design, handwriting recognition, and biological image segmentation.
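As a rough illustration of the stage structure described above, the following minimal sketch chains a filter bank, a non-linear operator, and spatial pooling. It is not taken from the talk: the function names, the choice of ReLU as the non-linearity, max pooling, and the hand-written edge filters are all assumptions made for the example. A real ConvNet learns its filters, sums over multiple input channels, and stacks several such stages.

```python
# Hypothetical sketch of one ConvNet stage: filter bank -> non-linearity -> pooling.
# Pure Python for readability, not speed; every name here is illustrative.

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Point-wise non-linear operator (assumed here to be a rectifier)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def convnet_stage(image, filter_bank, pool_size=2):
    """Apply every filter in the bank, then the non-linearity, then pooling."""
    return [max_pool(relu(convolve2d(image, k)), pool_size) for k in filter_bank]

# A 5x5 image containing a bright vertical bar.
image = [[0, 0, 1, 1, 0] for _ in range(5)]

# Two hand-written filters (a trained ConvNet would learn these).
filter_bank = [
    [[-1, 1], [-1, 1]],  # responds to dark-to-bright vertical edges
    [[1, -1], [1, -1]],  # responds to bright-to-dark vertical edges
]

maps = convnet_stage(image, filter_bank)
print(maps[0])  # [[2, 0.0], [2, 0.0]]  left edge of the bar
print(maps[1])  # [[0.0, 2], [0.0, 2]]  right edge of the bar
```

Each filter produces one pooled feature map, so the output of a stage is a stack of maps that the next stage would consume as its input channels.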
The most recent systems deployed by Facebook, Google, NEC, IBM, Microsoft, Baidu, Yahoo and others for image understanding, speech recognition, and natural language processing use deep learning. Many of these systems use very large and very deep ConvNets with billions of connections, trained in supervised mode. But many new applications require the use of unsupervised feature learning. A number of such methods based on sparse auto-encoders will be presented.
Several applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a scene parsing system that labels every pixel in an image with the category of the object it belongs to, an object localization and detection system, and several natural language processing systems. Specialized hardware architectures that run these systems in real time will also be described.
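The abstract does not specify which sparse auto-encoder methods will be presented. As a generic illustration only, one common formulation scores a code by its reconstruction error plus a penalty that pushes most code units toward zero. The sketch below assumes a linear encoder and decoder, a rectified code, and an L1 penalty; all names and weights are hypothetical.

```python
# Hypothetical sketch of a sparse auto-encoder objective (illustrative, not the
# talk's method): squared reconstruction error + L1 penalty on the code.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def sparse_autoencoder_loss(x, encoder, decoder, sparsity_weight=0.1):
    """Return (loss, code) for one input vector x."""
    code = [max(0.0, z) for z in matvec(encoder, x)]  # rectified, non-negative code
    reconstruction = matvec(decoder, code)
    reconstruction_error = sum((r - xi) ** 2 for r, xi in zip(reconstruction, x))
    sparsity_penalty = sum(abs(c) for c in code)      # L1 norm of the code
    return reconstruction_error + sparsity_weight * sparsity_penalty, code

x = [1.0, 0.0]
encoder = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # 3 code units for a 2-D input
decoder = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

loss, code = sparse_autoencoder_loss(x, encoder, decoder, sparsity_weight=0.5)
print(code)  # [1.0, 0.0, 0.0]  only one code unit is active
print(loss)  # 0.5  perfect reconstruction, so only the sparsity term remains
```

No labels appear anywhere in the objective, which is what makes this kind of feature learning unsupervised. Training would adjust the encoder and decoder weights to minimize this loss over many inputs.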
Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department.
He received the Electrical Engineer Diploma from Ecole Superieure d’Ingenieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Universite Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton. From 2012 to 2014 he directed NYU’s initiative in data science and became the founding director of the NYU Center for Data Science. He was named Director of AI Research at Facebook in late 2013 and retains a part-time position on the NYU faculty.
His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10 and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web. Since the mid-1980s he has been working on deep learning methods, particularly the convolutional network model, which is the basis of many products and services deployed by companies such as Facebook, Google, Microsoft, Baidu, IBM, NEC, AT&T and others for image and video understanding, document recognition, human-computer interaction, and speech recognition.
LeCun has been on the editorial boards of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR’06, and is chair of ICLR 2013 and 2014. He is on the science advisory board of the Institute for Pure and Applied Mathematics, and of the Neural Computation and Adaptive Perception Program of the Canadian Institute for Advanced Research. He has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and the University of Washington to develop data-driven methods in the sciences. He is the recipient of the 2014 IEEE Neural Networks Pioneer Award.