How Does the Brain Solve Visual Object Recognition – James DiCarlo (McGovern Institute for Brain Research at MIT)

2012 Summer Workshop

Visual object recognition is a fundamental building block of memory and cognition, but it remains a central unsolved problem in systems neuroscience, human psychophysics, and computer vision (engineering). The computational crux of visual object recognition is that the recognition system must somehow be robust to the tremendous image variation produced by different views of each object: the so-called “invariance problem”. The primate brain is an example of a powerful recognition system, and my laboratory aims to understand and emulate its solution to this problem. A key step in isolating and constraining the brain’s solution is to first find the patterns of neuronal activity, and the ways of reading that activity, that quantitatively express the brain’s answer to visual recognition. To that end, we have previously shown that a part of the primate ventral visual stream (inferior temporal cortex, IT) rapidly and automatically conveys neuronal population rate codes that qualitatively solve the invariance problem for vision. While this is a good start, it only weakly constrains the brain’s solution. Thus, we have recently set the bar higher: are such codes quantitatively sufficient to explain behavioral performance? In this talk, I will show how primate systems neuroscience combined with human psychophysics reveals that some (but not all) IT population codes are sufficient to explain human performance on invariant object recognition. This stands in stark contrast to all tested codes from earlier visual areas and from computer vision, which are all insufficient (falsified by experimental data). These results argue that these rapidly and automatically computed IT population codes are common to primate brains, and that they are the direct substrate of object recognition performance.
While this progress constrains and frames the kinds of algorithms we should be searching for in the primate brain, it does not directly reveal their key principles of image encoding or the myriad key details of that encoding. While this remains an area of active research, I will conclude by outlining how we aim to combine our experimental results in unsupervised learning with novel computer vision technology to guide us toward discovery of the true underlying cortical algorithm.
DiCarlo joined the McGovern Institute in 2002 and is an associate professor in the Department of Brain and Cognitive Sciences. He received his Ph.D. and M.D. from Johns Hopkins University and did postdoctoral work at Baylor College of Medicine. In 1998, he received the Martin and Carol Macht Young Investigator Research Prize from Johns Hopkins University. In 2002, he received an Alfred P. Sloan Research Fellowship and a Pew Scholar Award. He received MIT’s Surdna Research Foundation Award and its School of Science Prize for Excellence in Undergraduate Teaching in 2005, and he won a Neuroscience Scholar Award from the McKnight Foundation in 2006.

Center for Language and Speech Processing