Learning Energy-Based Models of High-Dimensional Data
Geoffrey Hinton, University of Toronto
October 29, 2002
Many researchers have tried to model perception using belief networks based on directed acyclic graphs. The belief network is viewed as a stochastic generative model of the sensory data, and perception consists of inferring plausible hidden causes for the observed sensory input. I shall argue that this approach is probably misguided because of the difficulty of inferring posterior distributions in densely connected belief networks.
An alternative approach is to use layers of hidden units whose activities are a deterministic function of the sensory inputs. The activities of the hidden units provide additive contributions to a global energy, E, and the probability of each sensory data vector is defined to be proportional to exp(-E). The problem of perceptual inference vanishes in deterministic networks, so perception is very fast once the network has been learned.
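As a minimal sketch of the setup described above: a tiny network whose hidden activities are a deterministic (here, sigmoid) function of the input, whose energy is the sum of per-unit contributions, and whose unnormalized probability is exp(-E). The particular weights, sizes, and choice of sigmoid are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: 4 visible inputs feeding 3 hidden units.
W = rng.normal(scale=0.1, size=(4, 3))
b = np.zeros(3)

def energy(x):
    """Hidden activities are a deterministic function of the input;
    each unit contributes additively to the global energy E."""
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))  # deterministic hidden activities
    return -h.sum()  # sum of per-unit contributions

def unnormalized_prob(x):
    """p(x) is proportional to exp(-E(x)); the normalizing constant
    (a sum over all possible data vectors) is intractable in general."""
    return np.exp(-energy(x))

x = np.array([1.0, 0.0, 1.0, 0.0])
p = unnormalized_prob(x)
```

Note that computing `p` requires only a single feed-forward pass, which is why perceptual inference is fast in such networks; the expense is hidden in the normalization term, which is the subject of the next paragraph.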
The main difficulty of this approach is that maximum likelihood learning is very inefficient. Maximum likelihood adjusts the parameters to maximize the probability of the observed data given the model, but this requires the derivatives of an intractable normalization term. I shall show how this difficulty can be overcome by using a different objective function for learning. The parameters are adjusted to minimize the extent to which the data distribution is distorted when it is moved towards the distribution that the model believes in. This new objective function makes it possible to learn large energy-based models quickly.
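The learning rule described above can be sketched for one concrete case. Assuming a restricted Boltzmann machine as the energy-based model (a specific choice not stated in the abstract) and a single reconstruction step towards the model's distribution, the update compares statistics gathered on the data with statistics gathered on the one-step reconstruction; biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes and weights for a small binary model.
n_vis, n_hid = 6, 4
W = rng.normal(scale=0.01, size=(n_vis, n_hid))

def one_step_update(v0, W, lr=0.1):
    """Move the data one step towards the model's distribution and
    adjust W to reduce the resulting distortion (a CD-1-style sketch)."""
    ph0 = sigmoid(v0 @ W)                      # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sampled hidden states
    pv1 = sigmoid(h0 @ W.T)                    # one-step reconstruction
    ph1 = sigmoid(pv1 @ W)                     # hidden probabilities given it
    # positive phase (data statistics) minus negative phase (reconstruction)
    grad = v0[:, None] * ph0[None, :] - pv1[:, None] * ph1[None, :]
    return W + lr * grad

v = (rng.random(n_vis) < 0.5) * 1.0  # a random binary data vector
W = one_step_update(v, W)
```

Because the reconstruction is run for only a step or two rather than to equilibrium, each update is cheap, which is what makes learning large energy-based models quick.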