Facing the Curse of Dimensionality in Statistical Language Modeling using Distributed Representations: Yoshua Bengio - 07/31/2002
slides from Yoshua Bengio's lecture (.pdf format)
- Location: Shaffer Hall, Room 100
- Time: 10:30 am - 12:00 noon
- Abstract:
From the point of view of statistical machine learning, a central challenge in statistical language modeling is the combinatorial explosion of possible word sequences in sentences, which is related to the curse of dimensionality in non-parametric modeling of high-dimensional data. In terms of probabilistic modeling, this translates into the question of how to distribute probability mass from the observed training sentences so as to assign as high a probability as possible to sentences not yet seen. We first try to identify the main ways of generalizing in this sense that have been used in statistical language models. We then describe an approach based on a distributed representation of words (to avoid the curse of dimensionality), using an artificial neural network that learns a smooth notion of similarity between words in order to smooth the empirical distribution. This approach is similar to maximum entropy models, but with the features also learned (by penalized maximum likelihood). It has yielded very good results but is very computationally expensive. To address that issue, we introduce sampling methods similar to Hinton's contrastive divergence in order to approximate the gradient of the distribution's partition function. Finally, we describe current research that takes advantage of WordNet to begin incorporating the notions of polysemy, ontology, and grammar into this approach.
- Supplemental Materials:
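As a rough illustration of the approach outlined in the abstract, the sketch below maps each context word to a learned feature vector (the distributed representation), passes the concatenated vectors through a small neural network, and normalizes the output scores over the whole vocabulary. It is a minimal sketch, not the speaker's implementation: the function names, the layer sizes, the tanh hidden layer, and the random initialization are all assumptions made for illustration, and no training step is shown.

```python
import math
import random


def make_model(vocab_size, embed_dim, context_size, hidden_dim, seed=0):
    """Build randomly initialized parameters (hypothetical layout, untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "C": mat(vocab_size, embed_dim),                 # one feature vector per word
        "H": mat(hidden_dim, context_size * embed_dim),  # input-to-hidden weights
        "U": mat(vocab_size, hidden_dim),                # hidden-to-output weights
    }


def next_word_probs(model, context):
    """Return P(next word | context) for every word id in the vocabulary."""
    # Concatenate the feature vectors of the context words.
    x = [value for word in context for value in model["C"][word]]
    # Hidden layer (tanh is an assumption of this sketch).
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in model["H"]]
    # One unnormalized score per vocabulary word.
    scores = [sum(u * hi for u, hi in zip(row, h)) for row in model["U"]]
    # Softmax; z is the partition function, a sum over the entire vocabulary.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


model = make_model(vocab_size=10, embed_dim=4, context_size=2, hidden_dim=5)
probs = next_word_probs(model, context=[3, 7])
```

The sum `z` runs over every word in the vocabulary, which is the cost that grows with vocabulary size and that the sampling methods mentioned in the abstract aim to avoid when approximating the gradient.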