Predictive clustering: smaller, faster language modeling – Joshua Goodman (Microsoft Corp.)

September 11, 2001 (all day)

I’ll start with a brief overview of current research directions in several Microsoft Research groups, including the Speech Group (where I used to be), the Machine Learning Group (where I currently work), and the Natural Language Processing Group (with whom I kibitz). Then I will describe my own recent research in clustering. Clusters have been a staple of language modeling research for almost as long as there has been language modeling research. I will present a novel clustering approach that allows us to create smaller models and to train maximum entropy models faster.

First, I examine how to use clusters for language model compression, with a surprising result: I achieve my best results by first making the models larger using clustering, and then pruning them. This can yield a factor of three or more reduction in model size at the same perplexity. I then examine a novel way of using clustering to speed up maximum entropy training. Maximum entropy is considered by many to be one of the more promising avenues of language model research, but it is prohibitively expensive to train large models. I show how to use clustering to speed up training time by up to a factor of 35 over standard techniques, while slightly improving perplexity. The same approach can be used to speed up other learning algorithms that must predict a very large number of outputs.
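The core idea behind predictive clustering is to factor the probability of a word given its history into two smaller predictions: first the word's cluster, then the word within that cluster, i.e. P(w | h) = P(c(w) | h) × P(w | c(w), h). The toy sketch below illustrates this decomposition on a tiny bigram model; the corpus, the hand-assigned clusters, and all function names are illustrative assumptions (real clusters would be induced automatically, and real models would be smoothed), not the talk's actual implementation.

```python
from collections import defaultdict

# Toy corpus and a hand-assigned clustering (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
cluster = {"the": "DET", "cat": "NOUN", "mat": "NOUN", "dog": "NOUN",
           "rug": "NOUN", "sat": "VERB", "on": "PREP"}
vocab = sorted(set(corpus))

# Count statistics for the two factors of the predictive decomposition:
#   P(w | h) = P(c(w) | h) * P(w | c(w), h)
hc = defaultdict(lambda: defaultdict(int))   # history -> cluster counts
hcw = defaultdict(lambda: defaultdict(int))  # (history, cluster) -> word counts
for prev, w in zip(corpus, corpus[1:]):
    hc[prev][cluster[w]] += 1
    hcw[(prev, cluster[w])][w] += 1

def p_cluster(c, h):
    """Maximum-likelihood estimate of P(cluster | history)."""
    total = sum(hc[h].values())
    return hc[h][c] / total if total else 0.0

def p_word_in_cluster(w, c, h):
    """Maximum-likelihood estimate of P(word | cluster, history)."""
    total = sum(hcw[(h, c)].values())
    return hcw[(h, c)][w] / total if total else 0.0

def p_word(w, h):
    """The factored prediction: predict the cluster, then the word."""
    c = cluster[w]
    return p_cluster(c, h) * p_word_in_cluster(w, c, h)

# The decomposition is exact: probabilities still sum to 1 over the
# vocabulary for any history seen in training (unsmoothed toy model).
print(round(sum(p_word(w, "the") for w in vocab), 6))  # -> 1.0
```

The efficiency gain for maximum entropy training comes from the shape of the factorization: instead of normalizing over the full vocabulary at every step, each factor normalizes over a much smaller set (the clusters, or the words within one cluster), which is where the large training-time speedups described above become possible.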

Joshua Goodman worked at Dragon Systems for two years, where he designed the speech recognition engine that was used until their recent demise, which he claims he had nothing to do with. He then went to graduate school at Harvard, where he studied statistical natural language processing, especially statistical parsing. Next, he joined the Microsoft Speech Technology Group, where he worked on language modeling, especially language model combination and clustering. Recently, he switched to the Machine Learning and Applied Statistics Group, where he plans to do “something with language and probabilities.”

Center for Language and Speech Processing