Fast Computation of Maximum Entropy / Minimum Divergence Feature Gain
Harry Printz, IBM T.J. Watson Research Center
October 20, 1998
Maximum entropy / minimum divergence modeling is a powerful technique for constructing probability models, which has been applied to a wide variety of problems in natural language processing. A maximum entropy / minimum divergence (MEMD) model is built from a base model, and a set of feature functions, whose empirical expectations on some training corpus are known.
A fundamental difficulty with this technique is that while there are typically millions of feature functions that could be incorporated into a given model, in general it is not computationally feasible, or even desirable, to use them all. Thus some means must be devised for determining each feature's predictive power, also known as its gain. Once the gains are known, the features can be ranked according to their utility, and only the most gainful ones retained. This talk presents a new algorithm for computing feature gain that is fast, accurate and memory-efficient.
Harry Printz is a Research Staff Member at IBM's Watson Research Center in Yorktown Heights, NY, where he leads the language modeling team. He has previously worked on reconfigurable hardware at the Digital Equipment Corporation Paris Research Laboratory, and on medical computing at Bolt, Beranek and Newman. He holds a PhD in Computer Science from Carnegie Mellon, a BA in Mathematics and Philosophy from Oxford University, where he was a Rhodes Scholar, and a BA and an MA in Physics from Harvard University. His current research interests are in mathematical models of language and speech.