Ciprian Chelba (Google Research) “Sparse Non-negative Matrix Language Modeling”
3400 N Charles St
Baltimore, MD 21218
We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features, much like the more established family of maximum entropy (exponential) models. Due to its parsimonious parameterization, the model can be estimated efficiently on small amounts of data.
Experiments on various corpora show that the model matches established techniques in both perplexity and speech recognition accuracy.
The computational advantages of SNM estimation over both maximum entropy and neural network estimation are probably its main strength, promising an approach that is highly flexible in combining arbitrary features yet scales gracefully to large amounts of data.
Ciprian Chelba is a Research Scientist with Google. Between 2000 and 2006 he worked as a Researcher in the Speech Technology Group at Microsoft Research.
He received his Diploma Engineer degree in 1993 from the Faculty of Electronics and Telecommunications at “Politechnica” University, Bucuresti, Romania, and his M.S. in 1996 and Ph.D. in 2000 from the Electrical and Computer Engineering Department at the Johns Hopkins University.
His research interests are in statistical modeling of natural language and speech, as well as related areas such as machine learning and information theory as applied to natural language problems.
Recent projects include language modeling for Google Voice Search and the Android soft keyboard.