Niko Brummer (AGNITIO) – “Binary and Multiclass Calibration in Speaker and Language Recognition”
3400 N Charles St
Baltimore, MD 21218
This is an updated version of a talk given at ASRU 2013 in Olomouc.
Automatic pattern classifiers that output soft, probabilistic classifications, rather than hard decisions, can be applied more widely and more profitably, provided the probabilistic output is well-calibrated. In the fields of automatic speaker recognition and automatic spoken language recognition, the regular NIST technology evaluations have placed a strong emphasis on cost-effective application and therefore on calibration. This talk will describe calibration solutions for these technologies, with emphasis on criteria for measuring the goodness of calibration: if we can measure it, we can also optimize it.
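To make "measure it, then optimize it" concrete, consider Cllr, a calibration criterion widely used in speaker recognition. Cllr is the cross-entropy of log-likelihood-ratio (LLR) scores measured in bits, and a system that always outputs LLR 0 scores exactly 1. The Python sketch below uses only the standard library. It computes Cllr and then minimizes it over an affine score transform a*s + b, which is a simple form of calibration. It assumes the raw scores are meant to be natural-log LLRs and uses plain gradient descent. `fit_affine` and the other function names are hypothetical illustrations, not the FoCal API.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cllr(tar_llrs, non_llrs):
    """Cllr in bits, treating scores as natural-log likelihood ratios."""
    c_tar = sum(softplus(-s) for s in tar_llrs) / len(tar_llrs)  # mean log(1 + 1/LR)
    c_non = sum(softplus(s) for s in non_llrs) / len(non_llrs)   # mean log(1 + LR)
    return (c_tar + c_non) / (2.0 * math.log(2.0))


def fit_affine(tar_scores, non_scores, steps=5000):
    """Hypothetical helper: find (a, b) minimising cllr(a*s + b) by gradient descent."""
    # The objective is convex in (a, b). Its curvature is bounded by
    # L = 0.25 * (max s^2 + 1), so a step of 1/L never increases Cllr.
    max_sq = max(s * s for s in list(tar_scores) + list(non_scores))
    lr = 1.0 / (0.25 * (max_sq + 1.0))
    a, b = 1.0, 0.0  # start from the identity transform (raw scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s in tar_scores:  # d/dz softplus(-z) = -sigmoid(-z)
            g = -sigmoid(-(a * s + b)) / (2.0 * len(tar_scores))
            ga += g * s
            gb += g
        for s in non_scores:  # d/dz softplus(z) = sigmoid(z)
            g = sigmoid(a * s + b) / (2.0 * len(non_scores))
            ga += g * s
            gb += g
        a -= lr * ga
        b -= lr * gb
    return a, b
```

In use, call `fit_affine` on held-out target and non-target scores, then apply `a * s + b` to new scores before thresholding. Because the objective is convex and the step size is 1/L, each update can never raise Cllr. A production tool would more likely use a second-order or quasi-Newton optimizer.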
The core of the talk is a derivation and a re-interpretation of cross-entropy, the standard objective function in machine learning for the supervised training of classifiers. The main theoretical result is that cross-entropy represents the expected cost of making minimum-expected-cost Bayes decisions based on the outputs of a softmax classifier. For this equivalence we use a special misclassification cost function, defined over a smooth range of cost values. In practice, this means that classifiers trained with cross-entropy can be expected to work well across a wide range of applications.
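The binary case, which is simpler than the softmax setting of the talk, lets us check this kind of equivalence numerically. Let q be the posterior for class 1 and let t in (0, 1) be a threshold. With miss cost 1 - t and false-alarm cost t, the minimum-expected-cost Bayes decision is to choose class 1 exactly when q > t. Weight these costs by w(t) = 1/(t(1 - t)) and integrate over t. A class-1 trial then costs the integral of dt/t from q to 1, which is -ln q. A class-0 trial costs the integral of dt/(1 - t) from 0 to q, which is -ln(1 - q). Together these give the binary cross-entropy. This weighting is a standard result on proper scoring rules and is assumed here. It is not claimed to be the exact cost function used in the talk. `bayes_cost` and `integrated_cost` are hypothetical names.

```python
import math


def bayes_cost(q, y, t):
    """Cost of the minimum-expected-cost decision at threshold t.

    Miss cost is 1 - t and false-alarm cost is t, so the Bayes decision
    given posterior q = P(class 1) is: choose class 1 iff q > t.
    """
    choose_1 = q > t
    if y == 1:
        return 0.0 if choose_1 else 1.0 - t  # miss
    return t if choose_1 else 0.0              # false alarm


def integrated_cost(q, y, n=100_000):
    """Midpoint-rule integral over t in (0, 1) of bayes_cost weighted by 1/(t(1-t))."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += bayes_cost(q, y, t) / (t * (1.0 - t))
    return total * h


def cross_entropy(q, y):
    """Binary cross-entropy (natural log) of posterior q for true label y."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)
```

For example, `integrated_cost(0.3, 1)` should agree with `cross_entropy(0.3, 1)` up to the discretization error of the midpoint rule. A different weighting yields a different scoring rule. For instance, a constant weight w(t) = 1 gives half the squared-error (Brier) score.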
Niko Brummer received B.Eng (1986), M.Eng (1988) and Ph.D. (2010) degrees, all in electronic engineering, from Stellenbosch University. He worked as a researcher at DataFusion (later called Spescom DataVoice) and is currently chief scientist at AGNITIO. Most of his research over the last two decades has been applied to automatic speaker and language recognition, and he has participated in most of the NIST SRE and LRE evaluations of these technologies from 2000 to the present. He has contributed to the Odyssey Workshop series since 2001 and was the organizer of Odyssey 2008 in Stellenbosch. His FoCal Toolkit is widely used for fusion and calibration in speaker and language recognition research.
His research interests include the development of new algorithms for speaker and language recognition, as well as evaluation methodologies for these technologies. In both cases, his emphasis is on probabilistic modelling. He has worked with both generative (eigenchannel, JFA, i-vector PLDA) and discriminative (system fusion, discriminative JFA and PLDA) recognizers. In evaluation, his focus is on judging the goodness of classifiers that produce probabilistic outputs in the form of well-calibrated class likelihoods.