Xuedong Huang (Microsoft) “Cognitive Toolkit and Language Processing” @ Hackerman Hall B17
May 26 @ 12:00 pm – 1:15 pm


Microsoft recently achieved a historic human-parity milestone in recognizing conversational speech on the Switchboard task. The Microsoft Cognitive Toolkit (CNTK) was the secret weapon that enabled this breakthrough. This talk will tell the behind-the-scenes story of how Microsoft did this with CNTK.


Dr. Xuedong Huang is a Microsoft Technical Fellow in Microsoft AI and Research, where he leads Microsoft’s Speech and Language Team. As Microsoft’s Chief Speech Scientist, he led the team that achieved the historic human-parity milestone in conversational speech recognition in 2016.

In 1993, Huang joined Microsoft to found the company’s speech technology group. As the general manager of Microsoft’s spoken-language efforts for over a decade, he helped bring speech recognition to the mass market by introducing SAPI to Windows in 1995 and Speech Server to enterprise call centers in 2004. Prior to his current role, he spent five years at Bing as chief architect, working to improve search and ads.

Before joining Microsoft, he was on the faculty at Carnegie Mellon University, where his team achieved the best performance in all categories of the 1992 DARPA speech recognition benchmark evaluations.

He received the Allen Newell Research Excellence Leadership Medal in 1992 and an IEEE Best Paper Award in 1993. He is an IEEE and ACM Fellow, and was named Asian American Engineer of the Year (2011) and one of Wired Magazine’s 25 Geniuses Who Are Creating the Future of Business (2016). He holds over 100 patents and has published over 100 papers and two books.

Kate Knill (University of Cambridge) “Are All Languages Created Equal for Speech Recognition?” @ Hackerman Hall B17
Oct 31 @ 12:00 pm – 1:15 pm


When considering building speech recognition and keyword search (KWS) systems for a ‘new’ language, two key questions are “how much data is going to be needed?” and “what resources are available?”. This talk will look at how to predict the first and how to mitigate the second if the answer is “limited”. A wide range of factors affect recognition and KWS performance from one language to the next, such as phone set size, morphological richness, and dialect/accent variation. The harder the language, the more data that is generally required to achieve the same level of performance. This talk will present an analysis of performance across a range of factors and languages, within and across language families, from the IARPA Babel programme. A method to predict performance given a small amount of data from a language will be presented. When data resources are limited, performance can be boosted by exploiting data from other languages. This talk will also discuss the use of multilingual features and multilingual models for such limited-resource cases.


Kate Knill is a Senior Research Associate in the Engineering Department, University of Cambridge, UK, working on automatic spoken language teaching and assessment within the ALTA Institute. She previously worked on the rapid development of speech systems for new languages on the IARPA Babel project. She holds a PhD in Digital Signal Processing from Imperial College, London University, UK. Kate has over 25 years’ experience in speech and language processing in industry and academia, including leading the development of over 20 languages as Languages Manager at Nuance Communications (2000-2002) and establishing and leading the Speech Technology Group at Toshiba Cambridge Research Lab, UK (2002-2012). She was a member of the IEEE SLTC from 2009 to 2012, has been an ISCA Board member since 2013 (term through 2021), and is currently Secretary of ISCA.

Center for Language and Speech Processing