Zipf’s law is commonly glossed by the aphorism “infrequent words are frequent,” but in practice, it has often meant that there are three types of words: frequent, infrequent, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word. Many other social phenomena follow power-law distributions. The number of native speakers of the N’th most spoken language, for example, is approximately 1.44 billion divided by N raised to the power 1.09. In languages with sufficient data, we have shown that monolingual pre-training outperforms multilingual pre-training. In less-frequent languages, multilingual knowledge transfer can significantly reduce phone error rates. In languages with no training data, unsupervised ASR methods can be proven to converge, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions. Disability, for example, can cause speech patterns that were never seen in the training database, but not every disability does so. The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.
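To make the quoted power-law fit concrete, the sketch below evaluates S(N) = 1.44×10^9 / N^1.09 for a few ranks, using only the two constants stated in the abstract. It is an illustrative calculation, not the speaker's analysis code; the function name `native_speakers` is hypothetical. Because log S(N) = log(1.44×10^9) − 1.09·log N, the fit is a straight line on log-log axes, which is the usual signature of a power law.

```python
def native_speakers(rank: int, scale: float = 1.44e9, exponent: float = 1.09) -> float:
    """Power-law estimate of native speakers of the rank-th most spoken language.

    `scale` and `exponent` are the constants quoted in the abstract; the
    function itself is a hypothetical illustration of that fit.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return scale / rank ** exponent


if __name__ == "__main__":
    # Each tenfold increase in rank divides the estimate by 10**1.09 (about 12.3).
    for n in (1, 2, 10, 100):
        print(f"rank {n:>3}: ~{native_speakers(n):.3g} native speakers")
```

Doubling the rank always divides the estimate by the same factor, 2^1.09 (about 2.13), regardless of where on the list one starts.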
Mark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He has published research in speech production and perception, source separation, voice conversion, and low-resource automatic speech recognition.
Advanced neural language models have grown ever larger and more complex, pushing forward the limits of language understanding and generation while diminishing interpretability. The black-box nature of deep neural networks prevents humans from understanding them and from trusting and using them in real-world applications. This talk will introduce interpretation techniques that bridge the gap between humans and models for developing trustworthy natural language processing (NLP). I will first show how to explain black-box models and how to evaluate those explanations in order to understand the models' prediction behavior. Next, I will show how to improve the interpretability of neural language models by making their decision-making transparent and rationalized. Finally, I will discuss how to diagnose and improve models (e.g., their robustness) through the lens of explanations. I will conclude with future research directions centered on model interpretability and committed to facilitating communication and interaction among intelligent machines, system developers, and end users for long-term trustworthy AI.
Hanjie Chen is a Ph.D. candidate in Computer Science at the University of Virginia, advised by Prof. Yangfeng Ji. Her research interests lie in Trustworthy AI, Natural Language Processing (NLP), and Interpretable Machine Learning. She develops interpretation techniques to explain neural language models and make their prediction behavior transparent and reliable. She is a recipient of the Carlos and Esther Farrar Fellowship and of the Best Poster Award at ACM CAPWIC 2021. Her work has been published at top-tier NLP/AI conferences (e.g., ACL, AAAI, EMNLP, NAACL), and she was selected as a National Center for Women & Information Technology (NCWIT) Collegiate Award Finalist in 2021. As the primary instructor, she co-designed and taught the course Interpretable Machine Learning. She received the UVA CS Outstanding Graduate Teaching Award and was a nominee for the University-wide Graduate Teaching Awards (top 5% of graduate instructors). More details can be found at https://www.cs.virginia.edu/~hc9mx.