Kyle Gorman (City University of New York) “Weighted Finite-State Transducers: The Later Years”

When:
April 1, 2022 @ 12:00 pm – 1:15 pm
Where:
Ames Hall 234
3400 N. Charles Street
Baltimore
MD 21218
Cost:
Free

Abstract

While the “deep learning tsunami” continues to define the state of the art in speech and language processing, finite-state transducer grammars developed by linguists and engineers are still widely used in industrial, highly multilingual settings, particularly for symbolic, “front-end” speech applications. In this talk, I will first briefly review the current state of the OpenFst and OpenGrm finite-state transducer libraries. I will then review two “late-breaking” algorithms found in these libraries. The first is a heuristic but highly effective general-purpose optimization routine for weighted transducers. The second is an algorithm for computing the single shortest string of non-deterministic weighted acceptors that lack certain properties required by classic shortest-path algorithms. Finally, I will illustrate how the OpenGrm tools can be used to induce a finite-state string-to-string transduction model known as a pair n-gram model. This model has been applied to grapheme-to-phoneme conversion, loanword detection, abbreviation expansion, and back-transliteration, among other tasks.

Biography

Kyle Gorman is an assistant professor of linguistics at the Graduate Center, City University of New York, and director of the master’s program in computational linguistics; he is also a software engineer in the speech and language algorithms group at Google. With Richard Sproat, he is the coauthor of Finite-State Text Processing (Morgan & Claypool, 2021) and the creator of Pynini, a finite-state text processing library for Python. He has also published on statistical methods for comparing computational models, text normalization, grapheme-to-phoneme conversion, and morphological analysis, as well as many topics in linguistic theory.

Center for Language and Speech Processing