Richard Sproat (Google Research) “Neural Models of Text Normalization for Speech Applications”
Abstract
Speech applications such as text-to-speech (TTS) or automatic speech recognition (ASR) must not only know how to read ordinary words, but must also know how to read numbers, abbreviations, measure expressions, times, dates, and a whole range of other constructions that one frequently finds in written texts. The problem of dealing with such material is called text normalization. The traditional approach to this problem, and the one currently used in Google's deployed TTS and ASR systems, involves large hand-constructed grammars, which are costly to develop and tricky to maintain. It would be nice if one could simply train a system from text paired with its verbalization.
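As a concrete illustration (a hypothetical sketch, not the actual training data or system described in the talk), text normalization can be framed as learning a mapping from written tokens to their spoken verbalizations. The toy Python pairs below stand in for the kinds of constructions listed above.

# Illustrative written/spoken pairs of the sort a text normalization
# system must handle; these examples are invented for exposition.
TRAINING_PAIRS = [
    ("247",      "two hundred forty seven"),                  # cardinal number
    ("3,281",    "three thousand two hundred eighty one"),    # long number
    ("2 kB",     "two kilobytes"),                            # measure expression
    ("Dr.",      "doctor"),                                   # abbreviation
    ("12/25/17", "december twenty fifth twenty seventeen"),   # date
    ("3:30pm",   "three thirty p m"),                         # time
    ("the",      "the"),                                      # ordinary words pass through
]

if __name__ == "__main__":
    # A sequence-to-sequence model would be trained to map the left column
    # to the right column; here we simply print the pairs.
    for written, spoken in TRAINING_PAIRS:
        print(f"{written!r:12} -> {spoken!r}")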
I will present our work on applying neural sequence-to-sequence RNN models to the problem of text normalization. Given sufficient training data, such models can achieve very high accuracy, but they also tend to produce occasional errors (reading "kB" as "hectare", or misreading a long number such as "3,281") that would be problematic in a real application. The most powerful method we have found for correcting such errors is to use finite-state, over-generating covering grammars at decoding time to guide the RNN away from "silly" readings; such covering grammars can be learned from a very small amount of annotated data. The resulting system is thus a hybrid rather than a purely neural one, since a purely neural approach appears to be impossible at present.
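To make the hybrid idea more concrete, here is a minimal Python sketch, under assumed interfaces, of decoding constrained by an over-generating covering grammar: the grammar licenses a set of plausible readings for a token, and the neural model's hypotheses are filtered against that set. The function names, hypothesis format, and toy grammar are illustrative assumptions, not the actual system's APIs.

from typing import Callable, List, Set, Tuple


def constrained_decode(
    token: str,
    rnn_hypotheses: List[Tuple[str, float]],      # (verbalization, score) from beam search
    covering_grammar: Callable[[str], Set[str]],  # token -> set of licensed verbalizations
) -> str:
    """Pick the highest-scoring RNN hypothesis licensed by the covering grammar."""
    licensed = covering_grammar(token)
    for verbalization, _score in sorted(rnn_hypotheses, key=lambda h: -h[1]):
        if verbalization in licensed:
            return verbalization
    # If the RNN produced nothing the grammar licenses, fall back to a
    # grammar output (a real system might instead rescore the grammar's lattice).
    return min(licensed) if licensed else rnn_hypotheses[0][0]


def toy_covering_grammar(token: str) -> Set[str]:
    """A toy over-generating grammar: for 'kB' it licenses several readings,
    but never an unrelated unit such as 'hectare'."""
    if token == "kB":
        return {"kilobyte", "kilobytes", "k b"}
    return {token}


if __name__ == "__main__":
    # The neural model's top hypothesis is a "silly" reading; the covering
    # grammar steers decoding to a licensed one.
    hypotheses = [("hectare", -0.2), ("kilobytes", -0.9)]
    print(constrained_decode("kB", hypotheses, toy_covering_grammar))  # -> kilobytes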
(Joint work with Ke Wu, Hao Zhang, Kyle Gorman, Felix Stahlberg, Xiaochang Peng and Brian Roark).
Biography