A Finite-State Model of Multilingual Text Analysis for Text-to-Speech Synthesis – Richard Sproat (Speech Synthesis Research Department, Bell Laboratories, Lucent Technologies)

November 5, 1996

In this talk, I will present a model of text analysis for text-to-speech synthesis (TTS) based on weighted finite-state transducers (WFSTs), which serves as the text-analysis module of the multilingual Bell Labs TTS system. To date, the model has been applied to seven languages: Spanish, Italian, Romanian, German, Russian, Mandarin and Japanese.

The model's structure is simple. An input text is converted into a finite-state acceptor, which is then composed with a set of lexical-analysis WFSTs that map it to all possible lexical analyses. For instance, ordinary words are mapped to all of their possible morphological analyses, abbreviations are expanded into their possible expansions, and digit sequences are expanded into all possible sequences of annotated number names. A further set of WFSTs encoding contextual models is then composed with the lexical analyses to eliminate, or assign high cost to, implausible paths. The best path of the resulting mapping is computed to yield a single best mapping between the surface string and a lexical analysis. Finally, this transducer is composed with a third set of transducers, which compute the lexical-to-phonemic mapping, to yield the phonemic transcription of the input text.

The transducers are constructed using a toolkit that allows for descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules; the toolkit incorporates, inter alia, a rewrite-rule compilation algorithm, and tools for converting decision trees and decision lists into (sets of) WFSTs.

The first portion of this talk will consist of a (short) introduction to WFSTs and some of their properties. The second portion will describe the TTS text-analysis model outlined above. The third portion will describe in some detail the operation of some of the tools used to construct the WFSTs, in particular the rule compiler, the decision tree compiler and the decision list compiler.
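As a rough illustration of the three-stage pipeline described in the abstract (lexical analysis, contextual filtering with best-path selection, then lexical-to-phonemic mapping), here is a minimal Python sketch. It is not the Bell Labs implementation and does not use real WFSTs or composition: plain dictionaries stand in for the transducers, and a small Viterbi-style search stands in for the best-path computation. The lexicon entries, costs, the `+NAME` tag, the contextual rule, and the phoneme symbols are all hypothetical and chosen only for the example.

```python
# Hypothetical toy lexicon: each surface token maps to candidate
# lexical analyses, each with a cost (lower is better).
LEXICON = {
    "Dr.": [("doctor", 1.0), ("drive", 1.0)],
    "Smith": [("smith+NAME", 0.0)],
}

# Hypothetical lexical-to-phonemic table (symbols are illustrative only).
PRONUNCIATIONS = {
    "doctor": "d aa k t er",
    "drive": "d r ay v",
    "smith+NAME": "s m ih th",
}


def lexical_analyses(tokens):
    """Stage 1: map each token to all of its candidate (analysis, cost) pairs."""
    return [LEXICON.get(tok, [(tok.lower(), 0.0)]) for tok in tokens]


def context_cost(prev, cur):
    """Stage 2 (toy contextual model): add a high cost to implausible sequences."""
    after_name = prev is not None and prev.endswith("+NAME")
    if cur == "doctor" and after_name:
        return 5.0  # assumed: a title is unlikely right after a name
    if cur == "drive" and not after_name:
        return 5.0  # assumed: "drive" is likely only right after a name
    return 0.0


def best_path(lattice):
    """Viterbi-style search for the cheapest sequence of analyses."""
    # Maps the most recent analysis to (total cost, path leading to it).
    hyps = {None: (0.0, [])}
    for candidates in lattice:
        new_hyps = {}
        for analysis, lex_cost in candidates:
            new_hyps[analysis] = min(
                (
                    (cost + lex_cost + context_cost(prev, analysis), path + [analysis])
                    for prev, (cost, path) in hyps.items()
                ),
                key=lambda hyp: hyp[0],
            )
        hyps = new_hyps
    return min(hyps.values(), key=lambda hyp: hyp[0])


def analyze(text):
    """Run all three stages and return (lexical analyses, phonemic strings)."""
    _, path = best_path(lexical_analyses(text.split()))
    # Stage 3: map each chosen analysis to its (toy) phonemic transcription.
    return path, [PRONUNCIATIONS.get(a, a) for a in path]


if __name__ == "__main__":
    print(analyze("Dr. Smith"))
    print(analyze("Smith Dr."))
```

With these toy costs, the abbreviation "Dr." resolves to "doctor" before the name and to "drive" after it. In an actual WFST system, each stage would be a transducer and the stages would be combined by composition rather than by hand-written loops.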

Center for Language and Speech Processing