New Parameterization for Emotional Speech Synthesis

This project’s goal is to improve the quality of speech synthesis for spoken dialog systems and speech-to-speech translation systems. Rather than only producing natural-sounding, high-quality speech from raw text, we will investigate how to make the output speech stylistically appropriate for the text. For example, the sentence “There is an emergency, please leave the building, now” requires a different style of delivery from a sentence like “Are you hurt?”. We will use both speech recorded from actors and natural examples of emotional speech. Using new articulatory feature extraction techniques and novel machine learning techniques, we will build emotional speech synthesis voices and test them with both objective and subjective measures. This will also require developing new techniques for evaluating our results efficiently through crowdsourcing.

 

Team Members 
Senior Members
Alan Black, Carnegie Mellon University
Tim Bunnell, University of Delaware
Florian Metze, Carnegie Mellon University
Kishore Prahallad, IIIT, Hyderabad
Stefan Steidl, ICSI at Berkeley
Graduate Students
Prasanna Kumar, Carnegie Mellon University
Tim Polzehl, Technical University of Berlin
Undergraduate Students
Daniel Perry, University of California, Los Angeles
Caroline Vaughn, Oberlin College
Affiliate Members
Eric Fosler-Lussier, Ohio State University
Karen Livescu, Toyota Technological Institute at Chicago

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2608