Advances in Human Assessment: (i) Tracking Nonverbal Behavior [Carlos Busso] (ii) Speaker Variability for Speaker ID [John H.L. Hansen] – Carlos Busso and John Hansen (University of Texas at Dallas)
In this presentation, two perspectives on assessing human interaction are considered: (i) nonverbal behavior, and (ii) variability in speech production for speaker recognition.
Part 1: During interpersonal human interaction, speech and gestures are intricately coordinated to express and emphasize ideas, as well as to provide suitable feedback to the listener. The tone and intensity of speech, spoken language patterns, facial expressions, head motion, and hand movements are all woven together in a nontrivial manner to convey intent and desires in natural human communication. A joint analysis of these modalities is necessary to fully decode human communication. Among other things, such analysis is critically needed to design next-generation information technology that attempts to mimic and emulate how humans process and produce communication signals. This talk will summarize our ongoing research on recognizing paralinguistic information conveyed through multiple communication channels during human interaction, with emphasis on social emotional behaviors.
Part 2: In addition to differences in multimodal exchange between human speakers, within-speaker differences also play a major role in altering the performance of automatic speech and speaker recognition systems. In this portion of the talk, we will consider speech production variability, including (i) vocal effort (e.g., whisper, soft, neutral, loud, shout), (ii) the Lombard effect (speech produced in noise), and (iii) speech style (read, spontaneous, singing), and how these factors impact speaker recognition systems, along with potential methods to improve system performance.
These studies are intended to develop a more fundamental understanding of how humans interact, and of how communication models might contribute to more effective biometrics for identifying and tracking humans.
Carlos Busso is an Assistant Professor of Electrical Engineering at The University of Texas at Dallas (UTD). He received his B.S. (2000) and M.S. (2003) degrees from the University of Chile, Santiago, Chile, and his Ph.D. in Electrical Engineering (2008) from the University of Southern California (USC), Los Angeles, USA. Before joining UTD, he was a Postdoctoral Research Associate at the Signal Analysis and Interpretation Laboratory (SAIL), USC. At USC, he received a Provost Doctoral Fellowship from 2003 to 2005 and a Fellowship in Digital Scholarship from 2007 to 2008. His research interests are in digital signal processing, speech and video processing, and multimodal interfaces. He has worked on audio-visual emotion recognition, analysis of emotional modulation in gestures and speech, design of realistic human-like virtual characters, speech source detection using microphone arrays, speaker localization and identification in an intelligent environment, and sensing of human interaction in multi-person meetings.
John H.L. Hansen is Department Head and Professor of Electrical Engineering at the University of Texas at Dallas. He holds the UTD Endowed Chair in Telecommunications Engineering and a joint appointment in the UTD School of Behavioral & Brain Sciences (Speech & Hearing). He has published extensively in the fields of speech processing and language technology, and has supervised 51 PhD/MS thesis students. In 2005, he received the University of Colorado – Boulder Teacher of the Year Award for his commitment to education in communication sciences and electrical engineering. He is an IEEE Fellow and an ISCA Fellow, and serves as Chair-Elect of the IEEE Signal Processing Society Speech-Language Technical Committee. He also served as Co-Organizer and Technical Chair for IEEE ICASSP-2010, and as Organizer for Interspeech-2002.