Leveraging Large Speech Language Models as Evaluators for Expressive Speech – Bismarck Odoom (JHU)

When:

March 13, 2026 @ 12:00 pm – 1:15 pm (UTC-04:00)

Where:

Hackerman Hall B17

Cost:

Free

Abstract

Expressive speech generation aims to produce speech that conveys not only linguistic content but also nuanced emotional and stylistic information. However, evaluating the expressiveness of generated speech remains a challenging problem and often relies on expensive human listening tests. We propose using large speech language models (SLMs), trained on speech and text data, as automatic evaluators for various aspects of expressive speech, such as emotion, gender, emotional intensity, valence, dominance, arousal, accent, and speaking rate. We leverage the speech perception and understanding capabilities of existing large SLMs and fine-tune them to produce natural-language evaluations of expressive attributes in speech, providing a scalable alternative to traditional evaluation methods.

Bio

Bismarck Odoom is a fourth-year CS PhD student at CLSP at Johns Hopkins University, advised by Philipp Koehn. His primary research interests are speech translation and multimodal LLMs.

Also available via Zoom: https://wse.zoom.us/j/96735183473

Center for Language and Speech Processing