Amir Hussein “Towards End-to-End Conversational Speech Translation”

When:
February 5, 2024 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N. Charles Street
Baltimore, MD 21218
Cost:
Free

Abstract

Over the past three decades, the fields of automatic speech recognition (ASR) and machine translation (MT) have witnessed remarkable advances, leading to exciting research directions such as speech-to-text translation (ST). This talk will delve into conversational ST, an essential facet of daily communication that presents unique challenges, including spontaneous informal language, the presence of disfluencies, high context dependence, and a scarcity of paired ST data.

Conversational speech is characterized by short segments, which require the integration of broader context to maintain consistency and improve translation fluency and quality. Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains understudied. Previous approaches have provided context by simply concatenating audio inputs, leading to memory bottlenecks, especially in self-attention networks, because lengthy audio segments must be encoded.
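To give a rough sense of why naive concatenation becomes costly, the back-of-the-envelope sketch below estimates how the self-attention score matrices grow with the length of concatenated audio context. It is not the system presented in the talk; every number (frame rate, subsampling, layers, batch size, fp32 storage) is an illustrative assumption.

# Illustrative sketch only: memory held by self-attention score tensors when
# audio context is added by concatenation. All constants are assumptions.

FRAMES_PER_SECOND = 100   # 10 ms acoustic frames (assumption)
SUBSAMPLING = 4           # typical encoder front-end length reduction (assumption)
NUM_HEADS = 8             # assumption
NUM_LAYERS = 12           # assumption
BATCH_SIZE = 32           # assumption
BYTES_PER_FLOAT = 4       # fp32 (assumption)

def attention_score_bytes(total_seconds: float) -> int:
    """Bytes for the (batch x heads x T x T) attention scores across all layers."""
    t = int(total_seconds * FRAMES_PER_SECOND / SUBSAMPLING)
    return BATCH_SIZE * NUM_LAYERS * NUM_HEADS * t * t * BYTES_PER_FLOAT

current_segment = 5.0  # seconds in the current short conversational segment (assumption)
for extra_context in (0, 10, 30, 60):
    gb = attention_score_bytes(current_segment + extra_context) / 1e9
    print(f"+{extra_context:>2}s concatenated context -> ~{gb:5.1f} GB of attention scores")

Because the score matrices scale quadratically with the encoded sequence length, adding a minute of concatenated audio context inflates this term by roughly two orders of magnitude under these assumptions, which is the bottleneck the talk addresses.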
First, I will describe how to integrate context into E2E-ST with minimal additional memory cost. Then, I will discuss the challenges of incorporating context in an E2E-ST system with limited data during training and inference and propose solutions to overcome them. Afterward, I will illustrate the impact of context size and the inclusion of speaker information on performance. Lastly, I will demonstrate the benefits of context in conversational settings, focusing on aspects such as anaphora resolution and the identification of named entities.

Center for Language and Speech Processing