Universal Speech Content Factorization – Henry Li Xinyuan (JHU)


When:

March 13, 2026 @ 12:00 pm – 1:15 pm (UTC-04:00)

Where:

Hackerman Hall B17

Cost:

Free

Categories:

Seminars, Student Seminars


Abstract

We propose Universal Speech Content Factorization (USCF), a simple, invertible linear method for extracting a low-rank speech representation that suppresses speaker timbre while preserving phonetic content. USCF extends Speech Content Factorization (SCF), a closed-set voice conversion method, to the open-set setting: it learns a universal speech-to-content mapping via least-squares optimization and derives speaker-specific transformations from only a few seconds of target speech. Embedding analysis shows that USCF effectively removes speaker-dependent variation. As a zero-shot voice conversion system, USCF achieves intelligibility, naturalness, and speaker similarity competitive with methods that require substantially more target-speaker data or additional neural training. Finally, we demonstrate that USCF features can serve as an alternative acoustic representation for text-to-speech, offering a linear, training-efficient substitute for timbre-prompted systems built on self-supervised learning (SSL) features.
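The least-squares idea in the abstract can be illustrated with a toy linear model. This is a minimal sketch under stated assumptions, not the USCF algorithm presented in the talk: it assumes each speaker's features are a linear function of shared low-rank content plus a speaker-specific perturbation of the mixing matrix, and every name in it (make_speaker, fit_linear, W, B_tgt, K, D, SIGMA) is hypothetical. It fits one universal feature-to-content map on pooled training speakers, fits a content-to-feature map for an unseen target speaker from only 50 frames (standing in for "a few seconds"), and converts an unseen source speaker through both maps.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 64      # hypothetical content rank and feature dimension
SIGMA = 0.3       # hypothetical strength of speaker-specific variation
P = rng.normal(size=(K, D))  # shared content-to-feature basis (toy assumption)


def make_speaker():
    # Toy assumption: a speaker's features are Z @ (P + E_s), where E_s models timbre.
    return P + SIGMA * rng.normal(size=(K, D))


def fit_linear(inputs, targets):
    # Least-squares solution M of inputs @ M ≈ targets.
    M, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return M


def rel_err(estimate, reference):
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)


# 1. Universal feature-to-content map, fit once on pooled training speakers.
train_X, train_Z = [], []
for _ in range(200):
    A = make_speaker()
    Z = rng.normal(size=(50, K))
    train_X.append(Z @ A)
    train_Z.append(Z)
W = fit_linear(np.vstack(train_X), np.vstack(train_Z))  # shape (D, K)

# 2. Source and target speakers never seen during training.
A_src, A_tgt = make_speaker(), make_speaker()

# 3. Speaker-specific content-to-feature map from only a few target frames.
Z_few = rng.normal(size=(50, K))
X_few = Z_few @ A_tgt
B_tgt = fit_linear(X_few @ W, X_few)  # shape (K, D)

# 4. Convert: source features -> content -> target-speaker features.
Z_src = rng.normal(size=(200, K))
X_src = Z_src @ A_src
X_conv = X_src @ W @ B_tgt
X_ideal = Z_src @ A_tgt  # same content rendered with the target speaker's mixing

content_err = rel_err(X_src @ W, Z_src)      # how well content is recovered
baseline_err = rel_err(X_src, X_ideal)       # speaker mismatch before conversion
conversion_err = rel_err(X_conv, X_ideal)    # mismatch after conversion
print(f"content: {content_err:.3f}  unconverted: {baseline_err:.3f}  converted: {conversion_err:.3f}")
```

In this toy setting the pooled least-squares fit averages over many speakers, so the universal map recovers content almost independently of the speaker, and the few-frame target map is an exact least-squares fit because the target features lie in the span of the recovered content.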

Also available via Zoom: https://wse.zoom.us/j/96735183473


Center for Language and Speech Processing