BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-23586@www.clsp.jhu.edu DTSTAMP:20240328T174618Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20230410T120000 DTEND;TZID=America/New_York:20230410T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Student Seminar – Ruizhe Huang URL:https://www.clsp.jhu.edu/events/student-seminar-ruizhe-huang/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,April\,Huang END:VEVENT BEGIN:VEVENT UID:ai1ec-24479@www.clsp.jhu.edu DTSTAMP:20240328T174618Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:
Abstract
\nT he speech field is evolving to solve more challenging scenarios\, such as multi-channel recordings with multiple simultaneous talkers. Given the man y types of microphone setups out there\, we present the UniX-Encoder. It’s a universal encoder designed for multiple tasks\, and worked with any mic rophone array\, in both solo and multi-talker environments. Our research e nhances previous multichannel speech processing efforts in four key areas: 1) Adaptability: Contrasting traditional models constrained to certain mi crophone array configurations\, our encoder is universally compatible. 2) MultiTask Capability: Beyond the single-task focus of previous systems\, U niX-Encoder acts as a robust upstream model\, adeptly extracting features for diverse tasks including ASR and speaker recognition. 3) Self-Supervise d Training: The encoder is trained without requiring labeled multi-channel data. 4) End-to-End Integration: In contrast to models that first beamfor m then process single-channels\, our encoder offers an end-to-end solution \, bypassing explicit beamforming or separation. To validate its effective ness\, we tested the UniXEncoder on a synthetic multi-channel dataset from the LibriSpeech corpus. Across tasks like speech recognition and speaker diarization\, our encoder consistently outperformed combinations like the WavLM model with the BeamformIt frontend.
DTSTART;TZID=America/New_York:20240311T200500 DTEND;TZID=America/New_York:20240311T210500 SEQUENCE:0 SUMMARY:Zili Huang (JHU) “Unix-Encoder: A Universal X-Channel Speech Encode r for Ad-Hoc Microphone Array Speech Processing” URL:https://www.clsp.jhu.edu/events/zili-huang-jhu-unix-encoder-a-universal -x-channel-speech-encoder-for-ad-hoc-microphone-array-speech-processing/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2024\,Huang\,March END:VEVENT END:VCALENDAR