BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-21031@www.clsp.jhu.edu
DTSTAMP:20240329T113417Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nMost people take for granted that when they speak\, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions\, trying to communicate with others can be difficult and frustrating. While there have been many recent advances in Automatic Speech Recognition (ASR) technologies\, these interfaces can be inaccessible to people with speech impairments.\nIn this talk\, we will present Parrotron\, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram\, without using any intermediate discrete representation. The system is also trained to emit words in parallel with the spectrogram. We demonstrate that this model can be trained to normalize speech from any speaker\, regardless of accent\, prosody\, and background noise\, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody. We further show that this normalization model can be adapted to normalize highly atypical speech from speakers with a variety of speech impairments (due to ALS\, cerebral palsy\, deafness\, stroke\, brain injury\, etc.)\, resulting in significant improvements in intelligibility and naturalness\, as measured by a speech recognizer and by listening tests. Finally\, to demonstrate the model's utility on other speech tasks\, we show that the same model architecture can be trained to perform speech separation.\nDimitri will briefly describe some key moments in the development of speech recognition algorithms that he was involved in\, and their applications to YouTube closed captions\, Live Transcribe\, and wearable subtitles.\nFadi will then speak about the development of Parrotron.\nBiographies\nDimitri Kanevsky began his Google career working on speech recognition algorithms. Before joining Google\, Dimitri was a Research Staff Member in the Speech Algorithms Department at IBM. Before IBM\, he worked at a number of centers for higher mathematics\, including the Max Planck Institute in Germany and the Institute for Advanced Study in Princeton. He currently holds 295 US patents and was a Master Inventor at IBM. MIT Technology Review recognized Dimitri's conversational-biometrics-based security patent as one of the five most influential patents of 2003. In 2012\, Dimitri was honored at the White House as a Champion of Change for his efforts to advance access to science\, technology\, engineering\, and math.\nFadi Biadsy has been a senior staff research scientist at Google NY for the past ten years. He has explored and led multiple projects at Google\, including speech recognition\, speech conversion\, language modeling\, and semantic understanding. He received his PhD from Columbia University in 2011. At Columbia\, he researched a variety of speech and language processing problems\, including dialect and accent recognition\, speech recognition\, charismatic speech\, and question answering. He holds a BSc and an MSc in mathematics and computer science. He worked on handwriting recognition during his master's degree\, and he worked for five years as a senior software developer at Dalet Digital Media Systems\, building multimedia broadcasting systems.
DTSTART;TZID=America/New_York:20211105T120000
DTEND;TZID=America/New_York:20211105T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Fadi Biadsy and Dimitri Kanevsky (Google) “Speech Recognition: From Speaker Dependent to Speaker Independent to Full Personalization” and “Parrotron: A Unified E2E Speech-to-Speech Conversion and ASR Model for Atypical Speech”
URL:https://www.clsp.jhu.edu/events/fadi-biadsy-and-dimitri-kanevsky-google/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2021\,Biadsy and Kanevsky\,November
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21494@www.clsp.jhu.edu
DTSTAMP:20240329T113417Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nAdversarial attacks deceive neural network systems by adding carefully crafted perturbations to benign signals. Because these attacks are almost imperceptible to humans\, they pose a severe security threat to state-of-the-art speech and speaker recognition systems\, making it vital to develop countermeasures against them. In this talk\, we focus on 1) classifying a given adversarial attack by attack algorithm type\, threat model type\, and signal-to-adversarial-noise ratio\, and 2) developing a novel speech denoising solution to further improve classification performance.\nOur proposed approach uses an x-vector network as a signature extractor to obtain embeddings\, which we call signatures. These signatures contain information about the attack and can help classify different attack algorithms\, threat models\, and signal-to-adversarial-noise ratios. We demonstrate that such signatures transfer to other tasks. In particular\, a signature extractor trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition. We also show that signatures can be used to detect unknown attacks\, i.e.\, attacks not included during training. Lastly\, we propose to improve the signature extractor by making its job easier: removing the clean signal from the adversarial example (which consists of the clean signal plus the perturbation). We train our signature extractor on adversarial perturbations. At inference time\, we use a time-domain denoiser to obtain the adversarial perturbation from each adversarial example. Using our improved approach\, we show that common attacks in the literature (Fast Gradient Sign Method (FGSM)\, Projected Gradient Descent (PGD)\, and Carlini-Wagner (CW)) can be classified with accuracy as high as 96%. We also detect unknown attacks with an equal error rate (EER) of about 9%\, which is very promising.
DTSTART;TZID=America/New_York:20220304T120000
DTEND;TZID=America/New_York:20220304T131500
LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Student Seminar – Sonal Joshi “Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems”
URL:https://www.clsp.jhu.edu/events/student-seminar-sonal-joshi/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2022\,Joshi\,March
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-24511@www.clsp.jhu.edu
DTSTAMP:20240329T113417Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20240412T120000
DTEND;TZID=America/New_York:20240412T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Sonal Joshi (JHU)
URL:https://www.clsp.jhu.edu/events/sonal-joshi-jhu/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2024\,April\,Joshi
END:VEVENT
END:VCALENDAR