BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-22403@www.clsp.jhu.edu
DTSTAMP:20240328T144414Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nVoice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to a general technical field of speech synthesis\, which converts text to speech or changes the properties of speech\, for example\, voice identity\, emotion\, and accents. Voice conversion involves multiple speech processing techniques\, such as speech analysis\, spectral conversion\, prosody conversion\, speaker characterization\, and vocoding. With the recent advances in theory and practice\, we are now able to produce human-like voice quality with high speaker similarity. In this talk\, Dr. Sisman will present the recent advances in voice conversion and discuss their promise and limitations. Dr. Sisman will also provide a summary of the available resources for expressive voice conversion research.\nBiography\nDr. Berrak Sisman (Member\, IEEE) received the Ph.D. degree in electrical and computer engineering from National University of Singapore in 2020\, fully funded by A*STAR Graduate Academy under Singapore International Graduate Award (SINGA).
 She is currently working as a tenure-track Assistant Professor at the Erik Jonsson School Department of Electrical and Computer Engineering at the University of Texas at Dallas\, United States. Prior to joining UT Dallas\, she was a faculty member at Singapore University of Technology and Design (2020-2022). She was a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at The Centre for Speech Technology Research (CSTR)\, University of Edinburgh (2019). She was a visiting researcher at RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning\, signal processing\, emotion\, speech synthesis\, and voice conversion.\nDr. Sisman has served as the Area Chair at INTERSPEECH 2021\, INTERSPEECH 2022\, and IEEE SLT 2022\, and as the Publication Chair at ICASSP 2022. She has been elected as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term from January 2022 to December 2024. She plays leadership roles in conference organizations and is active in technical committees. She has served as the General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA).
DTSTART;TZID=America/New_York:20221104T120000
DTEND;TZID=America/New_York:20221104T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Berrak Sisman (University of Texas at Dallas) “Speech Synthesis and Voice Conversion: Machine Learning can Mimic Anyone’s Voice”
URL:https://www.clsp.jhu.edu/events/berrak-sisman-university-of-texas-at-dallas/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
 \\nAbstract
 \nVoice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to a general technical field of speech synthesis\, which converts text to speech or changes the properties of speech\, for example\, voice identity\, emotion\, and accents. Voice conversion involves multiple speech processing techniques\, such as speech analysis\, spectral conversion\, prosody conversion\, speaker characterization\, and vocoding. With the recent advances in theory and practice\, we are now able to produce human-like voice quality with high speaker similarity. In this talk\, Dr. Sisman will present the recent advances in voice conversion and discuss their promise and limitations. Dr. Sisman will also provide a summary of the available resources for expressive voice conversion research.
 \nBiography\nDr. Berrak Sisman (Member\, IEEE) received the Ph.D. degree in electrical and computer engineering from National University of Singapore in 2020\, fully funded by A*STAR Graduate Academy under Singapore International Graduate Award (SINGA). She is currently working as a tenure-track Assistant Professor at the Erik Jonsson School Department of Electrical and Computer Engineering at the University of Texas at Dallas\, United States. Prior to joining UT Dallas\, she was a faculty member at Singapore University of Technology and Design (2020-2022). She was a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at The Centre for Speech Technology Research (CSTR)\, University of Edinburgh (2019). She was a visiting researcher at RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning\, signal processing\, emotion\, speech synthesis\, and voice conversion.
 \nDr. Sisman has served as the Area Chair at INTERSPEECH 2021\, INTERSPEECH 2022\, and IEEE SLT 2022\, and as the Publication Chair at ICASSP 2022. She has been elected as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term from January 2022 to December 2024. She plays leadership roles in conference organizations and is active in technical committees. She has served as the General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA).
 \n
X-TAGS;LANGUAGE=en-US:2022\,November\,Sisman
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-22422@www.clsp.jhu.edu
DTSTAMP:20240328T144414Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nZipf’s law is commonly glossed by the aphorism “infrequent words are frequent\,” but in practice\, it has often meant that there are three types of words: frequent\, infrequent\, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words\, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word. Many other social phenomena follow power-law distributions. The number of native speakers of the N’th most spoken language\, for example\, is 1.44 billion over N to the 1.09. In languages with sufficient data\, we have shown that monolingual pre-training outperforms multilingual pre-training. In less-frequent languages\, multilingual knowledge transfer can significantly reduce phone error rates. In languages with no training data\, unsupervised ASR methods can be proven to converge\, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions. Disability\, for example\, can cause speech patterns that were never seen in the training database\, but not all disabilities need do so. The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data\, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.\nBiography\nMark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign.
 He has published research in speech production and perception\, source separation\, voice conversion\, and low-resource automatic speech recognition.
DTSTART;TZID=America/New_York:20221209T120000
DTEND;TZID=America/New_York:20221209T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Mark Hasegawa-Johnson (University of Illinois Urbana-Champaign) “Zipf’s Law Suggests a Three-Pronged Approach to Inclusive Speech Recognition”
URL:https://www.clsp.jhu.edu/events/mark-hasegawa-johnson-university-of-illinois-urbana-champaign/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
 \nZipf’s law is commonly glossed by the aphorism “infrequent words are frequent\,” but in practice\, it has often meant that there are three types of words: frequent\, infrequent\, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words\, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word. Many other social phenomena follow power-law distributions. The number of native speakers of the N’th most spoken language\, for example\, is 1.44 billion over N to the 1.09. In languages with sufficient data\, we have shown that monolingual pre-training outperforms multilingual pre-training. In less-frequent languages\, multilingual knowledge transfer can significantly reduce phone error rates. In languages with no training data\, unsupervised ASR methods can be proven to converge\, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions. Disability\, for example\, can cause speech patterns that were never seen in the training database\, but not all disabilities need do so. The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data\, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.
\nBiography
 \nMark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign. He has published research in speech production and perception\, source separation\, voice conversion\, and low-resource automatic speech recognition.
 \n
X-TAGS;LANGUAGE=en-US:2022\,December\,Hasegawa-Johnson
END:VEVENT
END:VCALENDAR