BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-21068@www.clsp.jhu.edu DTSTAMP:20240330T042346Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20211203T120000 DTEND;TZID=America/New_York:20211203T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Eric Ringger (Zillow Group) URL:https://www.clsp.jhu.edu/events/eric-ringger-zillow-group/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,December\,Ringger END:VEVENT BEGIN:VEVENT UID:ai1ec-21072@www.clsp.jhu.edu DTSTAMP:20240330T042346Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nOne of the keys to success in machine learning applications is to improve each user’s personal experience via personalized models. A personalized model can also be a more resource-efficient solution than a general-purpose model\, because it focuses on a particular sub-problem\, for which a smaller model architecture can be good enough. However\, training a personalized model requires data from the particular test-time user\, which are not always available due to their private nature and technical challenges. Furthermore\, such data tend to be unlabeled\, as they can be collected only at test time\, once the system is deployed to user devices. One could rely on the generalization power of a generic model\, but such a model can be too computationally/spatially complex for real-time processing in a resource-constrained device. In this talk\, I will present some techniques to circumvent the lack of labeled personal data in the context of speech enhancement. Our machine learning models will require zero or few data samples from the test-time users\, while they can still achieve the personalization goal. To this end\, we will investigate modularized speech enhancement models as well as the potential of self-supervised learning for personalized speech enhancement. Because our research achieves the personalization goal in a data- and resource-efficient way\, it is a step towards a more available and affordable AI for society.
\nBiography
\nMinje Kim is an associate professor in the Dept. of Intelligent Systems Engineering at Indiana University\, where he leads his research group\, Signals and AI Group in Engineering (SAIGE). He is also an Amazon Visiting Academic\, consulting for Amazon Lab126. At IU\, he is affiliated with various programs and labs such as Data Science\, Cognitive Science\, Dept. of Statistics\, and Center for Machine Learning. He earned his Ph.D. in the Dept. of Computer Science at the University of Illinois at Urbana-Champaign. Before joining UIUC\, he worked as a researcher at ETRI\, a national lab in Korea\, from 2006 to 2011. Before then\, he received his Master’s and Bachelor’s degrees in the Dept. of Computer Science and Engineering at POSTECH (Summa Cum Laude) and in the Division of Information and Computer Engineering at Ajou University (with honor) in 2006 and 2004\, respectively. He is a recipient of various awards including the NSF Career Award (2021)\, the IU Trustees Teaching Award (2021)\, the IEEE SPS Best Paper Award (2020)\, and Google and Starkey’s grants for outstanding student papers in ICASSP 2013 and 2014\, respectively. He is an IEEE Senior Member and also a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (2018-2023). He is serving as an Associate Editor for EURASIP Journal on Audio\, Speech\, and Music Processing\, and as a Consulting Associate Editor for IEEE Open Journal of Signal Processing. He is also a reviewer\, program committee member\, or area chair for the major machine learning and signal processing venues. He has filed more than 50 patent applications as an inventor.
DTSTART;TZID=America/New_York:20221202T120000 DTEND;TZID=America/New_York:20221202T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Minje Kim (Indiana University) “Personalized Speech Enhancement: Data- and Resource-Efficient Machine Learning” URL:https://www.clsp.jhu.edu/events/minje-kim-indiana-university/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,December\,Kim END:VEVENT BEGIN:VEVENT UID:ai1ec-22422@www.clsp.jhu.edu DTSTAMP:20240330T042346Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nZipf’s law is commonly glossed by the aphorism “infrequent words are frequent\,” but in practice\, it has often meant that there are three types of words: frequent\, infrequent\, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words\, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word. Many other social phenomena follow power-law distributions. The number of native speakers of the N’th most spoken language\, for example\, is 1.44 billion over N to the 1.09. In languages with sufficient data\, we have shown that monolingual pre-training outperforms multilingual pre-training. In less-frequent languages\, multilingual knowledge transfer can significantly reduce phone error rates. In languages with no training data\, unsupervised ASR methods can be proven to converge\, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions. Disability\, for example\, can cause speech patterns that were never seen in the training database\, but not all disabilities need do so. The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data\, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.
\nBiography
\nMark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign. He has published research in speech production and perception\, source separation\, voice conversion\, and low-resource automatic speech recognition.
DTSTART;TZID=America/New_York:20221209T120000 DTEND;TZID=America/New_York:20221209T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mark Hasegawa-Johnson (University of Illinois Urbana-Champaign) “Zipf’s Law Suggests a Three-Pronged Approach to Inclusive Speech Recognition” URL:https://www.clsp.jhu.edu/events/mark-hasegawa-johnson-university-of-illinois-urbana-champaign/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,December\,Hasegawa-Johnson END:VEVENT BEGIN:VEVENT UID:ai1ec-24167@www.clsp.jhu.edu DTSTAMP:20240330T042346Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nPre-trained speech representation models have become ubiquitous in speech processing over the past few years. They have both improved the state of the art and made it feasible to learn task-specific models with very little labeled data. However\, it is not well understood what linguistic information is encoded in pre-trained models and how best to apply them to downstream tasks. In this talk I will describe recent work that begins to build an understanding of the layer-wise information learned by pre-trained speech models. We consider a number of popular pre-trained models and investigate the extent to which their layers encode spectral\, phonetic\, and word-level information. The results of these analyses also suggest some ways to improve or simplify the application of pre-trained models for downstream tasks. Finally\, I will describe our efforts to benchmark model performance on a variety of spoken language understanding tasks\, in order to broaden our understanding of the capabilities of state-of-the-art models.
\nThis talk is based in part on work presented in
\nA. Pasad et al.\, “Comparative layer-wise analysis of self-supervised speech models\,” ICASSP 2023.
\nA. Pasad et al.\, “What do self-supervised speech models know about words?\,” arXiv:2307.00162\, 2023.
\nS. Shon et al.\, “SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks\,” ACL 2023.
\nBio
\nKaren Livescu is a Professor at TTI-Chicago. She completed her PhD at MIT in 2005. She is an ISCA Fellow and a recent IEEE Distinguished Lecturer. She has served as a program chair/co-chair for ICLR\, Interspeech\, and ASRU\, and is an Associate Editor for TACL and IEEE T-PAMI. Her group’s work spans a variety of topics in spoken\, written\, and signed language processing.
DTSTART;TZID=America/New_York:20231201T120000 DTEND;TZID=America/New_York:20231201T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Karen Livescu (Toyota Technological Institute at Chicago) “What Do Pre-Trained Speech Representation Models Know? Layer-Wise Analysis and Benchmarking” URL:https://www.clsp.jhu.edu/events/karen-livescu-toyota-technological-institute-at-chicago/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,December\,Livescu END:VEVENT BEGIN:VEVENT UID:ai1ec-24169@www.clsp.jhu.edu DTSTAMP:20240330T042346Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nFoundation models\, including ChatGPT and its many variants\, have come into prominence in the natural language processing (NLP) community thanks to the ubiquity of text data readily available on the internet and the design of modern transformer architectures that can effectively learn from such data. However\, the development of a foundation model for sequential decision-making (e.g.\, reinforcement learning\, planning) is faced with additional challenges not present in NLP. In this talk\, we discuss some of these challenges with the hope of informing future investments that funding agencies and the academic community should engage in. The problem of transfer learning in the context of sequential decision-making is also discussed and constitutes one of the challenges that foundation models must address.
\nBio
\nAlvaro Velasquez is a program manager at the Defense Advanced Research Projects Agency (DARPA)\, where he currently leads programs on neuro-symbolic AI. Before that\, Alvaro oversaw the machine intelligence portfolio for the Information Directorate of the Air Force Research Laboratory (AFRL). Alvaro is a recipient of the distinguished paper award from AAAI\, best paper and patent awards from AFRL\, and the National Science Foundation Graduate Research Fellowship. He has authored over 70 papers and two patents and serves as Associate Editor of the IEEE Transactions on Artificial Intelligence.
DTSTART;TZID=America/New_York:20231204T120000 DTEND;TZID=America/New_York:20231204T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Alvaro Velasquez (DARPA) “Foundation Models and the Transfer of Embodied Autonomy” URL:https://www.clsp.jhu.edu/events/alvaro-velasquez/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,December\,Velasquez END:VEVENT END:VCALENDAR