BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-20115@www.clsp.jhu.edu
DTSTAMP:20240329T043859Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).\nBiography\nJiri Mekyska is the head of the BDALab (Brain Diseases Analysis Laboratory) at the Brno University of Technology\, where he leads a multidisciplinary team of researchers (signal processing engineers\, data scientists\, neurologists\, psychologists) with a special focus on the development of new digital endpoints and digital biomarkers enabling to better understand\, diagnose and monitor neurodegenerative (e.g. Parkinson’s disease) and neurodevelopmental (e.g. dysgraphia) diseases.
DTSTART;TZID=America/New_York:20210329T120000
DTEND;TZID=America/New_York:20210329T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Jiri Mekyska (Brno University of Technology) “Data Science in Small Medical Data Sets: From Logistic Regression Towards Logistic Regression”
URL:https://www.clsp.jhu.edu/events/jiri-mekyska-brno-university-of-technology/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).
\nBiography
\nAbstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.
\nBiography
\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
\n
X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21615@www.clsp.jhu.edu
DTSTAMP:20240329T043859Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\n\n\nWe consider a problem of data collection for semantically rich NLU tasks\, where detailed semantics of documents (or utterances) are captured using a complex meaning representation. Previously\, data collection for such tasks was either handled at the cost of extensive annotator training (e.g. in FrameNet or PropBank) or simplified meaning representation (e.g. in QA-SRL or Overnight). In this talk\, we present two systems [1\, 2] that aim to support fast\, accurate\, and expressive semantic annotations by pairing human workers with a trained model in the loop.\n\nThe first system\, called Guided K-best [1]\, is an annotation toolkit for conversational semantic parsing. Instead of typing annotations from scratch\, data specialists choose a correct parse from the K-best output of a few-shot prototyped model. As the K-best list can be large (e.g. K=100)\, we guide the annotators’ exploration of the K-best list via explainable hierarchical clustering. In addition\, we experiment with RoBERTa-based reranking of the K-best list to recalibrate the few-shot model towards Accuracy@K. The final system allows to annotate data up to 35% faster than the standard\, non-guided K-best and improves the few-shot model’s top-1 accuracy by up to 18%. The second system\, called SchemaBlocks [2]\, is an annotation toolkit for schemas\, or structured descriptions of frequent real-world scenarios (e.g.\, cooking a meal). It represents schemas in the annotation UI as nested blocks. Using a novel Causal ARM model\, we further speed up the annotation process and guide data specialists towards expressive and diverse schemas. As part of this work\, we collect 232 schemas\, evaluating their internal coherence and their coverage on large-scale newswire corpora.\n\n\n
DTSTART;TZID=America/New_York:20220311T120000
DTEND;TZID=America/New_York:20220311T131500
LOCATION:Virtual Seminar
SEQUENCE:0
SUMMARY:Student Seminar – Anton Belyy “Systems for Human-AI Cooperation on Collecting Semantic Annotations”
URL:https://www.clsp.jhu.edu/events/student-seminar-anton-belyy-systems-for-human-ai-cooperation-on-collecting-semantic-annotations/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\n\n
X-TAGS;LANGUAGE=en-US:2022\,Belyy\,March
END:VEVENT
END:VCALENDAR