BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-20115@www.clsp.jhu.edu
DTSTAMP:20240329T141530Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).\nBiography\nJiri Mekyska is the head of the BDALab (Brain Diseases Analysis Laboratory) at the Brno University of Technology\, where he leads a multidisciplinary team of researchers (signal processing engineers\, data scientists\, neurologists\, psychologists) with a special focus on the development of new digital endpoints and digital biomarkers enabling to better understand\, diagnose and monitor neurodegenerative (e.g. Parkinson’s disease) and neurodevelopmental (e.g. dysgraphia) diseases.
DTSTART;TZID=America/New_York:20210329T120000
DTEND;TZID=America/New_York:20210329T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Jiri Mekyska (Brno University of Technology) “Data Science in Small Medical Data Sets: From Logistic Regression Towards Logistic Regression”
URL:https://www.clsp.jhu.edu/events/jiri-mekyska-brno-university-of-technology/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).
\nBiography
\nAbstract
\nOver the last few years\, deep neural models have taken over the field of natural language processing (NLP)\, brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up\, or how lexical changes in register and domain can affect the untested aspects of such representations.
\nIn this talk\, I will present NYTWIT\, a dataset created to challenge large language models at the lexical level\, tasking them with identification of processes leading to the formation of novel English words\, as well as with segmentation and recovery of the specific subclass of novel blends. I will then present XRayEmb\, a method which alleviates the hardships of processing these novelties by fitting a character-level encoder to the existing models’ subword tokenizers\; and conclude with a discussion of the drawbacks of current tokenizers’ vocabulary creation schemes.
\nBiography
\nYuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev\, focusing on natural language processing. Yuval got his PhD at the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that\, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software\, and obtained an MA in Linguistics and a BSc in CS and Mathematics\, both from Tel Aviv University.