BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-20716@www.clsp.jhu.edu
DTSTAMP:20240329T130538Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nOver the last few years\, deep neural models have taken over the field of natural language processing (NLP)\, brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up\, or how lexical changes in register and domain can affect the untested aspects of such representations.\nIn this talk\, I will present NYTWIT\, a dataset created to challenge large language models at the lexical level\, tasking them with identification of processes leading to the formation of novel English words\, as well as with segmentation and recovery of the specific subclass of novel blends. I will then present XRayEmb\, a method which alleviates the hardships of processing these novelties by fitting a character-level encoder to the existing models’ subword tokenizers\; and conclude with a discussion of the drawbacks of current tokenizers’ vocabulary creation schemes.\nBiography\nYuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev\, focusing on natural language processing. Yuval got his PhD at the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that\, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software\, and obtained an MA in Linguistics and a BSc in CS and Mathematics\, both from Tel Aviv University. Yuval blogs (in Hebrew) about language matters on Dagesh Kal.
DTSTART;TZID=America/New_York:20210910T120000
DTEND;TZID=America/New_York:20210910T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD
SEQUENCE:0
SUMMARY:Yuval Pinter (Ben-Gurion University – Virtual Visit) “Challenging and Adapting NLP Models to Lexical Phenomena”
URL:https://www.clsp.jhu.edu/events/yuval-pinter/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nOver the last few years\, deep neural models have taken over the field of natural language processing (NLP)\, brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up\, or how lexical changes in register and domain can affect the untested aspects of such representations.
\nIn this talk\, I will present NYTWIT\, a dataset created to challenge large language models at the lexical level\, tasking them with identification of processes leading to the formation of novel English words\, as well as with segmentation and recovery of the specific subclass of novel blends. I will then present XRayEmb\, a method which alleviates the hardships of processing these novelties by fitting a character-level encoder to the existing models’ subword tokenizers\; and conclude with a discussion of the drawbacks of current tokenizers’ vocabulary creation schemes.
\nBiography
\nYuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev\, focusing on natural language processing. Yuval got his PhD at the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that\, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software\, and obtained an MA in Linguistics and a BSc in CS and Mathematics\, both from Tel Aviv University.
Abstract
\nSocial media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have questioned the robustness of longitudinal analyses based on statistical methods due to issues of temporal bias and semantic shift. To what extent are changes in semantics over time affecting the reliability of longitudinal analyses? We examine this question through a case study: understanding shifts in mental health during the course of the COVID-19 pandemic. We demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and improve predictive generalization over time. Ultimately\, we find that these analyses are critical to producing accurate longitudinal studies of social media.
\n
X-TAGS;LANGUAGE=en-US:2022\,February\,Harrigian
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21616@www.clsp.jhu.edu
DTSTAMP:20240329T130538Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nSocial media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown the absence of appropriate tuning\, specifically in the presence of semantic shift\, can hinder robustness of the underlying methods. However\, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time\, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and\, in turn\, improve predictive generalization.
DTSTART;TZID=America/New_York:20220318T120000
DTEND;TZID=America/New_York:20220318T131500
LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Student Seminar – Keith Harrigian “The Problem of Semantic Shift in Longitudinal Monitoring of Social Media”
URL:https://www.clsp.jhu.edu/events/student-seminar-keith-harrigian-the-problem-of-semantic-shift-in-longitudinal-monitoring-of-social-media/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nSocial media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown the absence of appropriate tuning\, specifically in the presence of semantic shift\, can hinder robustness of the underlying methods. However\, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time\, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and\, in turn\, improve predictive generalization.
\n
X-TAGS;LANGUAGE=en-US:2022\,Harrigian\,March
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-24457@www.clsp.jhu.edu
DTSTAMP:20240329T130538Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nAs artificial intelligence (AI) continues to rapidly expand into existing healthcare infrastructure – e.g.\, clinical decision support\, administrative tasks\, and public health surveillance – it is perhaps more important than ever to reflect on the broader purpose of such systems. While much focus has been on the potential for this technology to improve general health outcomes\, there also exists a significant\, but understated\, opportunity to use this technology to address health-related disparities. Accomplishing the latter depends not only on our ability to effectively identify addressable areas of systemic inequality and translate them into tasks that are machine learnable\, but also our ability to measure\, interpret\, and counteract barriers in training data that may inhibit robustness to distribution shift upon deployment (i.e.\, new populations\, temporal dynamics). In this talk\, we will discuss progress made along both of these dimensions. We will begin by providing background on the state of AI for promoting health equity. Then\, we will present results from a recent clinical phenotyping project and discuss their implication on prevailing views regarding language model robustness in clinical applications. Finally\, we will showcase ongoing efforts to proactively address systemic inequality in healthcare by identifying and characterizing stigmatizing language in medical records.
DTSTART;TZID=America/New_York:20240226T120000
DTEND;TZID=America/New_York:20240226T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Keith Harrigian (JHU) “Fighting Bias From Bias: Robust Natural Language Processing Techniques to Promote Health Equity”
URL:https://www.clsp.jhu.edu/events/keith-harrigian-jhu-fighting-bias-from-bias-robust-natural-language-processing-techniques-to-promote-health-equity/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAs artificial intelligence (AI) continues to rapidly expand into existing healthcare infrastructure – e.g.\, clinical decision support\, administrative tasks\, and public health surveillance – it is perhaps more important than ever to reflect on the broader purpose of such systems. While much focus has been on the potential for this technology to improve general health outcomes\, there also exists a significant\, but understated\, opportunity to use this technology to address health-related disparities. Accomplishing the latter depends not only on our ability to effectively identify addressable areas of systemic inequality and translate them into tasks that are machine learnable\, but also our ability to measure\, interpret\, and counteract barriers in training data that may inhibit robustness to distribution shift upon deployment (i.e.\, new populations\, temporal dynamics). In this talk\, we will discuss progress made along both of these dimensions. We will begin by providing background on the state of AI for promoting health equity. Then\, we will present results from a recent clinical phenotyping project and discuss their implication on prevailing views regarding language model robustness in clinical applications. Finally\, we will showcase ongoing efforts to proactively address systemic inequality in healthcare by identifying and characterizing stigmatizing language in medical records.
\n
X-TAGS;LANGUAGE=en-US:2024\,February\,Harrigian
END:VEVENT
END:VCALENDAR