BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-23888@www.clsp.jhu.edu
DTSTAMP:20240328T131619Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nEmbedding text sequences is a widespread requirement in modern language understanding. Existing approaches focus largely on constant-size representations. This is problematic\, as the amount of information contained in text often varies with the length of the input. We propose a solution called Nugget\, which encodes language into a representation based on a dynamically selected subset of input tokens. These nuggets are learned through tasks like autoencoding and machine translation\, and intuitively segment language into meaningful units. We demonstrate Nugget outperforms related approaches in tasks involving semantic comparison. Finally\, we illustrate these compact units allow for expanding the contextual window of a language model (LM)\, suggesting new future LMs that can condition on significantly larger amounts of content.
DTSTART;TZID=America/New_York:20230911T120000
DTEND;TZID=America/New_York:20230911T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Student Seminar – Guanghui Qin “Nugget: Neural Agglomerative Embeddings of Text (ICML 2023)”
URL:https://www.clsp.jhu.edu/events/student-seminar-guanghui-qin/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nEmbedding text sequences is a widespread requirement in modern language understanding. Existing approaches focus largely on constant-size representations. This is problematic\, as the amount of information contained in text often varies with the length of the input. We propose a solution called Nugget\, which encodes language into a representation based on a dynamically selected subset of input tokens. These nuggets are learned through tasks like autoencoding and machine translation\, and intuitively segment language into meaningful units. We demonstrate Nugget outperforms related approaches in tasks involving semantic comparison. Finally\, we illustrate these compact units allow for expanding the contextual window of a language model (LM)\, suggesting new future LMs that can condition on significantly larger amounts of content.
\n
X-TAGS;LANGUAGE=en-US:2023\,Qin\,September
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-23898@www.clsp.jhu.edu
DTSTAMP:20240328T131619Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nAny valuable NLP dataset has traditionally been shipped with crowdsourced categorical labels. Instructions for collecting these labels are easy to communicate and the labels themselves are easy to annotate. However\, as self-supervision based methods are getting better at basically everything\, human annotations may need to provide more nuanced supervision or enable more detailed evaluation in order to be worth further collecting. One natural extension to existing categorical annotation schemes is to obtain uncertainty information beyond a single hard label. In this talk\, I will discuss my recent efforts on introducing scalar labels in place of categorical labels as a form of uncertainty annotation. We demonstrate that\, compared to other more obvious annotation schemes for eliciting uncertainty information\, scalar labels are significantly more cost-effective to annotate\, provide reliable evaluation\, and have a theoretical connection to existing predictive uncertainty metrics. In particular\, they motivate using other losses as surrogates for calibration evaluation.
DTSTART;TZID=America/New_York:20230929T120000
DTEND;TZID=America/New_York:20230929T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:CLSP Student Seminar – Zhengping Jiang “Scalar Labels for Capturing Human Uncertainty”
URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-zhengping-jiang/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAny valuable NLP dataset has traditionally been shipped with crowdsourced categorical labels. Instructions for collecting these labels are easy to communicate and the labels themselves are easy to annotate. However\, as self-supervision based methods are getting better at basically everything\, human annotations may need to provide more nuanced supervision or enable more detailed evaluation in order to be worth further collecting. One natural extension to existing categorical annotation schemes is to obtain uncertainty information beyond a single hard label. In this talk\, I will discuss my recent efforts on introducing scalar labels in place of categorical labels as a form of uncertainty annotation. We demonstrate that\, compared to other more obvious annotation schemes for eliciting uncertainty information\, scalar labels are significantly more cost-effective to annotate\, provide reliable evaluation\, and have a theoretical connection to existing predictive uncertainty metrics. In particular\, they motivate using other losses as surrogates for calibration evaluation.
\n
X-TAGS;LANGUAGE=en-US:2023\,Jiang\,September
END:VEVENT
END:VCALENDAR