BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-20120@www.clsp.jhu.edu DTSTAMP:20240329T131935Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nRobotics@Google’s mission is to make robots useful in the real world through machine learning. We a re excited about a new model for robotics\, designed for generalization ac ross diverse environments and instructions. This model is focused on scala ble data-driven learning\, which is task-agnostic\, leverages simulation\, learns from past experience\, and can be quickly adapted to work in the r eal-world through limited interactions. In this talk\, we’ll share some of our recent work in this direction in both manipulation and locomotion app lications.
\nBiography
\nCarolina
Abstract
\n\n\n\n\nAutomatic discovery of phon e or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ contrastive predictive coding (CPC)\, where the model learns representations by predicting the next frame given past context. However\, CPC only looks at the audio signal’s structure at the frame level. The speech structure exists beyond frame-level\, i.e.\, a t phone level or even higher. We propose a segmental contrastive predictiv e coding (SCPC) framework to learn from the signal structure at both the f rame and phone levels.\n\n\nSCPC is a hierarchical model with three stages trained in an end-to-end m anner. In the first stage\, the model predicts future feature frames and e xtracts frame-level representation from the raw waveform. In the second st age\, a differentiable boundary detector finds variable-length segments. I n the last stage\, the model predicts future segments to learn segment rep resentations. Experiments show that our model outperforms existing phone a nd word segmentation methods on TIMIT and Buckeye datasets.