Zero Resource Speech Technologies and Models of Early Language Acquisition

Special Mini-Research Team of the 2012 Summer Workshop

The unsupervised (zero resource) discovery of linguistic structure from speech is attracting growing interest from two largely disjoint communities. The machine learning community is increasingly interested in deploying language and speech technologies in a variety of languages and dialects with limited or no linguistic resources. The cognitive science community (psycholinguists, linguists, neurolinguists) wants to understand the mechanisms by which infants spontaneously discover linguistic structure. This workshop brings together a team of researchers and graduate students from these two communities to present their work to one another and to discuss current and future issues. Specifically, the workshop has two aims: 1) identifying key issues and open problems of interest to both communities, and 2) setting up standardized, common resources for comparing the different approaches to solving these problems (databases, evaluation criteria, software, etc.). We will focus mainly, but not exclusively, on the discovery of two levels of linguistic structure: phonetic units and word-like units. We are well aware that the definition of these levels, as well as their separation from the rest of the linguistic system, is itself a matter of debate, and we welcome discussion of these issues as well.

Dates: Monday July 16 – Friday July 27

July 16th (Krieger 205): Kick-Off Symposium, Day 1

Morning 10:00a-12:30p
10:00a: Welcome and Overview of objectives [video]
10:30a: Aren Jansen (JHU/HLTCOE): Overview of Zero Resource Technology [pdf] [video]
11:30a: Dan Swingley (U. Penn): Overview of Early Language Acquisition [pdf]

Afternoon 2:30p-5:30p
2:30p: Mark Johnson (Macquarie U.): Overview of Bayesian Approaches [pdf] [video]
3:30p: Bill Idsardi (U. Md): Clustering Techniques for Phonetic Categories and their Implications for Phonology [pdf]
4:30p: Emmanuel Dupoux (Ecole Normale Superieure): Modeling Language Bootstrapping: Results & Challenges [video]

July 17th (Krieger 205): Kick-Off Symposium, Day 2

Morning 9:00a-12:30p
9:00a: Naomi Feldman (U. Md): Using Bayesian Approaches to Study Human Sound and Word Learning [pdf] [video]
9:50a: Sharon Goldwater (U. Edinburgh): From Sounds to Words: Bayesian Modeling of Early Language Acquisition [pdf] [video]
10:50a: Sanjeev Khudanpur (JHU): Hybrid Dynamical System Models for Signal Segmentation and Labeling [video]
11:40a: Aren Jansen (JHU/HLTCOE): Towards a Speaker Invariant Representation of Speech [pdf] [video]

Afternoon 2:00p-5:30p
2:00p: Ian McGraw (MIT): Learning the Lexicon: A Pronunciation Mixture Model [pdf] [video]
2:50p: Rick Rose (McGill): Combining Low and High Resource Acoustic Modeling in Spoken Term Detection [pdf] [video]
3:50p: Hynek Hermansky (JHU): Dealing with Previously Unseen Unknowns in the Recognition of Speech [pdf] [video]
4:40p: Ken Church (IBM): OOVs, Pseudo-truth, and Zero Resource Methods [pdf] [video]

For select symposium abstracts, click here.

July 18th (NEB 225): Organization: Data, metrics, subprojects/subteams

July 19th-26th (NEB 225): Collaboration period, informal talks

July 27th (Hackerman B17): Final Presentation at 2:00p [pdf] [video1] [video2]

Informal Presentations

Benjamin Börschinger: Particle Filtering for Word Segmentation [pdf1] [pdf2]

Mark Johnson: Bayesian Methods Tutorial [pdf]

Hynek Hermansky: Acoustic Processing Tutorial [pdf]

Shinji Watanabe: Integrated Bayesian Unsupervised Acoustic, Lexical, and Language Models [pdf]

Pascal Clark: Rhythmic Demodulation for Zero-Resource Speech Recognition [pdf]

Jason Eisner: Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model [pdf]

Team Members
Senior Members
Ken Church	IBM Research
Pascal Clark	Johns Hopkins University, HLTCOE
Emmanuel Dupoux	Ecole Normale Superieure
Naomi Feldman	University of Maryland
Sharon Goldwater	University of Edinburgh
Hynek Hermansky	Johns Hopkins University
Aren Jansen	Johns Hopkins University, HLTCOE
Mark Johnson	Macquarie University
Sanjeev Khudanpur	Johns Hopkins University
Ian McGraw	Massachusetts Institute of Technology
Richard Rose	McGill University
Graduate Students
Erin Bennett	University of Maryland
Benjamin Börschinger	Macquarie University
Justin Chiu	Carnegie Mellon University
Ewan Dunbar	University of Maryland
Abdallah Fourtassi	Ecole Normale Superieure
David Harwath	Massachusetts Institute of Technology
Keith Levin	Johns Hopkins University
Atta Norouzian	McGill University
Vijay Peddinti	Johns Hopkins University
Rachel Richardson	University of Maryland
Thomas Schatz	Ecole Normale Superieure
Yuriy Shames	Johns Hopkins University
Samuel Thomas	Johns Hopkins University
Affiliate Members
Jason Eisner	Johns Hopkins University
Steven Greenberg	Transparent Language
Timothy (TJ) Hazen	MIT Lincoln Laboratory
Florian Metze	Carnegie Mellon University
Mike Seltzer	Microsoft Research
Dan Swingley	University of Pennsylvania
Balakrishnan Varadarajan	Google
Shinji Watanabe	Mitsubishi Electric Research Laboratories

Center for Language and Speech Processing