Archived Seminars by Year

2012

January 31, 2012

“Scalable Topic Models”   Video Available

David Blei, Princeton University

Abstract

Probabilistic topic modeling provides a suite of tools for analyzing large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. We can use topic models to explore the thematic structure of a corpus and to solve a variety of prediction problems about documents. At the center of a topic model is a hierarchical mixed-membership model, where each document exhibits a shared set of mixture components with individual (per-document) proportions. Our goal is to condition on the observed words of a collection and estimate the posterior distribution of the shared components and per-document proportions. When analyzing modern corpora, this amounts to posterior inference with billions of latent variables. How can we cope with such data? In this talk, I will describe stochastic variational inference, an algorithm for computing with topic models that can handle very large document collections and even endless streams of documents. I will demonstrate the algorithm with models fitted to millions of articles. I will show how stochastic variational inference can be generalized to many kinds of hierarchical models. I will highlight several open questions and outstanding issues. (This is joint work with Francis Bach, Matt Hoffman, John Paisley, and Chong Wang.)

Speaker Biography

David Blei is an associate professor of Computer Science at Princeton University. His research interests include probabilistic topic models, graphical models, approximate posterior inference, and Bayesian nonparametrics.

February 7, 2012

“Extending the search space of the Minimum Bayes-Risk Decoder for Machine Translation”

Shankar Kumar, Google

Abstract

A Minimum Bayes-Risk (MBR) decoder seeks the hypothesis with the least expected loss for a given task. In the field of machine translation, the technique was originally developed for rescoring k-best lists of hypotheses generated by a statistical model. In this talk, I will present our work on extending the search space of the MBR decoder to very large lattices and hypergraphs that contain on average about 10^81 hypotheses! I will describe conditions on the loss function that enable efficient implementation of the decoder on such large search spaces. I will focus on the BLEU score (Papineni et al.) as the loss function for machine translation. To satisfy the conditions on the loss function, I will introduce a linear approximation to the BLEU score. The MBR decoder under linearized BLEU can be easily implemented using Weighted Finite State Transducers. However, the resulting procedure is computationally expensive for a moderately large lattice. The costly step is the computation of n-gram posterior probabilities. I will next present an approximate algorithm which is much faster than our Weighted Finite State Transducer approach. This algorithm extends to translation hypergraphs generated by systems based on synchronous context-free grammars. Inspired by work in speech recognition, I will finally present an exact and yet efficient algorithm to compute n-gram posteriors on both lattices and hypergraphs. The linear approximation to BLEU contains parameters which were initially derived from n-gram precisions seen on our development data. I will describe how we employed Minimum Error Rate Training (MERT) to estimate these parameters. In the final part of the talk, I will describe an MBR-inspired scheme to learn a consensus model over the n-gram features of multiple underlying component models. This scheme works on a collection of hypergraphs or lattices produced by syntax- or phrase-based translation systems. MERT is used to train the parameters. The approach outperforms a pipeline of MBR decoding followed by standard system combination while using less total computation. This is joint work with Wolfgang Macherey, Roy Tromble, Chris Dyer, John DeNero, Franz Och and Ciprian Chelba.

Speaker Biography

Shankar Kumar is a researcher in the speech group at Google. Prior to this, he worked on Google’s language translation effort. His current interests are in statistical methods for language processing with a particular emphasis on speech recognition and translation.

February 17, 2012

“Learning to Read the Web”   Video Available

Tom Mitchell, Carnegie Mellon University

Abstract

We describe our efforts to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web, and integrates these into its growing knowledge base of beliefs. Each day NELL also learns to read better than yesterday, enabling it to go back to the text it read yesterday, and extract more facts, more accurately. NELL has now been running 24 hours/day for over two years. The result so far is a collection of 15 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)) that NELL is considering at different levels of confidence, along with hundreds of thousands of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. The approach implemented by NELL is based on three key ideas: (1) coupling the semi-supervised training of thousands of different functions that extract different types of information from different web sources, (2) automatically discovering new constraints that more tightly couple the training of these functions over time, and (3) a curriculum or sequence of increasingly difficult learning tasks. Track NELL's progress at http://rtw.ml.cmu.edu.

Speaker Biography

Tom M. Mitchell is the E. Fredkin University Professor and founding head of the Machine Learning Department at Carnegie Mellon University. His research interests lie in machine learning, artificial intelligence, and cognitive neuroscience.  Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI).  Mitchell believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.  His web page is http://www.cs.cmu.edu/~tom.

February 21, 2012

“Bayesian Nonparametric Methods for Complex Dynamical Phenomena”   Video Available

Emily Fox, University of Pennsylvania

Abstract

Markov switching processes, such as hidden Markov models (HMMs) and switching linear dynamical systems (SLDSs), are often used to describe rich classes of dynamical phenomena. They describe complex temporal behavior via repeated returns to a set of simpler models: imagine, for example, a person alternating between walking, running and jumping behaviors, or a stock index switching between regimes of high and low volatility. Traditional modeling approaches for Markov switching processes typically assume a fixed, pre-specified number of dynamical models. Here, in contrast, I develop Bayesian nonparametric approaches that define priors on an unbounded number of potential Markov models. Using stochastic processes including the beta and Dirichlet processes, I develop methods that allow the data to define the complexity of inferred classes of models, while permitting efficient computational algorithms for inference. The new methodology also has generalizations for modeling and discovery of dynamic structure shared by multiple related time series. Interleaved throughout the talk are results from studies of the NIST speaker diarization database, stochastic volatility of a stock index, the dances of honeybees, and human motion capture videos.

Speaker Biography

Emily B. Fox received the S.B. degree in 2004, M.Eng. degree in 2005, and E.E. degree in 2008 from the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT). She is currently an assistant professor in the Wharton Statistics Department at the University of Pennsylvania. Her Ph.D. was advised by Prof. Alan Willsky in the Stochastic Systems Group, and she recently completed a postdoc in the Department of Statistical Science at Duke University working with Profs. Mike West and David Dunson. Emily is a recipient of the National Defense Science and Engineering Graduate (NDSEG) and National Science Foundation (NSF) Graduate Research fellowships, and of an NSF Mathematical Sciences Postdoctoral Research Fellowship. She has also been awarded the 2009 Leonard J. Savage Thesis Award in Applied Methodology, the 2009 MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize, the 2005 Chorafas Award for superior contributions in research, and the 2005 MIT EECS David Adler Memorial 2nd Place Master's Thesis Prize. Her research interests are in multivariate time series analysis and Bayesian nonparametric methods.

March 2, 2012

“Efficient Search and Learning for Language Understanding and Translation”   Video Available

Liang Huang, Information Sciences Institute/University of Southern California

Abstract

What is in common between translating from English into Chinese and compiling C++ into machine code? And yet what are the differences that make the former so much harder for computers? How can computers learn from human translators? This talk sketches an efficient (linear-time) "understanding + rewriting" paradigm for machine translation inspired by both human translators and compilers. In this paradigm, a source language sentence is first parsed into a syntactic tree, which is then recursively converted into a target language sentence via tree-to-string rewriting rules. In both "understanding" and "rewriting" stages, this paradigm closely resembles the efficiency and incrementality of both human processing and compiling. We will discuss these two stages in turn. First, for the "understanding" part, we present a linear-time approximate dynamic programming algorithm for incremental parsing that is as accurate as the much slower (cubic-time) chart parsers, while being as fast as the fast but lossy greedy parsers, thus getting the best of both worlds for the first time and achieving state-of-the-art speed and accuracy. But how do we efficiently learn such a parsing model with approximate inference from huge amounts of data? We propose a general framework for structured prediction based on the structured perceptron that is guaranteed to succeed with inexact search and works well in practice. Next, the "rewriting" stage translates these source-language parse trees into the target language. But parsing errors from the previous stage adversely affect translation quality. An obvious solution is to use the top-k parses, rather than the 1-best tree, but this only helps a little bit due to the limited scope of the k-best list. We instead propose a "forest-based approach", which translates a packed forest encoding *exponentially* many parses in polynomial space by sharing common subtrees. Large-scale experiments showed very significant improvements in translation quality, outperforming the leading systems in the literature. Like the "understanding" part, the translation algorithm here is also linear-time and incremental, and thus resembles human translation. We conclude by outlining a few future directions.

Speaker Biography

Liang Huang is a Research Assistant Professor at the University of Southern California (USC), and a Research Scientist at USC's Information Sciences Institute (ISI). He received his PhD from the University of Pennsylvania in 2008, and worked as a Research Scientist at Google before moving to USC/ISI. His research focuses on efficient search algorithms for natural language processing, especially in parsing and machine translation, as well as related structured learning problems. His work received a Best Paper Award at ACL 2008, and three Best Paper Nominations at ACL 2007, EMNLP 2008, and ACL 2010.

March 6, 2012

“Fast, Accurate and Robust Multilingual Syntactic Analysis”   Video Available

Slav Petrov, Google

Abstract

To build computer systems that can 'understand' natural language, we need to go beyond bag-of-words models and take the grammatical structure of language into account. Part-of-speech tag sequences and dependency parse trees are one form of such structural analysis that is easy to understand and use. This talk will cover three topics. First, I will present a coarse-to-fine architecture for dependency parsing that uses linear-time vine pruning and structured prediction cascades. The resulting pruned third-order model is twice as fast as an unpruned first-order model and compares favorably to a state-of-the-art transition-based parser in terms of speed and accuracy. I will then present a simple online algorithm for training structured prediction models with extrinsic loss functions. By tuning a parser with a loss function for machine translation reordering, we can show that parsing accuracy matters for downstream application quality, producing improvements of more than 1 BLEU point on an end-to-end machine translation task. Finally, I will present approaches for projecting part-of-speech taggers and syntactic parsers across language boundaries, allowing us to build models for languages with no labeled training data. Our projected models significantly outperform state-of-the-art unsupervised models and constitute a first step towards a universal parser. This is joint work with Ryan McDonald, Keith Hall, Dipanjan Das, Alexander Rush, Michael Ringgaard and Kuzman Ganchev (a.k.a. the Natural Language Parsing Team at Google).

Speaker Biography

Slav Petrov is a Senior Research Scientist in Google's New York office. He works on problems at the intersection of natural language processing and machine learning. He is in particular interested in syntactic parsing and its applications to machine translation and information extraction. He also teaches a class on Statistical Natural Language Processing at New York University every Fall. Prior to Google, Slav completed his PhD degree at UC Berkeley, where he worked with Dan Klein. He holds a Master's degree from the Free University of Berlin, and also spent a year as an exchange student at Duke University. Slav was a member of the FU-Fighters team that won the RoboCup 2004 world championship in robotic soccer and recently won a best paper award at ACL 2011 for his work on multilingual syntactic analysis. Slav grew up in Berlin, Germany, but is originally from Sofia, Bulgaria. He therefore considers himself a Berliner from Bulgaria. Whenever Bulgaria plays Germany in soccer, he supports Bulgaria.

March 13, 2012

“Measuring and Using Speech Production Information”   Video Available

Shri Narayanan, Viterbi School of Engineering/University of Southern California

Abstract

The human speech signal carries crucial information not only about communication intent but also about affect and emotions. From a basic scientific perspective, understanding how such rich information is encoded in human speech can shed light on the underlying communication mechanisms. From a technological perspective, finding ways for automatically processing and decoding this complex information in speech continues to be of interest for a variety of applications. One line of work in this realm aims to connect these perspectives by creating technological advances to obtain insights about basic speech communication mechanisms and by utilizing direct information about human speech production to inform technology development. Both these engineering problems will be considered in this talk. A longstanding challenge in speech production research has been the ability to examine real-time changes in the shaping of the vocal tract, a goal that has been furthered by imaging techniques such as ultrasound, movement tracking and magnetic resonance imaging. The spatial and temporal resolution afforded by these techniques, however, has limited the scope of the investigations that could be carried out. In this talk, we will highlight recent advances that allow us to perform near real-time investigations on the dynamics of vocal tract shaping during speech. We will also use examples from recent and ongoing research to describe some of the methods and outcomes of processing such data, especially toward facilitating linguistic analysis and modeling, and speech technology development. [Work supported by NIH, ONR, and NSF.]

Speaker Biography

Shrikanth (Shri) Narayanan is Andrew J. Viterbi Professor of Engineering at the University of Southern California (USC), where he holds appointments as Professor of Electrical Engineering, Computer Science, Linguistics and Psychology, and as Director of the USC Ming Hsieh Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Fellow of the Acoustical Society of America, IEEE, and the American Association for the Advancement of Science (AAAS). He is also an Editor for Computer Speech and Language and an Associate Editor for the IEEE Transactions on Multimedia, IEEE Transactions on Affective Computing, APSIPA Transactions on Signal and Information Processing and the Journal of the Acoustical Society of America. He is a recipient of several honors including Best Paper awards from the IEEE Signal Processing Society in 2005 (with Alex Potamianos) and in 2009 (with Chul Min Lee) and selection as a Distinguished Lecturer for the IEEE Signal Processing Society for 2010-11. He has published over 475 papers, and has twelve granted US patents.

March 27, 2012

“Linguistic Structure Prediction with AD3”   Video Available

Noah Smith, Carnegie Mellon University

Abstract

In this talk, I will present AD3 (Alternating Directions Dual Decomposition), an algorithm for approximate MAP inference in loopy graphical models with discrete random variables, including structured prediction problems.  AD3 is simple to implement and well-suited to problems with hard constraints expressed in first-order logic.  It often finds the exact MAP solution, giving a certificate when it does; when it doesn't, it can be embedded within an exact branch and bound technique.  I'll show experimental results on two natural language processing tasks, dependency parsing and frame-semantic parsing.  This work was done in collaboration with Andre Martins, Dipanjan Das, Pedro Aguiar, Mario Figueiredo, and Eric Xing.

Speaker Biography

I am the Finmeccanica Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. I received my Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and my B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. My research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. My book, Linguistic Structure Prediction, covers many of these topics. I serve on the editorial board of the journal Computational Linguistics and the Journal of Artificial Intelligence Research and received a best paper award at the ACL 2009 conference. My research group, Noah's ARK, is supported by the NSF (including an NSF CAREER award), DARPA, Qatar NRF, IARPA, ARO, Portugal FCT, and gifts from Google, HP Labs, IBM Research, and Yahoo Research.

March 30, 2012

“Machine Learning in the Loop”

John Langford, Yahoo! Research

Abstract

The traditional supervised machine learning paradigm is inadequate for a wide array of potential machine learning applications where the learning algorithm decides on an action in the real world and gets feedback about that action. This inadequacy results either in kludgy systems, such as those for ad targeting at internet companies, or in deep systemic mistrust and skepticism, such as toward personalized medicine or adaptive clinical trials. I will discuss a new formal basis, algorithms, and practical tricks for doing machine learning in this setting.

Speaker Biography

John Langford is a computer scientist, working as a senior researcher at Yahoo! Research. He studied Physics and Computer Science at the California Institute of Technology, earning a double bachelor's degree in 1997, and received his Ph.D. from Carnegie Mellon University in 2002. Previously, he was affiliated with the Toyota Technological Institute and IBM's Watson Research Center. He is also the author of the popular machine learning weblog hunch.net and the principal developer of Vowpal Wabbit.

April 6, 2012

“In The Beginning was the Familiar Voice”   Video Available

Diana Sidtis, New York University

Abstract

Hearing and sound, compared with vision, are latecomers and second cousins in cultural and scientific history.  Still today, voice scientists are scattered across many disciplines and much of vocal function remains elusive. In modern linguistics, speech sounds have received more attention than voice, and in neuropsychology,  voices have only recently begun to catch up with faces.  Yet vocalization likely played a major role in biological evolution, appearing long before speech, and contributing crucially to survival and social behaviors in numerous species.  Paralinguistic communication by voice, an inborn ability arising from this evolutionary trajectory, has flowered to a prodigious competence in humans.  Voice information is multiplex, signaling affective, attitudinal, linguistic, pragmatic, physiological and psychological characteristics, as well as personal identity. The cues for this long list of characteristics likewise constitute a very large repertory of auditory-acoustic, physiological, perceptual, and speech-like parameters. This many-to-many relationship between characteristics signaled in the voice and the cues to them presents a great challenge to voice research. Further, because of important differences between familiar and unfamiliar voices, the role of the listener is key.  Studies of persons with focal brain damage indicate that perception of unfamiliar and recognition of familiar voices are independent and unordered cerebral abilities.  These and related findings lead to a model of voice perception that posits an interplay between featural analysis and pattern recognition.  From this perspective, the personally familiar voice, viewed as a complex auditory pattern for which idiosyncratic featural attributes arise adventitiously, is preeminent in evolution and in human communication.    

Speaker Biography

Diana Sidtis (formerly Van Lancker) is Professor of Communicative Sciences and Disorders at New York University and performs research at the Nathan Kline Institute for Psychiatric Research. An experienced clinician, her publications include numerous scholarly articles and book chapters.

April 13, 2012

“Text Geolocation and Dating: Light-Weight Language Grounding”   Video Available

Jason Baldridge, University of Texas

Abstract

It used to be that computational linguists had to collaborate with roboticists in order to work on grounding language in the real world. However, since the advent of the internet, and particularly in the last decade, the world has been brought within digital scope. People's social and business interactions are increasingly mediated through a medium that is dominated by text. They tweet from places, express their opinions openly, give descriptions of photos, and generally reveal a great deal about themselves in doing so, including their location, gender, age, social status, relationships and more. In this talk, I'll discuss work on geolocation and dating of texts, that is, identifying sets of latitude-longitude pairs and time periods that a document is about or related to. These applications and the models developed for them set the stage for deeper investigations into computational models of word meaning that go beyond standard word vectors and into augmented multi-component representations that include dimensions connected to the real world via geographic and temporal values and beyond.

Speaker Biography

Jason Baldridge is an associate professor in the Department of Linguistics at the University of Texas at Austin. He received his Ph.D. from the University of Edinburgh in 2002, where his doctoral dissertation was awarded the 2003 Beth Dissertation Prize from the European Association for Logic, Language, and Information. His main research interests include categorial grammars, parsing, semi-supervised learning, coreference resolution, and georeferencing. He is one of the co-creators of the Apache OpenNLP Toolkit and has been active for many years in the creation and promotion of open source software for natural language processing.

April 17, 2012

“Factored Adaptation for Separating Speaker and Environment Variability”   Video Available

Mike Seltzer, Microsoft Research

Abstract

Acoustic model adaptation can reduce the degradation in speech recognition accuracy caused by mismatch between the speech seen at runtime and that seen in training. This mismatch is caused by many factors, including the speaker and the environment. Standard data-driven adaptation techniques address any and all of these differences blindly. While this is a benefit, it can also be a drawback, as it is unknown precisely what mismatch is being compensated. This prevents the transforms from being reliably reused across sessions of an application that can be used in different environments, such as voice search on a mobile phone. In this talk, I'll discuss our recent research in factored adaptation, which jointly compensates for acoustic mismatch in a manner that enables multiple sources of variability to be separated. By performing adaptation in this way, we can increase the utility of the adaptation data and more effectively reuse transforms across user sessions. The effectiveness of the proposed approach will be shown in a series of experiments on a small vocabulary noisy digits task and a large vocabulary voice search task.

Speaker Biography

Mike Seltzer received the Sc.B. with honors from Brown University in 1996, and M.S. and Ph.D. degrees from Carnegie Mellon University in 2000 and 2003, respectively, all in electrical engineering.  From 1996 to 1998, he was an applications engineer at Teradyne, Inc., Boston, MA working on semiconductor test solutions for mixed-signal devices.  From 1998 to 2003, he was a member of the Robust Speech Recognition group at Carnegie Mellon University. In 2003, Dr. Seltzer joined the Speech Technology Group at Microsoft Research, Redmond, WA. In 2006, Dr. Seltzer was awarded the Best Young Author paper award from the IEEE Signal Processing Society. From 2006 to 2008, he was a member of the Speech & Language Technical Committee (SLTC) and was the Editor-in-Chief of the SLTC e-Newsletter. He was a general co-chair of the 2008 International Workshop on Acoustic Echo and Noise Control and Publicity Chair of the 2008 IEEE Workshop on Spoken Language Technology. He is currently an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing. His current research interests include speech recognition in adverse acoustical environments, acoustic model adaptation, acoustic modeling, microphone array processing, and machine learning for speech and audio applications.

April 24, 2012

“Not Just for Kids: Enriching Information Retrieval with Reading Level Metadata”

Kevyn Collins-Thompson, Microsoft Research

Abstract

A document isn't relevant - at least, not immediately -  if you can't understand it, yet search engines have traditionally ignored the problem of finding content at the right level of difficulty as an aspect of relevance.  Moreover, little is currently known about the nature of the Web, its users, and how users interact with content when seen through the lens of reading difficulty.  I'll present our recent research progress in combining reading difficulty prediction with information retrieval, including models, algorithms and large-scale data analysis.   Our results show how the availability of reading level metadata - especially in combination with topic metadata - opens up new and sometimes surprising possibilities for enriching search systems, from personalizing Web search results by reading level to predicting user and site expertise, improving result caption quality, and estimating searcher motivation. This talk includes joint work with Paul N. Bennett, Ryen White, Susan Dumais, Jin Young Kim, Sebastian de la Chica, and David Sontag.

Speaker Biography

Kevyn Collins-Thompson is a Researcher in the Context, Learning and User Experience for Search (CLUES) group at Microsoft Research (Redmond).  His research lies in an area combining information retrieval, machine learning, and computational linguistics, and focuses on models, algorithms, and evaluation methods for making search technology more reliable and effective. His recent work has explored algorithms and Web search applications for reading level prediction; optimization strategies that reduce the risk of applying risky retrieval algorithms like personalization and automatic query rewriting; and educational applications of IR such as intelligent tutoring systems.  Kevyn received his Ph.D. and M.Sc. from the Language Technologies Institute at Carnegie Mellon University and B.Math from the University of Waterloo.

May 4, 2012

“I know that voice: an interactive lecture-demonstration of human assisted speaker recognition”   Video Available

John J. Godfrey and Craig S. Greenberg, Department of Defense/National Institute of Standards and Technology

Abstract

As we heard from our recent seminar guest Diana Sidtis, the ability to recognize other human voices, but most especially those of our family and close associates, has deep biological roots and an interesting neurological basis, including a sharp difference between familiar and unfamiliar voices. Computers make no such distinction. While we have made enormous progress in enabling computers to recognize voices, we have not paid much attention to how humans do it. We should – we need to know both the limits and the special capabilities of humans, both to improve our modeling and to enable computers to work hand in hand with humans in practical applications like forensics and biometrics. So how good are humans at utilizing automatic speaker recognition technology for performing speaker verification tasks? Don’t believe what you see on CSI or in the papers! And keep an eye on the case surrounding the tragic death of Trayvon Martin in Florida, which is likely to involve such matters. The 2010 NIST Speaker Recognition Evaluation (SRE10) included a test of Human Assisted Speaker Recognition (HASR) in which systems based in whole or in part on human expertise were evaluated on limited sets of trials. Results were submitted for 20 systems from 15 sites from 6 countries. The performance results suggest that the chosen trials were indeed difficult, as is often the case in real-life situations, and that the HASR systems did not appear to perform as well as the best fully automatic systems on these trials. This does not mean that machines are simply, always, everywhere “better” than people at speaker recognition. But what does it mean? This is worth discussing. This lecture-demonstration will provide a live, interactive speaker recognition exercise for the audience, giving everyone a firsthand experience of the task a human forensic examiner often faces. Prepared with such experience, the audience will then hear the highlights of the NIST HASR evaluations.

Speaker Biography

John Godfrey received his PhD in Linguistics from Georgetown University, did a postdoc at AMRL in Dayton, and spent 10 years at UT-Dallas’ Callier Center as an Assist./Assoc. Professor, where he focused on speech perception and psycholinguistics. He later joined Texas Instruments Speech Research Group where, in addition to phonetics research, he worked on corpus-based evaluation, designing and collecting corpora such as Wall Street Journal, TI-MIT, ATIS, and SWITCHBOARD. It is widely acknowledged that these helped drive speech research for the next decade and more. He also served as the first Executive Director of the LDC, creating the infrastructure that has supported evaluation-based “big data” research in HLT ever since. In 1999 he became Chief of HLT Research at NSA, where he oversaw both government and external R&D efforts in Speaker, Language and Speech Recognition, as well as the annual NIST evaluations in these areas. His strategic responsibilities also included liaison with academic and industrial labs, DARPA, IARPA, and NSF. In recent years his research group’s success on classified applications has become widely known and demonstrated in the Intelligence Community. They won the NSA Research Team of the Year award in 2010. As HLT Chief Scientist for NSA Research, he also conducts and oversees research in speaker recognition by man and machine.

Craig Greenberg received his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania (2007), and his B.M. degree from Vanderbilt University (2003). He is currently working toward his M.S. degree (to be awarded in May 2012) in Applied Mathematics at Johns Hopkins University in the Engineering and Applied Science Program for Professionals. He works as a Mathematician at the Gaithersburg, Maryland campus of the National Institute of Standards and Technology (NIST) in the areas of speaker recognition and language recognition. Previous positions he has held include Computer Scientist Intern at the National Institute of Standards and Technology, Research Assistant for Professor Mitch Marcus at the University of Pennsylvania, Programmer at the Linguistic Data Consortium, and English Language Annotator at the Institute for Research in Cognitive Science. Mr. Greenberg has been a member of the International Speech Communication Association (ISCA) since 2008. He has received two official letters of recognition for his contribution to speaker recognition evaluation.

July 3, 2012

“Under Pressure: Transforming the Way We Think About and Use Water in the Home”   Video Available

Jon Froehlich, University of Maryland, College Park

Abstract

Cities across the world are facing an escalating demand for potable water and sanitation infrastructure due to growing populations, higher population densities and warmer climates. According to the United Nations, this is one of the most pressing issues of the century. As new sources of water become more environmentally and economically costly to extract, water suppliers and governments are shifting their focus from finding new supplies to using existing supplies more efficiently. One challenge in improving residential efficiency, however, is the lack of awareness that occupants have about their in-home water consumption habits. This disconnect makes it difficult, even for motivated individuals, to make informed decisions about what steps can be taken to conserve. To help address this problem, my research focuses on creating new types of sensors to monitor and infer everyday human activity such as driving to work or taking a shower, then feeding back this sensed information in novel, engaging, and informative ways with the goal of increasing awareness and promoting environmentally responsible behavior. In this talk, I will present a novel, low-cost, and easy-to-install water sensing system called HydroSense, which infers usage data at the level of individual water fixtures from a single sensing point, and a real-time ambient water usage feedback display called Reflect2O, which leverages HydroSense’s data granularity to inform and promote efficient water usage practices in the home. My talk will emphasize the sensor and inference algorithm development, our two in-home evaluations, and our preliminary evaluations of our feedback visualization designs. Our goal is to reach a 15-20% reduction in water use amongst deployed homes, which, according to the American Water Works Association, would save approximately 2.7 billion gallons per day and more than $2 billion per year.

Speaker Biography

Jon Froehlich is an Assistant Professor in the Department of Computer Science at the University of Maryland, College Park and a member of the Human-Computer Interaction Laboratory (HCIL) and the Institute for Advanced Computer Studies (UMIACS). His research focuses on building and studying interactive technology that addresses high value social issues such as environmental sustainability, computer accessibility, and personal health and wellness. Jon earned his PhD from the University of Washington (UW) in Computer Science in 2011 with a focus on Human-Computer Interaction (HCI) and Ubiquitous Computing (UbiComp). For his doctoral research, Jon was recognized with the Microsoft Research Graduate Fellowship (2008-2010) and the College of Engineering Graduate Student Research Innovator of the Year Award (2010). His work has been published in many top-tier academic venues including CHI, UbiComp, IJCAI, MobiSys and ICSE and has earned a best paper award and two best paper nominations. Jon received his MS in Information and Computer Science in 2004 from the University of California, Irvine.

July 11, 2012

“Motion Magnification and Motion Denoising”   Video Available

William T. Freeman, Massachusetts Institute of Technology

Abstract

I'll present two topics relating to the analysis and re-display of motion: (1) Motion denoising: We'd like to take a video sequence and break it into different components, corresponding to each different physical process observed in the video sequence. (Then you could modify each component separately and re-combine them.) Here's a first step in that direction: we separate a video sequence into its short-term (motion noise) and longer-term components. The machinery behind this is an MRF. Motion is never explicitly computed, allowing us to manipulate sequences where occlusion artifacts would otherwise cause problems. (2) Motion magnification: We've developed a new, simple and fast way to magnify and re-render small motions in videos. This has complementary strengths to the SIGGRAPH 2005 work of Liu et al., and is 100,000 times faster, which makes a real-time motion microscope possible.

Speaker Biography

William T. Freeman is Professor of Electrical Engineering and Computer Science at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, having joined the faculty in 2001. From 1992 to 2001 he worked at Mitsubishi Electric Research Labs (MERL), in Cambridge, MA, most recently as Sr. Research Scientist and Associate Director. He studied computer vision for his PhD in 1992 from the Massachusetts Institute of Technology, and received a BS in physics and MS in electrical engineering from Stanford in 1979, and an MS in applied physics from Cornell in 1981. His current research interests include machine learning applied to computer vision, Bayesian models of visual perception, and computational photography. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006 and 2009. Previous research topics include steerable filters and pyramids, the generic viewpoint assumption, color constancy, computer vision for computer games, and bilinear models for separating style and content. He holds 30 patents. From 1981 to 1987, he worked at the Polaroid Corporation. There he co-developed an electronic printer (Polaroid Palette), and developed algorithms for color image reconstruction which are used in Polaroid's electronic camera. In 1987-88, Dr. Freeman was a Foreign Expert at the Taiyuan University of Technology, P. R. of China. Dr. Freeman was an Associate Editor of IEEE Trans. on Pattern Analysis and Machine Intelligence (IEEE-PAMI), and a member of the IEEE PAMI TC Awards Committee. He is active in the program or organizing committees of Computer Vision and Pattern Recognition (CVPR), the International Conference on Computer Vision (ICCV), Neural Information Processing Systems (NIPS), and SIGGRAPH. He was the program co-chair for ICCV 2005, and will be program co-chair for CVPR 2013.

July 16, 2012

“Clustering Techniques for Phonetic Categories and Their Implications for Phonology”

William Idsardi, University of Maryland

Abstract

I will review some recent work in collaboration with Ewan Dunbar and Brian Dillon on the use of unsupervised clustering techniques to discover vowel categories. The novel and important point of this work is to try to discover categories with predictable variants, i.e. phonemes with their related allophones. We achieve this by finding categories and transforms on the categories rather than first finding a larger set of more detailed categories (phones) and then later grouping the induced categories into more abstract categories (phonemes). A similar approach can be used to cluster "higher-order invariants" for consonants, in this case locus equations. Finally, we will examine some of the implications of this work for other problems in phonology such as speaker variation and incomplete neutralization.  

July 25, 2012

“How Does the Brain Solve Visual Object Recognition”   Video Available

James DiCarlo, McGovern Institute for Brain Research at MIT

Abstract

Visual object recognition is a fundamental building block of memory and cognition, but remains a central unsolved problem in systems neuroscience, human psychophysics, and computer vision (engineering). The computational crux of visual object recognition is that the recognition system must somehow be robust to tremendous image variation produced by different “views” of each object -- the so-called “invariance problem.” The primate brain is an example of a powerful recognition system and my laboratory aims to understand and emulate its solution to this problem. A key step in isolating and constraining the brain’s solution is to first find the patterns of neuronal activity and ways to read that neuronal activity that quantitatively express the brain’s answer to visual recognition. To that end, we have previously shown that a part of the primate ventral visual stream (inferior temporal cortex, IT) rapidly and automatically conveys neuronal population rate codes that qualitatively solve the invariance problem for vision. While this is a good start, it only weakly constrains the brain’s solution. Thus, we have recently set the bar higher -- are such codes quantitatively sufficient to explain behavioral performance? In this talk, I will show how primate systems neuroscience combined with human psychophysics reveals that some (but not all) IT population codes are sufficient to explain human performance on invariant object recognition. This stands in stark contrast to all tested codes in earlier visual areas and computer vision codes, which are all insufficient (falsified by experimental data). These results argue that these rapidly and automatically computed IT population codes are common to primate brains, and that they are the direct substrate of object recognition performance.
While this progress constrains and frames the kinds of algorithms we should be searching for in the primate brain, it does not directly reveal their key principles of image encoding or the myriad key “details” of that encoding. While this remains an area of active research, I will conclude by outlining how we aim to combine our experimental results in unsupervised learning with novel computer vision technology to guide us toward discovery of the true underlying cortical algorithm.  

Speaker Biography

James DiCarlo joined the McGovern Institute in 2002, and is an associate professor in the Department of Brain and Cognitive Sciences. He received his Ph.D. and M.D. from Johns Hopkins University and did postdoctoral work at Baylor College of Medicine. In 1998, he received the Martin and Carol Macht Young Investigator Research Prize from Johns Hopkins University. In 2002, he received an Alfred P. Sloan Research Fellowship and a Pew Scholar Award. He received MIT's Surdna Research Foundation Award and its School of Science Prize for Excellence in Undergraduate Teaching in 2005, and he won a Neuroscience Scholar Award from the McKnight Foundation in 2006.

September 4, 2012

“How Geometric Should Our Semantic Models Be?”   Video Available

Katrin Erk, University of Texas

Abstract

Vector space models represent the meaning of a word through the contexts in which it has been observed. Each word becomes a point in a high-dimensional space in which the dimensions stand for observed context items. One advantage of these models is that they can be acquired from corpora in an unsupervised fashion. Another advantage is that they can represent word meaning in context flexibly and without recourse to dictionary senses: Each occurrence gets its own point in space; the points for different occurrences may cluster into senses, but they do not have to. Recently, there have been a number of approaches aiming to extend the vector space success story from word representations to the representation of whole sentences. However, they have a lot of technical challenges to meet (apart from the open question of whether all semantics tasks can be reduced to similarity judgments). An alternative is to combine the depth and rigor of logical form with the flexibility of vector space approaches.

Speaker Biography

Katrin Erk is an associate professor in the Department of Linguistics at the University of Texas at Austin. She completed her dissertation on tree description languages and ellipsis at Saarland University in 2002. From 2002 to 2006, she held a researcher position at Saarland University, working on manual and automatic frame-semantic analysis. Her current research focuses on computational models for word meaning and the automatic acquisition of lexical information from text corpora.

September 11, 2012

“Weak and Strong Learning of Context-Free Grammars”   Video Available

Alexander Clark, Royal Holloway University of London

Abstract

Rapid progress has been made in the last few years in the 'unsupervised' learning of context-free grammars using distributional techniques: a core challenge for theoretical linguistics and NLP. However, these techniques are on their own of limited value because they are merely weak results -- we learn a grammar that generates the right strings, but not necessarily a grammar that defines the right structures. In this talk we will look at various ways of moving from weak learning algorithms to strong algorithms that can provably also learn the correct structures. Of course, in order to do this we need to define a mathematically precise notion of syntactic structure. We will present a new theoretical approach to this based on considering transformations of grammars through morphisms of algebraic structures that interpret grammars. Under this model we can say that the simplest/smallest grammar for a language will always use a certain set of syntactic categories, and a certain set of lexical categories; these categories will be drawn from the syntactic concept lattice, a basis for several weak learning algorithms for CFGs. This means that under mild Bayesian assumptions we can consider only grammars that use these categories; this leads to some nontrivial predictions about the nature of syntactic structure in natural languages.

Speaker Biography

Alexander Clark is in the Department of Computer Science at Royal Holloway, University of London. His research interests are in grammatical inference, theoretical and mathematical linguistics and unsupervised learning. He is currently president of SIGNLL and chair of the steering committee of the ICGI; a book coauthored with Shalom Lappin, 'Linguistic Nativism and the Poverty of the Stimulus' was published by Wiley-Blackwell in 2011.

September 14, 2012

“OUCH (Outing Unfortunate Characteristics of Hidden Markov Models) or What's Wrong with Speech Recognition and What Can We Do About it?”   Video Available

Jordan Cohen, Spelamode

Abstract

Speech recognition has become a critical part of the user interface in mobile, telephone, and other technology applications. However, current recognition systems consistently underperform their users' and designers' expectations. This talk reports on a project, OUCH, which investigates one aspect of the most commonly used speech recognition algorithms. In most Hidden Markov Model implementations, frame-to-frame independence is assumed by the model, but in fact the frame observations are not independent. This mismatch between the model assumptions and the data has long been well known. Following the work of Gillick and Wegmann, the OUCH project is measuring and cataloging some of the implications of these assumptions, using a procedure which does not fix the model, but rather creates speech data which satisfies the model assumptions. (See “Don't Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition,” Dan Gillick, Larry Gillick, and Steven Wegmann, ASRU, 2011.) In addition to our work in modeling, we are surveying the field using a snowball technique to document how the researchers and engineers in speech and language technology view the current situation. This talk will review our modeling findings to date, and will offer a preliminary look at our survey.

Speaker Biography

Jordan Cohen is a group leader in the OUCH project at Berkeley, and founder and technologist at Spelamode Consulting. He was the principal investigator for GALE at SRI, the CTO of Voice Signal Technologies, the Director of Business Relations at Dragon, and a member of the research staff at IDA and IBM. Dr. Cohen assists companies with technology issues, and he is engaged in intellectual property evaluation and litigation.

September 28, 2012

“Constrained Conditional Models: Integer Linear Programming Formulations for Natural Language Understanding”   Video Available

Dan Roth, University of Illinois at Urbana-Champaign

Abstract

Computational approaches to problems in Natural Language Understanding and Information Access and Extraction often involve assigning values to sets of interdependent variables. Examples of tasks of interest include semantic role labeling (analyzing natural language text at the level of “who did what to whom, when and where”), syntactic parsing, information extraction (identifying events, entities and relations), transliteration of names, and textual entailment (determining whether one utterance is a likely consequence of another). Over the last few years, one of the most successful approaches to studying these problems has involved Constrained Conditional Models (CCMs), an Integer Linear Programming formulation that augments probabilistic models with declarative constraints as a way to support such decisions. I will present research within this framework, discussing old and new results pertaining to inference issues, learning algorithms for training these global models, and the interaction between learning and inference.

Speaker Biography

Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of a DHS Center for Multimodal Information Access & Synthesis (MIAS) and holds faculty positions in Statistics, Linguistics and at the School of Library and Information Sciences. Roth is a Fellow of the ACM and of AAAI for his contributions to Machine Learning and to Natural Language Processing. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely by the research community. Prof. Roth has given keynote talks at major conferences, including AAAI, EMNLP and ECML, and presented several tutorials at universities and major conferences. Roth was the program chair of AAAI’11, ACL’03 and CoNLL'02, has been on the editorial board of several journals in his research areas and has won several teaching and paper awards. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

October 2, 2012

“Making Computers Good Listeners”   Video Available

Joseph Keshet, TTI Chicago

Abstract

A typical problem in speech and language processing has a very large number of training examples, is sequential, highly structured, and has a unique measure of performance, such as the word error rate in speech recognition, or the BLEU score in machine translation. The simple binary classification problem typically explored in machine learning is no longer adequate for the complex decision problems encountered in speech and language applications. Binary classifiers cannot handle the sequential nature of these problems, and are designed to minimize the zero-one loss, i.e., correct or incorrect, rather than the desired measure of performance. In addition, the current state-of-the-art models in speech and language processing are generative models that capture some temporal dependencies, such as Hidden Markov Models (HMMs). While such models have been immensely important in the development of accurate large-scale speech processing applications, and in speech recognition in particular, theoretical and experimental evidence has led to a widespread belief that such models have nearly reached a performance ceiling. In this talk, I first present a new theorem stating that a general learning update rule directly corresponds to the gradient of the desired measure of performance. I present a new algorithm for phoneme-to-speech alignment based on this update rule, which surpasses all previously reported results on a standard benchmark. I show a generalization of the theorem to training non-linear models such as HMMs, and present empirical results on a phoneme recognition task which surpass results from HMMs trained with all other training techniques. I will then present the problem of automatic voice onset time (VOT) measurement, one of the most important variables measured in phonetic research and medical speech analysis. I will present a learning algorithm for VOT measurement which outperforms previous work and performs near human inter-judge reliability. 
I will discuss the algorithm’s implications for tele-monitoring of Parkinson’s disease, and for predicting the effectiveness of chemo-radiotherapy treatment of head and neck cancer.

Speaker Biography

Joseph Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering in 1994 and 2002, respectively, from Tel Aviv University. He received his Ph.D. in Computer Science from The School of Computer Science and Engineering at The Hebrew University of Jerusalem in 2007. From 1995 to 2002 he was a researcher at IDF, and won the prestigious Israeli award, "Israel Defense Prize", for outstanding research and development achievements. From 2007 to 2009 he was a post-doctoral researcher at IDIAP Research Institute in Switzerland. Since 2009 he has been a research assistant professor at TTI-Chicago, a philanthropically endowed academic computer science institute on the campus of the University of Chicago. Dr. Keshet's research interests are in speech and language processing and machine learning. His current research focuses on the design, analysis and implementation of machine learning algorithms for the domain of speech and language processing.

October 9, 2012

“Beyond MaltParser - Recent Advances in Transition-Based Dependency Parsing”   Video Available

Joakim Nivre, Uppsala University

Abstract

The transition-based approach to dependency parsing has become popular thanks to its simplicity and efficiency. Systems like MaltParser achieve linear-time parsing with projective dependency trees using locally trained classifiers to predict the next parsing action and greedy best-first search to retrieve the optimal parse tree, assuming that the input sentence has been morphologically disambiguated using a part-of-speech tagger. In this talk, I survey recent developments in transition-based dependency parsing that address some of the limitations of the basic transition-based approach. First, I show how globally trained classifiers and beam search can be used to mitigate error propagation and enable richer feature representations. Secondly, I discuss different methods for extending the coverage to non-projective trees, which are required for linguistic adequacy in many languages. Finally, I present a model for joint tagging and parsing that leads to improvements in both tagging and parsing accuracy as compared to the standard pipeline approach.

Speaker Biography

Joakim Nivre is Professor of Computational Linguistics at Uppsala University. He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University. Joakim's research focuses on data-driven methods for natural language processing, in particular for syntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the MaltParser system. Joakim's current research interests include the analysis of mildly non-projective dependency structures, the integration of morphological and syntactic processing for richly inflected languages, and methods for cross-framework parser evaluation. He has produced over 150 scientific publications, including 3 books, and has given nearly 70 invited talks at conferences and institutions around the world. He is the current secretary of the European Chapter of the Association for Computational Linguistics.

October 23, 2012

“New Waves of Innovation in Large-Scale Speech Technology Ignited by Deep Learning”

Li Deng, Microsoft Research

Abstract

Semantic information embedded in the speech signal manifests itself in a dynamic process rooted in the deep linguistic hierarchy as an intrinsic part of the human cognitive system. Modeling both the dynamic process and the deep structure for advancing speech technology has been an active pursuit for more than 20 years, but it is only within the past two years that a technological breakthrough has been achieved by a methodology commonly referred to as "deep learning". The Deep Belief Net (DBN) and the related deep neural nets have recently been used to supersede the Gaussian mixture model component in HMM-based speech recognition, and have produced dramatic error rate reductions in both phone recognition and large vocabulary speech recognition of industry scale while keeping the HMM component intact. On the other hand, (constrained) Dynamic Bayesian Networks have been developed for many years to improve the dynamic models of speech, aiming to overcome the IID assumption as a key weakness of the HMM, with a set of techniques commonly known as hidden dynamic/trajectory models or articulatory-like segmental representations. A history of these two largely separate lines of research will be critically reviewed and analyzed in the context of modeling the deep and dynamic linguistic hierarchy for advancing speech recognition technology. The first wave of innovation has successfully unseated the Gaussian mixture model and MFCC-like features --- two of the three main pillars of the 20-year-old technology in speech recognition. Future directions will be discussed and analyzed on supplanting the final pillar --- the HMM --- where frame-level scores are to be enhanced to dynamic-segment scores through new waves of innovation capitalizing on multiple lines of research that have enriched our knowledge of the deep, dynamic process of human speech.

Speaker Biography

Li Deng received his Ph.D. from the University of Wisconsin-Madison. He was an Assistant (1989-1992), Associate (1992-1996), and Full Professor (1996-1999) at the University of Waterloo, Ontario, Canada. He then joined Microsoft Research, Redmond, where he is currently a Principal Researcher and where he received Microsoft Research Technology Transfer, Goldstar, and Achievement Awards. Prior to MSR, he also worked or taught at Massachusetts Institute of Technology, ATR Interpreting Telecom. Research Lab. (Kyoto, Japan), and HKUST. He has published over 300 refereed papers in leading journals/conferences and 3 books covering broad areas of human language technology and machine learning. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He is an inventor or co-inventor of over 50 granted US, Japanese, or international patents. Recently, he served as Editor-in-Chief for IEEE Signal Processing Magazine (2009-2011), which ranked first in 2010 and 2011 among all 247 publications within the Electrical and Electronics Engineering Category worldwide in terms of its impact factor, and for which he received the 2011 IEEE SPS Meritorious Service Award. He currently serves as Editor-in-Chief for IEEE Transactions on Audio, Speech and Language Processing. His technical work over the past three years brought the power of deep learning into the speech recognition and signal processing fields.

November 6, 2012

“New Machine Learning Tools for Structured Prediction”   Video Available

Veselin Stoyanov, Johns Hopkins HLTCOE

Abstract

I am motivated by structured prediction problems in NLP and social network analysis. Markov Random Fields (MRFs) and other Probabilistic Graphical Models (PGMs) are suitable for representing structured prediction: they can model joint distributions and utilize standard inference procedures. MRFs also provide principled ways of incorporating background knowledge and combining multiple systems. Two properties of structured prediction problems make learning challenging. First, structured prediction almost inevitably requires approximation to inference, decoding or model structure. Second, unlike the traditional ML setting that assumes i.i.d. training and test data, structured learning problems often consist of a single example used both for training and prediction. We address the two issues above. First, we argue that the presence of approximations in MRF-based systems requires a novel perspective on training. Instead of maximizing data likelihood, one should seek the parameters that minimize the empirical risk of the entire imperfect system. We show how to locally optimize this risk using error back-propagation and local optimization. On four NLP problems our approach significantly reduces loss on test data compared to choosing approximate MAP parameters. Second, we utilize data imputation in the limited data setting. At test time we use sampling to impute data that is a more accurate approximation of the data distribution. We use our risk minimization techniques to train fast discriminative models on the imputed data. Thus we can: (i) train discriminative models given a single training and test example; (ii) train generative/discriminative hybrids that can incorporate useful priors and learn from semi-supervised data.

Speaker Biography

Veselin Stoyanov is a postdoctoral researcher at the Human Language Technology Center of Excellence (HLT-COE) at Johns Hopkins University (JHU). Previously he spent two years working with Prof. Jason Eisner at JHU's Center for Language and Speech Processing supported by a Computing Innovation Postdoctoral Fellowship. He received the Ph.D. degree from Cornell University under the supervision of Prof. Claire Cardie in 2009 and the Honors B.Sc. from the University of Delaware in 2002. His research interests reside in the intersection of Machine Learning and Computational Linguistics. More precisely, he is interested in using probabilistic models for complex structured problems with applications to knowledge base population, modeling social networks, extracting information from text and coreference resolution. In addition to the CIFellowship, Ves Stoyanov is the recipient of an NSF Graduate Research Fellowship and other academic honors.

November 13, 2012

“From Bases to Exemplars, and From Separation to Understanding”   Video Available

Paris Smaragdis, University of Illinois at Urbana-Champaign

[abstract] [biography]

Abstract

Audio source separation is an extremely useful process, but most of the time it is not a goal in itself. Even though most research focuses on better separation quality, ultimately separation is needed so that we can perform tasks such as noisy speech recognition, music analysis, single-source editing, etc. In this talk I'll present some recent work on audio source separation that extends the idea of basis functions to that of using 'exemplars' and then builds off that idea in order to provide direct computation of some of the above goals without having to resort to an intermediate separation step. In order to do so I'll discuss some of the interesting geometric properties of mixed audio signals and how one can employ massively large decompositions with aggressive sparsity settings in order to achieve the above results.

Speaker Biography

Paris Smaragdis is a faculty member in the Computer Science and the Electrical and Computer Engineering departments at the University of Illinois at Urbana-Champaign. He completed his graduate and postdoctoral studies at MIT, where he conducted research on computational perception and audio processing. Prior to the University of Illinois he was a senior research scientist at Adobe Systems and a research scientist at Mitsubishi Electric Research Labs, during which time he was selected by the MIT Technology Review as one of the top 35 young innovators of 2006. Paris' research interests lie in the intersection of machine learning and signal processing, especially as they apply to audio problems.

November 20, 2012

“Advances in Deterministic Dependency Parsing”   Video Available

Yoav Goldberg, Google Research

[abstract] [biography]

Abstract

Transition-based dependency parsers are fast, surprisingly accurate and easy to implement. However, many formal aspects of these parsing systems are not well understood. Specifically, little can be said about the effect of individual parsing decisions on the global parse structure. We help bridge this gap by introducing a property which holds for many transition systems (including the popular arc-eager system) and allows us to reason about the global effects of individual parsing actions in these systems. This kind of reasoning paves the path to many interesting applications. I will describe two immediate applications: (1) a novel arc-constrained decoding algorithm ("find a tree that includes the following edges") for transition-based parsers, and (2) a novel oracle which can return a *set* of optimal actions for *any* (configuration, gold-tree) pair, in sharp contrast to traditional oracles that return a single, static sequence of transitions. The new oracle allows for a better training procedure which teaches the parser to respond optimally to non-optimal configurations and helps in mitigating error-propagation mistakes. The new oracle and training procedure produce greedy parsers that greatly outperform parsers trained with the traditional, static oracles on a wide range of datasets. This is joint work with Joakim Nivre.

Speaker Biography

Yoav Goldberg is a post-doctoral researcher at Google Research NY, working primarily on syntactic parsing and its applications. Prior to that, he completed his PhD at Ben Gurion University, where he worked with Prof. Michael Elhadad on automatic processing of Modern Hebrew, a specimen of a morphologically rich language. He spent a summer at USC/ISI working on Machine Translation with Kevin Knight, David Chiang and Liang Huang. This coming February, Yoav will leave Google to assume a tenure-track senior-lecturer position ("assistant professorship") in Bar Ilan University's Computer Science Department.

November 27, 2012

“Bridging the Gap: From Sounds to Words”

Micha Elsner, Ohio State University

[abstract] [biography]

Abstract

During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation; for instance, "you" might be realized as 'you' with a full vowel or reduced to 'yeh' with a schwa. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. I will present ongoing research on constructing a Bayesian model which can simultaneously group together phonetic variants of the same lexical item, learn a probabilistic language model predicting the next word in an utterance from its context, and learn a model of pronunciation variability based on articulatory features. I will discuss a model which takes word boundaries as given and focuses on clustering the lexical items (published at ACL 2012). I will also give preliminary results for a model which searches for word boundaries at the same time as performing the clustering.

Speaker Biography

Micha Elsner is an Assistant Professor of Linguistics at the Ohio State University, where he started in August. He completed his PhD in 2011 at Brown University, working on models of local coherence. He then worked on Bayesian models of language acquisition as a postdoctoral researcher at the University of Edinburgh.

December 7, 2012

“Probabilistic Linear Discriminant Analysis of i-Vector Posterior Distributions”   Video Available

Sandro Cumani, Brno University of Technology

[abstract]

Abstract

The i-vector extraction process is characterized by an intrinsic uncertainty represented by the i-vector posterior covariance. The usual PLDA models, however, ignore such uncertainty and perform speaker inference based only on point estimates of the i-vector distributions. We therefore propose a new PLDA model which takes into account the i-vector uncertainty. Since utterance length is the main factor affecting i-vector covariances, we designed a set of experiments to compare the proposed model and the classical PLDA model over segments with short and mismatched durations. The results show that the proposed model improves the accuracy on short segments while retaining the accuracy of the original PLDA over long utterances.

December 7, 2012

“Patrol Team Speaker Identification System for DARPA RATS Evaluation”   Video Available

Oldrich Plchot, Brno University of Technology

[abstract]

Abstract

I will describe the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. I will describe the general architecture of the system and I will address the issues we are facing in the RATS project. We will also discuss the strategy for the next evaluation and areas where the system can be improved.
