Archived Seminars by Year


2011

February 2, 2011

“Language Processing in the Web Era”

Kuansan Wang, Microsoft


Abstract

Natural language processing (NLP) has been dominated by statistical, data-driven approaches. The massive amounts of data available, especially from the Web, have further fueled progress in this area. Over the past decades, it has been widely reported that simple methods can often outperform more complicated systems when trained on large amounts of data. In deploying many web-scale applications, however, we regularly find that the size of the training data is just one of several factors that contribute to the success of the applications. In this talk, we will use real-world applications to illustrate the important design considerations in web-scale NLP: (1) rudimentary multilingual capabilities to cope with the global nature of the web, (2) versatile modeling of the diverse styles of language used in web documents, (3) fast adaptation to keep pace with the changes of the web, (4) few heuristics to ensure system generalizability and robustness, and (5) possibilities for efficient implementations with minimal manual effort.

Speaker Biography

Dr. Kuansan Wang is a Principal Researcher at Microsoft Research, Redmond, WA, where he currently manages the Human Intelligence Technology Group in the Internet Service Research Center. He joined Microsoft Research in 1998 as a member of the Speech Technology Group, conducting research in spoken language understanding and dialog systems. He was responsible for architecting many speech products from Microsoft, ranging from desktop, embedded and server applications to mobile and cloud based services. His research outcomes, disclosed in more than 60 US and European patents and applications, have been adopted in three ISO, three W3C and four ECMA standards. He has also served as an organizing member, reviewer and panelist at WWW, ICASSP, InterSpeech, ACL and various workshops in speech, language and web research areas. Dr. Wang received his B.S. from National Taiwan University and his M.S. and PhD from the University of Maryland, College Park, all in Electrical Engineering. Prior to joining Microsoft, he was a Member of Technical Staff at AT&T/Lucent Bell Labs in Murray Hill, NJ, and the NYNEX/Verizon Science and Technology Center in White Plains, NY.

February 8, 2011

“A Scalable Distributed Syntactic, Semantic and Lexical Language Model”

Shaojun Wang, Wright State University


Abstract

In this talk, I'll present an attempt at building a large scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens, stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and "readability" of translations when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.

Speaker Biography

Shaojun Wang received his B.S. and M.S. in Electrical Engineering at Tsinghua University in 1988 and 1992 respectively, and his M.S. in Mathematics and Ph.D. in Electrical Engineering at the University of Illinois at Urbana-Champaign in 1998 and 2001 respectively. From 2001 to 2005, he worked at CMU, Waterloo and the University of Alberta as a post-doctoral fellow. He joined the Department of Computer Science and Engineering at Wright State University as an assistant professor in 2006. His research interests are statistical machine learning, natural language processing, and cloud computing. He is now mainly focusing on two projects, large scale distributed language modeling and semi-supervised discriminative structured prediction, which are funded by NSF, Google and AFOSR. Both emphasize scalability and parallel/distributed approaches to processing extremely large scale datasets.

February 15, 2011

“A Brief History of the Penn Treebank”

Mitch Marcus, University of Pennsylvania


Abstract

The Penn Treebank, initially released in 1992, was the first richly annotated text corpus widely available within the natural language processing (NLP) community. Its release led within a few years to the development of the first competent English parsers, and helped spark the statistical revolution within NLP. The Penn Treebank has become the de facto standard for training and testing English parsers, and still plays this role nearly two decades after its release. This talk will briefly describe the Penn Treebank and its applications, then discuss the history of the Treebank's development, from Fred Jelinek's first proposal of a treebank to DARPA in 1987 through our development of the Treebank from 1989 until the release of Treebank II in 1995. I will attempt to explain the Penn Treebank's motivations and the process of creating it, perhaps explaining why it has some of its more peculiar properties. This talk describes joint work with Beatrice Santorini, Mary Ann Marcinkiewicz, Grace Kim, Ann Bies, and many others.

Speaker Biography

Mitchell Marcus is the RCA Professor of Artificial Intelligence in the Department of Computer and Information Science at the University of Pennsylvania. He was the principal investigator for the Penn Treebank Project through the mid-1990s; he and his collaborators continue to develop hand-annotated corpora for use world-wide as training materials for statistical natural language systems. Other research interests include: statistical natural language processing, human-robot communication, and cognitively plausible models for automatic acquisition of linguistic structure. He has served as chair of Penn's Computer and Information Science Department, as chair of the Penn Faculty Senate, and as president of the Association for Computational Linguistics. He is also a Fellow of the American Association for Artificial Intelligence. He currently serves as chair of the Advisory Committee of the Center of Excellence in Human Language Technology at JHU, as well as serving as a member of the advisory committee for the Department of Computer and Information Science.

February 22, 2011

“Entrainment to the Other in Conversational Speech”

Julia Hirschberg, Columbia University


Abstract

When people engage in conversation, they adapt the way they speak to the speaking style of their conversational partner in a variety of ways. For example, they may adopt a certain way of describing something based upon the way their conversational partner describes it, or adapt their pitch range or speaking rate to a conversational partner's. They may even align their turn-taking style or use of cue phrases to match their partner's. These types of entrainment have been shown to correlate with various measures of task success and dialogue naturalness. While there is considerable evidence for lexical entrainment from laboratory experiments, less is known about other types of acoustic-prosodic and discourse-level entrainment, and little work has been done to examine entrainment in multiple modalities for the same dialogue. I will discuss research in entrainment in multiple dimensions on the Columbia Games Corpus and the Switchboard Corpus. Our goal is to understand how the different varieties of entrainment correlate with one another and to determine which types of entrainment will be both useful and feasible to model in Spoken Dialogue Systems. (This is joint research with Rivka Levitan and Erica Cooper, Columbia University; Agustin Gravano, University of Buenos Aires; Ani Nenkova, University of Pennsylvania; Stefan Benus, Constantine the Philosopher University; and Jens Edlund and Mattias Heldner, KTH.)

Speaker Biography

Julia Hirschberg is a professor in the Department of Computer Science at Columbia University. She received her PhD in Computer Science from the University of Pennsylvania, after previously doing a PhD in sixteenth-century Mexican social history at the University of Michigan and teaching history at Smith. She worked at Bell Laboratories and AT&T Laboratories -- Research from 1985-2003 as a Member of Technical Staff and a Department Head, creating the Human-Computer Interface Research Department there. She served as editor-in-chief of Computational Linguistics from 1993-2003, was an editor-in-chief of Speech Communication from 2003-2006, and is now on its Editorial Board. Julia was on the Executive Board of the Association for Computational Linguistics (ACL) from 1993-2003, has been on the Permanent Council of the International Conference on Spoken Language Processing (ICSLP) since 1996, and served on the board of the International Speech Communication Association (ISCA) from 1999-2007 (as President 2005-2007). She is on the board of the CRA-W and has been active in working for diversity at AT&T and at Columbia. Julia has also been a fellow of the American Association for Artificial Intelligence since 1994 and an ISCA Fellow since 2008. She received a Columbia Engineering School Alumni Association (CESAA) Distinguished Faculty Teaching Award in 2009.

March 8, 2011

“Query-focused Summarization Using Text-to-Text Generation: When Information Comes from Multilingual Sources”

Kathy McKeown, Columbia University


Abstract

The past five years have seen the emergence of robust, scalable natural language processing systems that can summarize and answer questions about online material. One key to the success of such systems is that they re-use text that appeared in the documents rather than generating new sentences from scratch. Re-using text is absolutely essential for the development of robust systems; full semantic interpretation of unrestricted text is beyond the state of the art. Better summaries and answers can be produced, however, if systems can generate new sentences from the input text, fusing relevant phrases and discarding irrelevant ones. When the underlying sources for summarization come from multiple languages, the need for text-to-text generation is even more pronounced. In this talk I first present the concept of text-to-text generation, showing the different kinds of editing that can be done. I then show how it has been used in our research on summarization and open-ended question-answering. Because our sources include informal genres as well as formal genres and draw from English, Arabic and Chinese, editing is critical for improving the intelligibility of responses. In our systems, we exploit information available at question answering time to edit sentences, removing redundant and irrelevant information and correcting errors in translated sentences. We also present new work on machine translation which uses information from multiple systems to post-edit the translations, again using text-to-text generation but within a TAG formalism.

Speaker Biography

Kathleen R. McKeown is the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University. She served as Department Chair from 1998-2003. Her research interests include text summarization, natural language generation, multi-media explanation, digital libraries, concept to speech generation and natural language interfaces. McKeown received the Ph.D. in Computer Science from the University of Pennsylvania in 1982 and has been at Columbia since then. In 1985 she received a National Science Foundation Presidential Young Investigator Award, in 1991 she received a National Science Foundation Faculty Award for Women, in 1994 was selected as an AAAI Fellow, and in 2003 was elected as an ACM Fellow. McKeown is also quite active nationally. She serves as a board member of the Computing Research Association and serves as secretary of the board. She served as President of the Association for Computational Linguistics in 1992, Vice President in 1991, and Secretary Treasurer for 1995-1997. She has served on the Executive Council of the Association for Artificial Intelligence and was co-program chair of their annual conference in 1991.

March 18, 2011

“Enhancing ESL Education in India with a Reading Tutor that Listens”

Kalika Bali, Microsoft


Abstract

In this talk, I will talk about a 10-week pilot of CMU Project Listen's PC-based Reading Tutor program for enhancing English education in India. The pilot focused on low-income elementary school students, a population that has little or no exposure to English outside of school. The students showed measurable improvement on quantitative tests of reading fluency while using the tutor. Post-pilot interviews explored the students' experience of the reading tutor. I will discuss both technical and non-technical factors that might affect the success of such a speech-technology-based tutor for this demographic in India.

Speaker Biography

Kalika Bali is a researcher with the Multilingual Systems group at Microsoft Research Labs India (Bangalore). Her primary research interests are in Speech Technology and Computational Linguistics, especially for Indian Languages. A linguist by training, she has taught at the University of the South Pacific as an Associate Professor. She has worked in the area of research and development of Language Technology at both start-ups and established companies like Nuance, Simputer, Hewlett-Packard Labs and Microsoft Research. She has also been actively involved in the development of standards related to language technologies. Her current focus is on the use of language technologies for Education.

March 29, 2011

“Towards a Theory of Collective Social Computation: Connecting Individual Decision-making rules to Collective Patterns through Adaptive Causal Circuit Construction”

Jessica Flack, Santa Fe Institute


Abstract

I will discuss empirical and computational approaches my collaborators and I have been developing to build adaptive causal circuits that connect individual decision-making rules to collective patterns. This approach requires techniques that permit extraction of decision-making rules from time-series data. A goal of the research I will be discussing is to give an empirically grounded computational account of the emergence of robust aggregate features and hierarchical organization in social evolution.

Speaker Biography

Jessica Flack is Professor at the Santa Fe Institute and Co-Director of the Collective Social Computation Group. Her research program combines dynamical systems and computational perspectives in order to build a theory of how aggregate structure and hierarchy arise in social evolution. Primary goals are to understand the conditions and mechanisms supporting the emergence of slowly changing collective features that feed-down to influence component behavior, the role that conflict plays in this process, and the implications of multiple timescales and overlapping networks for robustness and adaptability in social evolution. Research foci include design principles for robust systems, conflict dynamics and control, the role of uncertainty reduction in the evolution of signaling systems, the implications of higher-order structures for social complexity and innovation, behavioral grammars and adaptive circuit construction. Flack approaches these issues using data on social process collected from animal society model systems, and through comparison of social dynamics with neural, immune, and developmental dynamics. Flack received her PhD in 2004 from Emory University in evolution, cognition and animal behavior. Flack was a Postdoctoral Fellow at SFI before joining the SFI Faculty in 2007.

April 5, 2011

“Statistical Topic Models for Computational Social Science”

Hanna Wallach, University of Massachusetts Amherst


Abstract

In order to draw data-driven conclusions, social scientists need quantitative tools for analyzing massive, complex collections of textual information. I will discuss the development of such tools. I will concentrate on a class of models known as statistical topic models, which automatically infer groups of semantically-related words (topics) from word co-occurrence patterns in documents, without requiring human intervention. The resultant topics can be used to answer a diverse range of research questions, including detecting and characterizing emergent behaviors, identifying topic-based communities, and tracking trends across languages. The foundation of statistical topic modeling is Bayesian statistics, which requires that assumptions, or prior beliefs, are made explicit. Until recently, most statistical topic models relied on two unchallenged prior beliefs. In this talk, I will explain how challenging these beliefs increases robustness to the skewed word frequency distributions common in text. I will also talk about recent work (with Rachel Shorey and Bruce Desmarais) on statistical topic models for studying temporal and textual patterns in formerly-classified government documents.
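A minimal sketch of the kind of topic-model workflow described above, using scikit-learn's LDA implementation on a toy corpus; the documents, topic count, and other settings are invented for illustration and are not the speaker's models:

```python
# Minimal topic-model sketch (illustration only): fit LDA to a tiny toy corpus
# and print the top words per topic.  Corpus and settings are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the senate passed the budget bill after a long debate",
    "the team won the championship game in overtime",
    "researchers trained a statistical model on large text corpora",
    "the governor vetoed the tax bill proposed by the senate",
    "the striker scored twice as the team beat its rival",
    "the corpus was annotated to train a statistical parser",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(vocab[i] for i in top))
```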

Speaker Biography

Hanna Wallach is an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. She is one of five core faculty members involved in UMass's newly-formed computational social science research initiative. Previously, Hanna was a postdoctoral researcher, also at UMass, where she developed Bayesian latent variable models for analyzing complex data regarding communication and collaboration within scientific and technological communities. Her recent work (with Ryan Adams and Zoubin Ghahramani) on infinite belief networks won the best paper award at AISTATS 2010. Hanna has co-organized multiple workshops on both computational social science and Bayesian latent variable modeling. Her tutorial on conditional random fields is widely referenced and used in machine learning courses around the world. As well as her research, Hanna works to promote and support women's involvement in computing. In 2006, she co-founded the annual workshop for women in machine learning, in order to give female faculty, research scientists, postdoctoral researchers, and graduate students an opportunity to meet, exchange research ideas, and build mentoring and networking relationships. In her not-so-spare time, Hanna is a member of Pioneer Valley Roller Derby, where she is better known as Logistic Aggression.

April 12, 2011

“Information visualization and its application to machine translation”

Rebecca Hwa, University of Pittsburgh


Abstract

In this talk, I will present an interactive interface that helps users to explore and understand imperfect outputs from automatic machine translation (MT) systems. The target users of our system are people who do not understand the original (source) language. Through a visualization of multiple linguistic resources, our system enables users to identify potential translation mistakes and make educated guesses as to how to correct them. Experimental results suggest that users of our prototype are able to correct some difficult translation errors that they would have found baffling otherwise. The experiments further suggest adaptive methods to improve standard phrase-based machine translation systems.

Speaker Biography

Rebecca Hwa is an Associate Professor in the Department of Computer Science at the University of Pittsburgh. Before joining Pitt, she was a postdoc at the University of Maryland. She received her PhD in Computer Science from Harvard University in 2001 and her B.S. in Computer Science and Engineering from UCLA in 1993. Dr. Hwa's primary research interests include multilingual processing, machine translation, and semi-supervised learning methods. Additionally, she has collaborated with colleagues on information visualization, sentiment analysis, and bioinformatics. She is a recipient of the NSF CAREER Award. Her work has also been supported by NIH and DARPA. Dr. Hwa currently serves as the chair of the executive board of the North American Chapter of the Association for Computational Linguistics.

April 19, 2011

“Integrating history-length interpolation and classes in language modeling”

Hinrich Schuetze, University of Stuttgart


Abstract

Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms of using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should be preferably applied to rare events. We construct such a model and show that both training on rare events and preferable application to rare events improve perplexity when compared to a simple direct interpolation of class-based with standard language models.
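To make the interpolation idea concrete, here is a toy sketch (my own construction, not the paper's model) that mixes a word-bigram estimate with a class-bigram estimate, shifting weight toward the class model when the history word is rare:

```python
# Toy sketch (assumptions, not the paper's model): interpolate a word-bigram
# estimate with a class-bigram estimate, leaning more on the class model when
# the word history is rare -- the "larger equivalence class" idea in the talk.
from collections import Counter

sents = [["the", "dog", "barked"], ["the", "cat", "meowed"], ["a", "dog", "ran"]]
word_class = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
              "barked": "V", "meowed": "V", "ran": "V"}

uni, bi = Counter(), Counter()
cls_uni, cls_bi, emit = Counter(), Counter(), Counter()
for s in sents:
    for w1, w2 in zip(s, s[1:]):
        uni[w1] += 1; bi[(w1, w2)] += 1
        c1, c2 = word_class[w1], word_class[w2]
        cls_uni[c1] += 1; cls_bi[(c1, c2)] += 1
    for w in s:
        emit[w] += 1                      # word counts, for P(word | class)

def p_word_bigram(w2, w1):
    return bi[(w1, w2)] / uni[w1] if uni[w1] else 0.0

def p_class_bigram(w2, w1):
    c1, c2 = word_class[w1], word_class[w2]
    p_c2 = cls_bi[(c1, c2)] / cls_uni[c1] if cls_uni[c1] else 0.0
    p_w2 = emit[w2] / sum(emit[w] for w, c in word_class.items() if c == c2)
    return p_c2 * p_w2

def p_interp(w2, w1):
    lam = uni[w1] / (uni[w1] + 1.0)       # less weight on the word model for rare histories
    return lam * p_word_bigram(w2, w1) + (1 - lam) * p_class_bigram(w2, w1)

print(p_interp("ran", "cat"))             # the class model covers the unseen bigram "cat ran"
```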

Speaker Biography

Hinrich Schuetze is a professor of computational linguistics in the School of Computer Science and Electrical Engineering at the University of Stuttgart in Germany. He received his PhD in linguistics from Stanford University in 1995 and worked in the areas of text mining and information retrieval at a number of research institutions and startups in Silicon Valley until 2004. His research focuses on natural language processing problems that are important for applications like information retrieval and machine translation and at the same time contribute to our fundamental understanding of language as a cognitive phenomenon. He is a coauthor of Foundations of Statistical Natural Language Processing (MIT Press, with Chris Manning) and Introduction to Information Retrieval (Cambridge University Press, with Chris Manning and Prabhakar Raghavan).

April 26, 2011

“Building Watson: An Overview of DeepQA for the Jeopardy! Challenge”

David Ferrucci, IBM


Abstract

Computer systems that can directly and accurately answer people's questions over a broad domain of human knowledge have been envisioned by scientists and writers since the advent of computers themselves. Open-domain question answering holds tremendous promise for facilitating informed decision making over vast volumes of natural language content. Applications in business intelligence, healthcare, customer support, enterprise knowledge management, social computing, science and government would all benefit from deep language processing. The DeepQA project is aimed at exploring how advancing and integrating Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), massively parallel computation and Knowledge Representation and Reasoning (KR&R) can greatly advance open-domain automatic Question Answering. An exciting proof-point in this challenge is to develop a computer system that can successfully compete against top human players at the Jeopardy! quiz show (www.jeopardy.com). Attaining champion-level performance at Jeopardy! requires a computer system to rapidly and accurately answer rich open-domain questions, and to predict its own performance on any given category/question. The system must deliver high degrees of precision and confidence over a very broad range of knowledge and natural language content with a 3-second response time. To do this, DeepQA generates, gathers evidence for, and evaluates many competing hypotheses. A key to success is automatically learning and combining accurate confidences across an array of complex algorithms and over different dimensions of evidence. Accurate confidences are needed to know when to "buzz in" against your competitors and how much to bet. High precision and accurate confidence computations are just as critical for providing real value in business settings where helping users focus on the right content sooner and with greater confidence can make all the difference. The need for speed and high precision demands a massively parallel computing platform capable of generating, evaluating and combining thousands of hypotheses and their associated evidence. In this talk I will introduce the audience to the Jeopardy! Challenge and how we tackled it using DeepQA. www.ibmwatson.com

Speaker Biography

Dr. David Ferrucci is the lead researcher and Principal Investigator (PI) for the Watson/Jeopardy! project. He has been a Research Staff Member at IBM's T.J. Watson Research Center since 1995, where he heads up the Semantic Analysis and Integration department. Dr. Ferrucci focuses on technologies for automatically discovering valuable knowledge in natural language content and using it to enable better decision making. As part of his research he led the team that developed UIMA. UIMA is a software framework and open standard widely used by industry and academia for collaboratively integrating, deploying and scaling advanced text and multi-modal (e.g., speech, video) analytics. As chief software architect for UIMA, Dr. Ferrucci led its design and chaired the UIMA standards committee at OASIS. The UIMA software framework is deployed in IBM products and has been contributed to Apache open-source to facilitate broader adoption and development. In 2007, Dr. Ferrucci took on the Jeopardy! Challenge, tasked to create a computer system that can rival human champions at the game of Jeopardy!. As the PI for the exploratory research project dubbed DeepQA, he focused on advancing automatic, open-domain question answering using massively parallel evidence based hypothesis generation and evaluation. By building on UIMA, on key university collaborations and by taking bold research, engineering and management steps, he led his team to integrate and advance many search, NLP and semantic technologies to deliver results that have out-performed all expectations and have demonstrated world-class performance at a task previously thought insurmountable with the current state-of-the-art. Watson, the computer system built by Ferrucci's team, is now competing with top Jeopardy! champions. Under his leadership they have already begun to demonstrate how DeepQA can make dramatic advances for intelligent decision support in areas including medicine, finance, publishing, government and law. Dr. Ferrucci has been the Principal Investigator (PI) on several government-funded research programs on automatic question answering, intelligent systems and scalable text analytics. His team at IBM consists of 28 researchers and software engineers specializing in the areas of Natural Language Processing (NLP), Software Architecture, Information Retrieval, Machine Learning and Knowledge Representation and Reasoning (KR&R). Dr. Ferrucci graduated from Manhattan College with a BS in Biology and from Rensselaer Polytechnic Institute in 1994 with a PhD in Computer Science specializing in knowledge representation and reasoning. He is published in the areas of AI, KR&R, NLP and automatic question-answering.

May 2, 2011

“Multilingual Subjectivity Analysis”

Rada Mihalcea, University of North Texas


Abstract

There is growing interest in the automatic extraction of opinions, emotions, and sentiments in text (subjectivity), to provide tools and support for various natural language processing applications. Most of the research to date has focused on English, which is mainly explained by the availability of resources for subjectivity analysis, such as lexicons and manually labeled corpora. In this talk, I will describe methods to automatically generate resources for subjectivity analysis for a new target language by leveraging the resources and tools available for English, which in many cases took years of work to complete. Specifically, I will try to provide answers to the following questions. First, can we derive a subjectivity lexicon for a new language using an existing English lexicon and a bilingual dictionary? Second, can we derive subjectivity-annotated corpora in a new language using existing subjectivity analysis tools for English and parallel corpora? Third and finally, can we build tools for subjectivity analysis for a new target language by relying on these automatically generated resources?
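As a concrete illustration of the first question, here is a toy sketch of lexicon projection through a bilingual dictionary; the lexicon entries and translations below are invented placeholders for real resources:

```python
# Toy sketch of lexicon projection (illustrative only): push English
# subjectivity labels through a bilingual dictionary to seed a target-language
# lexicon.  All entries are invented stand-ins for real resources.
english_lexicon = {"wonderful": "positive", "terrible": "negative",
                   "happy": "positive", "awful": "negative"}

# bilingual dictionary: English word -> candidate translations (placeholders)
bilingual_dict = {"wonderful": ["minunat"], "terrible": ["teribil", "groaznic"],
                  "happy": ["fericit"], "awful": ["groaznic"]}

target_lexicon = {}
for en_word, label in english_lexicon.items():
    for tgt_word in bilingual_dict.get(en_word, []):
        # keep the first label seen; a real system must handle ambiguity and conflicts
        target_lexicon.setdefault(tgt_word, label)

print(target_lexicon)
```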

Speaker Biography

Rada Mihalcea is an Associate Professor in the Department of Computer Science and Engineering at the University of North Texas. She is currently involved in a number of research projects in computational linguistics, including word sense disambiguation, monolingual and cross-lingual semantic similarity, automatic keyword extraction and text summarization, subjectivity and sentiment analysis, and computational humor. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, and Research on Language and Computation. Her research has been funded by the National Science Foundation, the National Endowment for the Humanities, Google, and the State of Texas. She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009).

July 13, 2011

“Navigating the Interaction Timestream (when your AAC device is a cement block tied to your ankle)”

Jeff Higginbotham, SUNY - Buffalo


Abstract

Interacting in time is something we all do, most of the time without thought about how it is accomplished. For many individuals who use technology to mediate their interactions, communication success is problematic. These problems entail operating the device within conversational time constraints, as well as coordinating their bodies with their device and their partner as they attempt to produce meaningful utterances. My talk will introduce several types of problems involving time and timing facing augmented speakers and their partners, and explore some ways in which NLP and interface design may be helpful in addressing these difficulties.

Speaker Biography

I received my PhD in "comparative studies in human interaction" from the University of Wisconsin - Madison. I'm interested in how augmentative and alternative communication (AAC) devices are used for conversation and other tasks and how these technologies are socially "constructed" by the community in which they are used. I try to use findings from my research to work with designers to build more socially responsive AAC systems.

July 20, 2011

“Distribution Fields for Low Level Vision”   Video Available

Erik Learned-Miller, University of Massachusetts, Amherst


Abstract

Consider the following fundamental problem of low level vision: given a large image I and a patch J from another image, find the "best matching" location of the patch J in image I. We believe the solution to this problem can be significantly improved. A significantly better solution to this problem has the potential to improve a wide variety of low-level vision problems, such as backgrounding, tracking, medical image registration, optical flow, image stitching, and invariant feature definition. We introduce a set of techniques for solving this problem based upon a representation called distribution fields. Distribution fields are an attempt to take the best from a wide variety of low-level vision techniques including geometric blur (Berg), mixture of Gaussians backgrounding (Stauffer), SIFT (Lowe) and HoG (Dalal and Triggs), local color histograms, bilateral filtering, congealing (Learned-Miller) and many other techniques. We show how distribution fields solve this "patch" matching problem, and, in addition to finding the optimum match of patch J to image I with a high success rate, the algorithm produces, as a by-product, a very natural assessment of the quality of that match. We call this algorithm the "sharpening match". Using the sharpening match for tracking yields an extremely simple but state-of-the-art tracker. We also discuss application of these techniques to background subtraction and other low level vision problems.
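The following is a rough sketch of one reading of the distribution-field idea (not the authors' code): explode a grayscale image into intensity-bin channels, smooth each channel, and slide the patch's field over the image, keeping the location with the smallest L1 distance. The bin count and smoothing width are arbitrary choices:

```python
# Rough sketch of distribution-field patch matching (my reading of the idea,
# not the authors' implementation).
import numpy as np
from scipy.ndimage import gaussian_filter

def distribution_field(img, n_bins=8, sigma=1.0):
    # one channel per intensity bin, each spatially blurred
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(img, bins) - 1, 0, n_bins - 1)
    field = np.zeros((n_bins,) + img.shape)
    for b in range(n_bins):
        field[b] = gaussian_filter((idx == b).astype(float), sigma)
    return field

def best_match(image, patch):
    df_img, df_patch = distribution_field(image), distribution_field(patch)
    ph, pw = patch.shape
    best, best_pos = np.inf, None
    for y in range(image.shape[0] - ph + 1):
        for x in range(image.shape[1] - pw + 1):
            d = np.abs(df_img[:, y:y + ph, x:x + pw] - df_patch).sum()
            if d < best:
                best, best_pos = d, (y, x)
    return best_pos, best

rng = np.random.default_rng(0)
image = rng.random((40, 40))
patch = image[10:20, 15:25].copy()
print(best_match(image, patch))   # should land at or near (10, 15)
```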

July 27, 2011

“Large Scale Supervised Embedding for Text and Images”   Video Available

Jason Weston, Google


Abstract

In this talk I will present two related pieces of research for text retrieval and image annotation that both use supervised embedding algorithms over large datasets. Part 1: The first part of the talk presents a class of models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like latent semantic indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the task of interest, which we argue is the reason for our superior results. We provide an empirical study on Wikipedia documents, using the links to define document-document or query-document pairs, where we beat several baselines. We also describe extensions to the nonlinear case and for dealing with huge dictionary sizes. (Joint work with Bing Bai, David Grangier and Ronan Collobert.) Part 2: Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a well performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method both outperforms several baseline methods and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the "sibling" precision metric, where our method also obtains good results. (Joint work with Samy Bengio and Nicolas Usunier.)
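A minimal sketch of the common thread of both parts: learn a low-dimensional embedding with a pairwise margin ranking loss so that relevant items score above irrelevant ones. The data, dimensions, and learning rate below are arbitrary stand-ins, not the actual systems:

```python
# Sketch of supervised embedding with a pairwise margin ranking loss
# (illustration only; data and hyperparameters are invented).
import numpy as np

rng = np.random.default_rng(0)
n_items, vocab, dim = 50, 200, 10
queries = rng.random((n_items, vocab))        # bag-of-words query vectors
docs = rng.random((n_items, vocab))           # the i-th doc is relevant to the i-th query

W = 0.01 * rng.standard_normal((vocab, dim))  # shared embedding matrix
lr, margin = 0.05, 1.0

def score(q, d):
    return float((q @ W) @ (d @ W))           # dot product in the embedding space

for epoch in range(20):
    for i in range(n_items):
        j = int(rng.integers(n_items))
        if j == i:
            continue
        q, d_pos, d_neg = queries[i], docs[i], docs[j]
        if margin - score(q, d_pos) + score(q, d_neg) > 0:   # hinge: update on violations only
            e = d_neg - d_pos
            grad = np.outer(q, e @ W) + np.outer(e, q @ W)    # gradient of the hinge w.r.t. W
            W -= lr * grad

print("pos vs neg score:", score(queries[0], docs[0]), score(queries[0], docs[1]))
```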

Speaker Biography

Jason Weston is a Research Scientist at Google NY since July 2009. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisor: Vladimir Vapnik) in 2000. From 2000 to 2002, he was a Researcher at Biowulf technologies, New York. From 2002 to 2003 he was a Research Scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to June 2009 he was a Research Staff Member at NEC Labs America, Princeton. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 80 papers, including best paper awards at ICML and ECML.

August 3, 2011

“Predicting sentence specificity, with applications to news summarization”   Video Available

Ani Nenkova, University of Pennsylvania


Abstract

A well-written text contains a mix of general statements and sentences that provide specific details. Yet no current work in computational linguistics has addressed the task of predicting the level of specificity of a sentence. In this talk I will present the development and evaluation of an automatic classifier capable of identifying general and specific sentences in news articles. We show that it is feasible to use existing annotations of discourse relations as training data and we validate the resulting classifier on sentences directly judged by multiple annotators. We also provide a task-based evaluation of our classifier on general and specific summaries written by people and demonstrate that the classifier predictions are able to distinguish between the two types of human authored summaries. We also analyze the level of specific and general content in news documents and their human and automatic summaries. We discover that while human abstracts contain a more balanced mix of general and specific content, automatic summaries are overwhelmingly specific. We find that too much specificity adversely affects the quality of the summary. The study of sentence specificity extends our prior work on text quality which I will briefly overview. This is joint work with my student Annie Louis.
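For readers who want a concrete picture, here is a toy sketch of a general-vs-specific sentence classifier; the sentences, labels, and surface features are invented, and the actual system is trained from discourse-relation annotations with a much richer feature set:

```python
# Toy sketch of a general-vs-specific sentence classifier (illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The company reported revenue of 4.2 billion dollars in the third quarter.",
    "The storm knocked out power to 120,000 homes on Tuesday night.",
    "Economic conditions have been difficult recently.",
    "Many people were affected by the weather.",
    "Officials said the project would create new opportunities.",
    "The bridge, built in 1932, carries about 40,000 vehicles a day.",
]
labels = ["specific", "specific", "general", "general", "general", "specific"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["Profits rose 12 percent to 310 million dollars."]))
```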

Speaker Biography

Ani Nenkova is an assistant professor of computer and information science at the University of Pennsylvania. Her main areas of research are automatic summarization, discourse and text quality. She obtained her PhD degree in computer science from Columbia University in 2006. She also spent a year and a half as a postdoctoral fellow at Stanford University before joining Penn in Fall 2007.

August 10, 2011

“Hierarchical modeling and prior information: an example from toxicology”   Video Available

Andrew Gelman, Columbia University


Abstract

We describe a general approach using Bayesian analysis for the estimation of parameters in physiological pharmacokinetic models. The chief statistical difficulty in estimation with these models is that any physiological model that is even approximately realistic will have a large number of parameters, often comparable to the number of observations in a typical pharmacokinetic experiment (e.g., 28 measurements and 15 parameters for each subject). In addition, the parameters are generally poorly identified, as in the well-known ill-conditioned problem of estimating a mixture of declining exponentials. Our modeling includes (a) hierarchical population modeling, which allows partial pooling of information among different experimental subjects; (b) a pharmacokinetic model including compartments for well-perfused tissues, poorly perfused tissues, fat, and the liver; and (c) informative prior distributions for population parameters, which is possible because the parameters represent real physiological variables. We discuss how to estimate the models using Bayesian posterior simulation, a method that automatically includes the uncertainty inherent in estimating such a large number of parameters. We also discuss how to check model fit and sensitivity to the prior distribution using posterior predictive simulation.

August 12, 2011

“Low-dimensional speech representation based on Factor Analysis and its applications”

Najim Dehak, MIT


Abstract

We introduce a novel approach to data-driven feature extraction stemming from the field of speaker recognition. In the last five years, statistical methods rooted in factor analysis have greatly enhanced the traditional representation of a speaker using Gaussian Mixture Models (GMMs). In this talk, we build some intuition by outlining the historical development of these methods and then survey the variety of applications made possible by this approach. To begin, we discuss the development of Joint Factor Analysis (JFA), which was motivated by a desire to both model speaker variabilities and compensate for channel/session variabilities at the same time. In doing so, we introduce the notion of a GMM supervector, a high-dimensional vector created by concatenating the mean vectors of each GMM component. JFA assumes that this supervector can be decomposed into a sum of two parts: one containing relevant speaker-specific information and another containing channel-dependent nuisance factors that need to be compensated. We will describe the methods used to estimate these hidden parameters. The success of JFA led to a proposed simplification using just factor analysis for the extraction of speaker-relevant features. The key assumption here is that most of the variabilities between GMM supervectors can be explained by a (much) lower-dimensional space of underlying factors. In this approach, a given utterance of any length is mapped into a single, low-dimensional "total variability" space. We call the resulting vector an i-vector, short for "identity vector" in the speaker recognition sense or "intermediate vector" for its intermediate size between that of a supervector and that of an acoustic feature vector. Unlike in JFA, the total variability approach makes no distinction between speaker and inter-session variabilities in the high-dimensional supervector space; instead, channel compensation occurs in the lower-dimensional i-vector space. The presentation will provide an outline of the process that can be used to build a robust speaker verification system. Though originally proposed for speaker modeling, the i-vector representation can be seen more generally as an elegant framework for data-driven feature extraction. After covering the necessary background theory, we will discuss our recent work in applying this approach to a variety of other audio classification problems, including speaker diarization and language identification.  
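A small sketch of the GMM-supervector construction described above: fit a background GMM, adapt its component means to one utterance's frames with relevance-MAP-style smoothing, and concatenate the adapted means. The factor-analysis (i-vector) step is only indicated, and the data and relevance factor are synthetic assumptions:

```python
# Sketch of the GMM supervector idea (illustration only; the i-vector step --
# factor analysis on these supervectors -- is not shown).  Data are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
background_frames = rng.standard_normal((2000, 13))   # stand-in for MFCC frames
utterance_frames = rng.standard_normal((300, 13)) + 0.5

# "universal background model": a GMM over many speakers' frames
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background_frames)

# relevance-MAP style mean adaptation: mix UBM means with the utterance's
# posterior-weighted frame means
post = ubm.predict_proba(utterance_frames)             # (n_frames, n_components)
counts = post.sum(axis=0)                              # soft occupancy per component
first_order = post.T @ utterance_frames                # (n_components, dim)
tau = 10.0                                             # relevance factor (assumed)
adapted_means = (first_order + tau * ubm.means_) / (counts[:, None] + tau)

supervector = adapted_means.ravel()                    # 8 components * 13 dims = 104
print(supervector.shape)
```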

Speaker Biography

Najim Dehak received his Engineering degree in Artificial Intelligence in 2003 from Universite des Sciences et de la Technologie d'Oran, Algeria, and his MS degree in Pattern Recognition and Artificial Intelligence Applications in 2004 from the Universite de Pierre et Marie Curie, Paris, France. He obtained his Ph.D. degree from Ecole de Technologie Superieure (ETS), Montreal in 2009. During his Ph.D. studies he was also with Centre de recherche informatique de Montreal (CRIM), Canada. In the summer of 2008, he participated in the Johns Hopkins University, Center for Language and Speech Processing, Summer Workshop. During that time, he proposed a new system for speaker verification that uses factor analysis to extract speaker-specific features, thus paving the way for the development of the i-vector framework. Dr. Dehak is currently a research scientist in the Spoken Language Systems (SLS) Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research interests are in machine learning approaches applied to speech processing and speaker modeling. The current focus of his research involves extending the concept of an i-vector representation into other audio classification problems, such as speaker diarization, language- and emotion-recognition.  

September 2, 2011

“Applications of weighted finite state transducers in a speech recognition toolkit”   Video Available

Daniel Povey, Microsoft Research


Abstract

The open-source speech recognition toolkit "Kaldi" uses weighted finite state transducers (WFSTs) for training and decoding, and uses the OpenFst toolkit as a C++ library. I will give an informal overview of WFSTs and of the standard AT&T recipe for WFST-based decoding, and will mention some problems (in my opinion) with the basic recipe and how we addressed them while developing Kaldi. I will also describe how to use WFSTs to achieve "exact" lattice generation, in a sense that will be explained. This is an interesting application of WFSTs because, unlike most WFST mechanisms, it does not have any obvious non-WFST analog.
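For readers unfamiliar with WFSTs, here is a toy, epsilon-free composition in the tropical semiring (weights are negative log probabilities and add along a path). This only illustrates the mechanism the recipe builds on; it is not OpenFst or Kaldi code, and the arcs are invented:

```python
# Toy epsilon-free WFST composition in the tropical semiring (illustration only).
def compose(A, B, start_a=0, start_b=0):
    """A, B: dict mapping state -> [(in_label, out_label, weight, next_state), ...]."""
    arcs, seen, stack = {}, set(), [(start_a, start_b)]
    while stack:
        sa, sb = stack.pop()
        if (sa, sb) in seen:
            continue
        seen.add((sa, sb))
        out = []
        for (ia, oa, wa, na) in A.get(sa, []):
            for (ib, ob, wb, nb) in B.get(sb, []):
                if oa == ib:                               # match A's output with B's input
                    out.append((ia, ob, wa + wb, (na, nb)))  # tropical semiring: weights add
                    stack.append((na, nb))
        arcs[(sa, sb)] = out
    return arcs

# A maps letters to phone labels; B maps phone labels onward (invented toy arcs).
A = {0: [("c", "K", 0.1, 1)], 1: [("a", "AE", 0.2, 2)], 2: [("t", "T", 0.3, 3)]}
B = {0: [("K", "cat", 0.0, 1)], 1: [("AE", "-", 0.0, 2)], 2: [("T", "-", 0.0, 3)]}
print(compose(A, B))
```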

Speaker Biography

Daniel Povey received his Bachelor's (Natural Sciences, 1997), Master's (Computer Speech and Language Processing, 1998) and PhD (Engineering, 2003) from Cambridge University. He is currently a researcher at Microsoft Research, Redmond, Washington, USA. From 2003 to 2008 he worked as a researcher in IBM Research in Yorktown Heights, NY. He is best known for his work on discriminative training for HMM-GMM based speech recognition (i.e. MMI, MPE, and their feature-space variants).

September 6, 2011

“Learning to Describe Images”

Julia Hockenmaier, University of Illinois


Abstract

How can we create an algorithm that learns to associate images with sentences in natural language that describe the situations depicted in them? This talk will describe ongoing research towards this goal, with a focus on the natural language understanding aspects. Although we believe that this task may benefit from improved object recognition and deeper linguistic analysis, we show that models that rely on simple perceptual cues of color, texture and local feature descriptors on the image side, and on sequence-based features on the text side, can do surprisingly well. We also demonstrate how to leverage the availability of multiple captions for the same image.  

Speaker Biography

Julia Hockenmaier is assistant professor of computer science at the University of Illinois at Urbana-Champaign. She came to Illinois after a postdoc at the University of Pennsylvania and a PhD at the University of Edinburgh. She holds an NSF CAREER award.

September 16, 2011

“Short URLs, Big Data: Machine Learning at Bitly”

Hilary Mason, bit.ly


Abstract

Bitly is a URL shortening service, gathering hundreds of millions of data points about the links people share every day. I'll discuss the data analysis techniques that we use, giving examples of machine learning problems that we are solving at scale, and talk about the differences between industry, startup, and academic research.

Speaker Biography

Hilary Mason is the Chief Scientist at bit.ly, where she finds sense in vast data sets. Her work involves both pure research and development of product-focused features. She's also a co-founder of HackNY (hackny.org), a non-profit organization that connects talented student hackers from around the world with startups in NYC. Hilary recently started the data science blog Dataists (dataists.com) and is a member of hacker collective NYC Resistor. She has discovered two new species, loves to bake cookies, and asks way too many questions.  

September 20, 2011

“When Topic Models Go Bad: Diagnosing and Improving Models for Exploring Large Corpora”   Video Available

Jordan Boyd-Graber, University of Maryland


Abstract

Imagine you need to get the gist of what's going on in a large text dataset such as all tweets that mention Obama, all e-mails sent within a company, or all newspaper articles published by the New York Times in the 1990s. Topic models, which automatically discover the themes which permeate a corpus, are a popular tool for discovering what's being discussed. However, topic models aren't perfect; errors hamper adoption of the model, performance in downstream computational tasks, and human understanding of the data. Fortunately, humans can easily diagnose and fix these errors. We describe crowdsourcing experiments to detect problematic topics and to determine which models produce comprehensible topics. Next, we present a statistically sound model to incorporate hints and suggestions from humans to iteratively refine topic models to better model large datasets. If time permits, we will also examine how topic models can be used to understand topic control in debates and discussions.

Speaker Biography

Jordan Boyd-Graber is an assistant professor in the College of Information Studies and the Institute for Advanced Computer Studies at the University of Maryland, focusing on the interaction of users and machine learning: how algorithms can better learn from human behaviors and how users can better communicate their needs to machine learning algorithms. Previously, he worked as a postdoc with Philip Resnik at the University of Maryland. Until 2009, he was a graduate student at Princeton University working with David Blei on linguistic extensions of topic models. His current work is supported by NSF, IARPA, and ARL.

September 27, 2011

“Multilingual Guidance for Unsupervised Linguistic Structure Prediction”   Video Available

Dipanjan Das, Carnegie Mellon University


Abstract

Learning linguistic analyzers from unannotated data remains a major challenge; can multilingual text help? In this talk, I will describe learning methods that use unannotated data in a target language along with annotated data in more resource-rich "helper" languages. I will focus on two lines of work. First, I will describe a graph-based semi-supervised learning approach that uses parallel data to learn part-of-speech tag sequences through type-level lexical transfer from a helper language. Second, I will examine a more ambitious goal of learning part-of-speech sequences and dependency trees from raw text, leveraging parameter-level transfer from helper languages, but without any parallel data. Both approaches result in significant improvements over strong state-of-the-art monolingual unsupervised baselines.
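A toy sketch in the spirit of the graph-based approach described above: a small similarity graph over word types in which a few nodes carry tags projected from a helper language, and labels propagate to the remaining nodes. The words, graph weights, and seed tags are invented:

```python
# Toy label propagation over word types (illustration only; graph and seeds invented).
import numpy as np

words = ["dog", "cat", "runs", "sleeps", "quickly", "slowly"]
tags = ["NOUN", "VERB", "ADV"]

# symmetric similarity graph, e.g. from distributional similarity in target-language text
W = np.array([[0, .9, .1, .1, 0, 0],
              [.9, 0, .1, .1, 0, 0],
              [.1, .1, 0, .8, .1, .1],
              [.1, .1, .8, 0, .1, .1],
              [0, 0, .1, .1, 0, .8],
              [0, 0, .1, .1, .8, 0]], dtype=float)

# seed tag distributions projected from a helper language via word alignments
Y = np.zeros((len(words), len(tags)))
Y[0, 0] = 1.0    # "dog"     -> NOUN
Y[2, 1] = 1.0    # "runs"    -> VERB
Y[4, 2] = 1.0    # "quickly" -> ADV
seeds = [0, 2, 4]

F = Y.copy()
for _ in range(50):
    F = W @ F                                                  # absorb neighbors' labels
    F = F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)    # renormalize rows
    F[seeds] = Y[seeds]                                        # clamp the seed nodes

for w, dist in zip(words, F):
    print(w, tags[int(dist.argmax())])
```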

Speaker Biography

Dipanjan Das is a Ph.D. student at the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. He works on statistical natural language processing under the mentorship of Noah Smith. He finished his M.S. at the same institute in 2008, conducting research on language generation with Alexander Rudnicky. Das completed his undergraduate degree in 2005 from the Indian Institute of Technology, Kharagpur, where he received the best undergraduate thesis award in Computer Science and Engineering and the Dr. B.C. Roy Memorial Gold Medal for best all-round performance in academics and co-curricular activities. He worked at Google Research, New York as an intern in 2010 and received the best paper award at the ACL 2011 conference. He has published and served as program committee member and reviewer at conferences such as ACL, NIPS, NAACL, COLING, and EMNLP during 2008-2011.

September 30, 2011

“Predicting wh-dependencies: Parsing, interpretation, and learning perspectives”   Video Available

Akira Omaki, Johns Hopkins University


Abstract

This talk focuses on syntactic prediction and examines its implication for models of sentence processing and language learning. Predicting upcoming syntactic structures reduces processing demand and allows successful comprehension in the presence of noise, but on the other hand, such predictions are risky in that they could potentially lead readers/listeners to wrong analyses (i.e. garden-paths) and cause processing difficulties that we often fail to overcome. The goal of the talk is three-fold. First, I will present a series of eye-tracking data with adults to establish that predictive syntactic analyses and interpretations are indeed possible in processing wh-dependencies. Second, I will examine the risks of wh-dependency prediction for adults and children. The first risk factor is revision failure: the failure to revise the initial analysis can lead adults and children to misunderstand sentences with wh-dependencies. Here, I will present comprehension data on adults and children's revision failures in French, English and Japanese and demonstrate that the degree of revision difficulties can be attenuated by semantic properties of the verbs. The second risk factor concerns consequences for learning: if learners predictively analyze wh-dependencies and always disambiguate the dependencies with a bias, would the input distribution for learners be skewed in such a way that the learners fail to observe certain interpretive possibilities that are allowed in the target language? I will discuss the distribution of wh-dependencies in child-directed speech, and examine how the input distribution will be skewed when we incorporate the experimental findings on children's parsers. I will argue that a simple integration of syntactic prediction could potentially create a learnability problem, but that this problem could be overcome once we allow children to integrate verb information to reanalyze the parse.

Speaker Biography

Akira Omaki is an assistant professor of Cognitive Science at the Johns Hopkins University, and his research focuses on the dynamics of sentence processing and first/second language development. He received his PhD in Linguistics at the University of Maryland, and joined the Cognitive Science faculty after a post-doc at the University of Geneva.

October 4, 2011

“Decoding time set by neuronal oscillations locked to the input rhythm: a neglected cortical dimension in models of speech perception”   Video Available

Oded Ghitza, Hearing Research Center & Center for BioDynamics, Boston University


Abstract

Speech is an inherently rhythmic phenomenon in which the acoustic signal is transmitted in syllabic "packets" and temporally structured so that most of the energy fluctuations occur in the range between 3 and 10 Hz. The premise of our approach is that this rhythmic property reflects some fundamental property, one internal to the brain. We suggest that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Nested neuronal oscillations in the theta, beta and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these neuronal oscillations remain phase-locked to the auditory input rhythm. A model (Tempo) is presented which seems capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of syllabic rate (Ghitza & Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when silence gaps are inserted in between successive 40-ms long compressed-signal intervals -- a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. In my talk I will present the architecture of Tempo and discuss the implications of the new dimensions of the model that seem necessary to account for the Ghitza & Greenberg data. Reading material: Ghitza, O. and Greenberg, S. (2009). "On the possible role of brain rhythms in speech perception: Intelligibility of time compressed speech with periodic and aperiodic insertions of silence." Phonetica 66:113-126. doi:10.1159/000208934. Ghitza, O. (2011). "Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm." Front. Psychology 2:130. doi:10.3389/fpsyg.2011.00130

Speaker Biography

Oded Ghitza received the B.Sc., M.Sc. and Ph.D. degrees in Electrical Engineering from Tel-Aviv University, Israel, in 1975, 1977 and 1983, respectively. From 1968 to 1984 he was with the Signal Corps Research Laboratory of the Israeli Defense Forces. During 1984-1985 he was a Bantrell post-doctoral fellow at MIT, Cambridge, Massachusetts, and a consultant with the Speech Systems Technology Group at Lincoln Laboratory, Lexington, Massachusetts. From 1985 to early 2003 he was with the Acoustics and Speech Research Department, Bell Laboratories, Murray Hill, New Jersey, where his research was aimed at developing models of hearing and at creating perception based signal analysis methods for speech recognition, coding and evaluation. From early 2003 to early 2011 he was with Sensimetrics Corp., Malden, Massachusetts, where he continued to model basic knowledge of auditory physiology and of perception for the purpose of advancing speech, audio and hearing-aid technology. From 2005 to 2008 he was with the Sensory Communication Group at MIT. Since mid-2006 he has been with the Hearing Research Center and with the Center for Biodynamics at Boston University, where he studies the role of brain rhythms in speech perception.

October 14, 2011

“Probabilistic hashing for similarity searching and machine learning on large datasets in high dimensions”

Ping Li, Cornell University


Abstract

Many applications such as information retrieval make use of efficient (approximate) estimates of set similarity. A number of such estimates have been discussed in the literature: minwise hashing, random projections and compressed sensing. This talk presents an improvement: b-bit minwise hashing. An evaluation on large real-life datasets will show large gains in both space and time. In addition, we will characterize the improvement theoretically, and show that the theory matches the practice. More recently, we realized that (b-bit) minwise hashing can be used not only for similarity matching but also for machine learning. Applying logistic regression and SVMs to large datasets faces numerous practical challenges. As datasets become larger and larger, they take too long to load and may not fit in memory. Training and testing time can become an issue. Error analysis and exploratory data analysis are rarely performed on large datasets because it is too painful to run lots of what-if scenarios and explore lots of high-order interactions (pairwise, 3-way, etc.). The proposed method has been applied to two large datasets: a "smaller" dataset (24GB in 16M dimensions) and a "larger" dataset (200GB in 1B dimensions). Using a single desktop computer, the proposed method takes 3 seconds to train an SVM for the smaller dataset and 30 seconds for the larger dataset.
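A small sketch of the b-bit idea: estimate the Jaccard similarity of two sets from only the lowest b bits of each minhash, then correct for accidental b-bit collisions. The hash functions and sets below are toy stand-ins, not the paper's implementation:

```python
# Sketch of b-bit minwise hashing (illustration only).
import numpy as np

rng = np.random.default_rng(0)
K, b = 256, 2                         # number of hash functions, bits kept per hash
PRIME = 2_147_483_647

# random affine hashes h(x) = (a*x + c) mod PRIME, a stand-in for true permutations
a = rng.integers(1, PRIME, size=K)
c = rng.integers(0, PRIME, size=K)

def minhash_b(items):
    items = np.asarray(sorted(items))
    hashes = (np.outer(a, items) + c[:, None]) % PRIME     # (K, |set|)
    return hashes.min(axis=1) & ((1 << b) - 1)             # keep only the lowest b bits

S1 = set(range(0, 1000))
S2 = set(range(300, 1300))
true_jaccard = len(S1 & S2) / len(S1 | S2)

match = float(np.mean(minhash_b(S1) == minhash_b(S2)))
# correct for accidental b-bit collisions: P(match) ~ J + (1 - J) / 2**b
estimate = (match - 1.0 / 2**b) / (1.0 - 1.0 / 2**b)
print(round(true_jaccard, 3), round(estimate, 3))
```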

October 25, 2011

“Sparse Models of Lexical Variation”   Video Available

Jacob Eisenstein, Carnegie Mellon University

[abstract] [biography]

Abstract

Text analysis involves building predictive models and discovering latent structures in noisy and high-dimensional data. Document classes, latent topics, and author communities are often distinguished by a small number of trigger words or phrases -- needles in a haystack of irrelevant features. In this talk, I describe generative and discriminative techniques for learning sparse models of lexical differences. First, I show how multi-task regression with structured sparsity can identify a small subset of words associated with a range of demographic attributes in social media, yielding new insights about the complex multivariate relationship between demographics and lexical choice. Second, I present SAGE, a novel approach to sparsity in generative models of text, in which we induce sparse deviations from background log probabilities. As a generative model, SAGE can be applied across a range of supervised and unsupervised applications, including classification, topic modeling, and latent variable models.
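As a rough illustration of the sparse-deviation idea behind SAGE, the snippet below forms class-specific word distributions by adding a mostly-zero deviation vector to background log-probabilities and renormalizing. The vocabulary, counts, and deviation values are invented for the example, and the sketch shows only the parameterization, not the learning procedure.

```python
import numpy as np

vocab = ["the", "movie", "plot", "goal", "referee", "election", "senate"]
background_counts = np.array([5000, 800, 300, 250, 60, 120, 40], dtype=float)
m = np.log(background_counts / background_counts.sum())  # background log-probabilities

# Sparse class-specific deviations: most entries are exactly zero,
# so each class is described by only a handful of "trigger" words.
eta_sports   = np.array([0.0, 0.0, 0.0, 1.5, 2.0, 0.0, 0.0])
eta_politics = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.8, 2.2])

def class_distribution(eta):
    """p(w | class) proportional to exp(m_w + eta_w)."""
    logits = m + eta
    return np.exp(logits - np.logaddexp.reduce(logits))

for name, eta in [("sports", eta_sports), ("politics", eta_politics)]:
    print(name, dict(zip(vocab, class_distribution(eta).round(3))))
```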

Speaker Biography

Jacob Eisenstein is a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon University. His research focuses on machine learning for social media analysis, discourse, and non-verbal communication. Jacob completed his Ph.D. at MIT in 2008, winning the George M. Sprowls dissertation award. In January 2012, Jacob will join Georgia Tech as an Assistant Professor in the School of Interactive Computing.

November 1, 2011

“Detecting Deceptive On-Line Reviews”

Claire Cardie, Cornell University

[abstract] [biography]

Abstract

Consumers increasingly rate, review, and research products online. Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, this talk describes the first study of "deceptive opinion spam" --- fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. Feature analysis of our learned models reveals a relationship between deceptive opinions and imaginative writing. Finally, the talk will describe the results of a preliminary study that uses the opinion spam classifier to estimate the prevalence of fake reviews on two popular hotel review sites.
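One plausible baseline in the spirit of the approach described above is a linear classifier over n-gram features. The snippet below is a minimal scikit-learn sketch with a handful of invented reviews standing in for the gold-standard dataset; it makes no claim about the actual feature set or the accuracy reported in the talk.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented stand-in data: 1 = deceptive, 0 = truthful.
reviews = [
    "My stay was the most luxurious experience of my entire life, pure perfection",
    "Every single moment at this amazing hotel felt like a wonderful dream vacation",
    "I cannot wait to return to this breathtaking paradise with my husband",
    "Room was clean, check-in took ten minutes, breakfast buffet was average",
    "The elevator was slow and the bathroom was small, but the bed was comfortable",
    "Parking cost 30 dollars a night and the wifi kept dropping in the lobby",
]
labels = [1, 1, 1, 0, 0, 0]

# Unigram + bigram features feeding a linear SVM, as a rough stand-in
# for an n-gram opinion-spam classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(reviews, labels)
print(clf.predict(["An absolutely magical experience, the most perfect hotel imaginable"]))
```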

Speaker Biography

Claire Cardie is a Professor in the Computer Science and Information Science departments at Cornell University. She received her B.S. in Computer Science from Yale University and her M.S. and Ph.D., also in Computer Science, from the University of Massachusetts at Amherst. Her research in the area of Natural Language Processing has focused on the application and development of machine learning methods for information extraction, coreference resolution, digital government applications, the analysis of opinions and subjective text, and, most recently, deception detection. Cardie is a recipient of a National Science Foundation CAREER award, and has served elected terms as an executive committee member of the Association for Computational Linguistics (ACL), an executive council member of the Association for the Advancement of Artificial Intelligence (AAAI), and twice as secretary of the North American chapter of the ACL (NAACL). Cardie is also co-founder and chief scientist of Appinions.com, a start-up focused on extracting and aggregating opinions from on-line text and social media.

November 11, 2011

“Learning Semantic Parsers for More Languages and with Less Supervision”   Video Available

Luke Zettlemoyer, University of Washington

[abstract] [biography]

Abstract

Recent work has demonstrated effective learning algorithms for a variety of semantic parsing problems, where the goal is to automatically recover the underlying meaning of input sentences. Although these algorithms can work well, there is still a large cost in annotating data and gathering other language-specific resources for each new application. This talk focuses on efforts to address these challenges by developing scalable, probabilistic CCG grammar induction algorithms. I will present recent work on methods that incorporate new notions of lexical generalization, thereby enabling effective learning for a variety of different natural languages and formal meaning representations. I will also describe a new approach for learning semantic parsers from conversational data, which does not require any manual annotation of sentence meaning. Finally, I will sketch future directions, including our recurring focus on building scalable learning techniques while attempting to minimize the application-specific engineering effort. Joint work with Yoav Artzi, Tom Kwiatkowski, Sharon Goldwater, and Mark Steedman.

Speaker Biography

Luke Zettlemoyer is an Assistant Professor at the University of Washington. His research interests lie at the intersection of natural language processing, machine learning, and decision making under uncertainty. He spends much of his time developing learning algorithms that attempt to recover and make use of detailed representations of the meaning of natural language text. He was a postdoctoral research fellow at the University of Edinburgh and received his Ph.D. from MIT.

November 15, 2011

“Object Detection Grammars”   Video Available

David McAllester, Toyota Technological Institute at Chicago

[abstract] [biography]

Abstract

As statistical methods came to dominate computer vision, speech recognition and machine translation, there was a tendency toward shallow models. The late Fred Jelinek is famously quoted as saying that every time he fired a linguist the performance of his speech recognition system improved. A major challenge for modern statistical methods is to demonstrate that deep models can be made to perform better than shallow ones. This talk will describe an object detection system which tied for first place in the 2008 and 2009 PASCAL VOC object detection challenges and won a PASCAL "lifetime achievement" award in 2010. The system exploits a grammar model for representing object appearance. This model seems "deeper" than those used in the previous generation of statistically trained object detectors. The object detection system and the associated grammar formalism will be described in detail, and future directions discussed.

Speaker Biography

Professor McAllester received his B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology in 1978, 1979, and 1987, respectively. He served on the faculty of Cornell University for the academic year 1987-1988 and on the faculty of MIT from 1988 to 1995. He was a member of technical staff at AT&T Labs-Research from 1995 to 2002. Since 2002 he has been Chief Academic Officer at the Toyota Technological Institute at Chicago. He has been a fellow of the American Association for Artificial Intelligence (AAAI) since 1997. A 1988 paper on computer game algorithms influenced the design of the algorithms used in the Deep Blue system that defeated Garry Kasparov. A 1991 paper on AI planning proved to be one of the most influential papers of the decade in that area. A 1998 paper on machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian and non-Bayesian methods. A 2001 paper with Andrew Appel introduced the influential step-index model of recursive types. He is currently part of a team that scored in the top two places in the PASCAL object detection challenge (computer vision) in 2007, 2008 and 2009.

November 22, 2011

“Robust Representation of Attended Speech in Human Brain with Implications for ASR”

Nima Mesgarani, University of California, San Francisco

[abstract] [biography]

Abstract

Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode recordings from the cortex of epileptic patients engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in the temporal lobe faithfully encode critical features of attended speech: speech spectrograms reconstructed from cortical responses to the mixture of speakers reveal salient spectral and temporal features of the attended speaker, as if the subject were listening to that speaker alone. As a result, a simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both local single-electrode and population-level cortical responses. These findings demonstrate that the temporal lobe's cortical representation of speech does not merely reflect the external acoustic environment, but instead corresponds to the perceptual aspects relevant to the listener's intended goal. An engineering approach to ASR inspired by a model of this process is shown to improve recognition accuracy in new noisy conditions.
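The spectrogram-reconstruction analysis described above can be sketched as a regularized linear mapping from multi-electrode responses back to the stimulus spectrogram. The snippet below uses purely synthetic stand-in data (random responses generated from a random linear model), so the dimensions and reconstruction quality are illustrative only.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_time, n_electrodes, n_freq = 2000, 64, 32

# Synthetic stand-in data: a random "spectrogram" drives noisy electrode
# responses through a random linear filter (no real neural data involved).
spectrogram = rng.random((n_time, n_freq))
mixing = rng.normal(scale=0.3, size=(n_freq, n_electrodes))
responses = spectrogram @ mixing + rng.normal(scale=0.5, size=(n_time, n_electrodes))

# Stimulus reconstruction: regularized linear map from population responses
# back to the spectrogram, trained on one portion and tested on the rest.
model = Ridge(alpha=1.0).fit(responses[:1500], spectrogram[:1500])
reconstruction = model.predict(responses[1500:])
r = np.corrcoef(reconstruction.ravel(), spectrogram[1500:].ravel())[0, 1]
print(f"reconstruction correlation on held-out samples: {r:.2f}")
```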

Speaker Biography

Nima Mesgarani is a postdoctoral scholar in the Department of Neurological Surgery at the University of California, San Francisco. He received his Ph.D. in electrical engineering from the University of Maryland, College Park. Prior to joining UCSF, he was a postdoctoral fellow at the Center for Language and Speech Processing at Johns Hopkins University. His research interests include studying the representation of speech in the brain and its implications for speech processing technologies.

December 2, 2011

“A computational approach to early language bootstrapping”   Video Available

Emmanuel Dupoux, Ecole Normale Supérieure

[abstract]

Abstract

Human infants spontaneously and effortlessly learn the language(s) spoken in their environments, despite the extraordinary complexity of the task. In the past 30 years, tremendous progress has been made in the empirical investigation of the linguistic achievements of infants during their first two years of life. In that short period, infants learn in an essentially unsupervised fashion the basic building blocks of the phonetic, phonological, lexical and syntactic organization of their native language (see Jusczyk, 1997). Yet, little is known about the mechanisms responsible for these acquisitions. Do infants rely on general statistical inference principles? Do they rely on specialized algorithms devoted to language? Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being conducted, using tools from signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. It is known that during the first year of life, before they are able to talk, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts (Werker & Tees, 1984). It will be shown that the only mechanism proposed so far, unsupervised statistical clustering (Maye, Werker & Gerken, 2002), may converge not on the inventory of phonemes but rather on contextual allophonic units that are smaller than the phoneme (Varadarajan et al., 2008). Alternative algorithms will be presented that use three sources of information: the statistical distribution of sound contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs (Peperkamp et al., 2006; Martin et al., submitted). It is shown that each of the three sources of information can be acquired without presupposing the others, but that they need to be combined to arrive at good performance. Modeling results and experiments with human infants will be presented. The more general proposal is that early language bootstrapping may not rely on learning principles specific to language. What is presumably unique to language, though, is the way in which these principles are combined to optimize the emergence of linguistic categories after only a few months of unsupervised exposure to speech signals.

References:

Jusczyk, P. (1997). The discovery of spoken language. Cambridge, MA: MIT Press.

Martin, A., Peperkamp, S., & Dupoux, E. (submitted). Learning phonemes with a pseudo-lexicon.

Maye, J., Werker, J., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101-B111.

Peperkamp, S., Le Calvez, R., Nadal, J.-P., & Dupoux, E. (2006). The acquisition of allophonic rules: statistical learning with linguistic constraints. Cognition, 101, B31-B41.

Varadarajan, B., Khudanpur, S., & Dupoux, E. (2008). Unsupervised learning of acoustic subword units. In Proceedings of ACL-08: HLT, 165-168.

Werker, J.F., & Tees, R.C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.
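As a toy illustration of the purely distributional clustering mechanism discussed in the abstract above (Maye, Werker & Gerken, 2002), the sketch below fits a Gaussian mixture to synthetic two-formant vowel tokens. The categories, token counts, and formant values are invented for the example and are not drawn from the talk or the cited studies.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Invented F1/F2 formant tokens (Hz) drawn around two hypothetical vowel categories.
vowel_a = rng.normal(loc=[700, 1200], scale=[60, 80], size=(200, 2))
vowel_i = rng.normal(loc=[300, 2300], scale=[40, 120], size=(200, 2))
tokens = np.vstack([vowel_a, vowel_i])
rng.shuffle(tokens)  # the learner only sees an unlabeled stream of tokens

# Purely distributional clustering: fit a two-component Gaussian mixture
# and inspect the recovered category centres.
gmm = GaussianMixture(n_components=2, random_state=0).fit(tokens)
print(gmm.means_.round(1))
```

With clean, well-separated toy data the mixture recovers the two category centres; the point made in the abstract is that on real speech, where allophonic variation is context-dependent, such clustering tends to find units finer-grained than phonemes unless additional cues are combined.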

Back to Top