Archived Seminars by Year
February 5, 2008
Bhuvana Ramabhadran, IBM
Abstract: Early word-spotting systems processed the audio signal to produce phonetic transcripts without the use of an automatic speech recognition (ASR) system. In the past decade, most of the research efforts on spoken data retrieval have focused on extending classical IR techniques to word transcripts. Some of these have been done in the framework of the NIST TREC Spoken Document Retrieval tracks. The use of word and phonetic transcripts was explored more recently in the context of the Spoken Term Detection (STD) 2006 evaluation conducted by NIST. In this talk, I will begin with IBM's submission to the STD evaluation and cover recent work at IBM to enhance the performance of the end-to-end audio search system. The first technique proposes the use of a similarity measure based on a phonetic confusion matrix that accounts for higher-order phonetic confusions (phone bi-grams and tri-grams), and the second is an application of vector space modeling, particularly Latent Semantic Analysis (LSA), to shortlist the most relevant audio segments, resulting in the same level of performance when using only 3% of the overall collection instead of the entire collection for search.
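A confusion-matrix similarity of this kind can be realized as a weighted edit distance. The sketch below is not IBM's system: the tiny confusion table is made up for illustration, and only unigram phone confusions are modeled (the talk's method extends this to phone bi-grams and tri-grams).

```python
import math

# Hypothetical values for P(decoded phone | spoken phone).
CONFUSION = {
    ("p", "p"): 0.85, ("p", "b"): 0.10, ("p", "t"): 0.05,
    ("b", "b"): 0.80, ("b", "p"): 0.15, ("b", "d"): 0.05,
}

def sub_cost(a, b, floor=1e-4):
    """Substitution cost = -log confusion probability (floored)."""
    return -math.log(CONFUSION.get((a, b), floor))

def confusion_distance(query, hyp, ins_del_cost=4.0):
    """Weighted edit distance between two phone sequences."""
    n, m = len(query), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * ins_del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_del_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + ins_del_cost,      # deletion
                d[i][j - 1] + ins_del_cost,      # insertion
                d[i - 1][j - 1] + sub_cost(query[i - 1], hyp[j - 1]),
            )
    return d[n][m]

print(confusion_distance(["p", "b"], ["b", "b"]))  # low cost: p/b confusable
```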
Speaker Biography: Dr. Bhuvana Ramabhadran is a Research Staff Member in the Multilingual Analytics and User Technologies group at the IBM T.J. Watson Research Center. Since joining IBM in 1995, she has made significant contributions to the ViaVoice line of products and served as the Principal Investigator for the NSF-funded project Multilingual Access to Large Spoken Archives (MALACH) and the EU-funded project TC-STAR: Technology and Corpora for Speech-to-Speech Translation. She currently manages a group that focuses on large vocabulary speech transcription, audio information retrieval and text-to-speech synthesis. Her research interests include speech recognition algorithms, statistical signal processing, pattern recognition and biomedical engineering.
February 12, 2008
Antti-Veikko Rosti, BBN
Abstract: Interest in system combination for machine translation has recently increased due to programs involving multiple sites. In programs such as DARPA GALE, the participating sites develop MT systems independently for the same task. As these systems have different strengths and only a single output for each task is evaluated, several methods to combine the outputs from all systems to leverage their strengths have been explored. This talk presents the system combination efforts within the AGILE team from the beginning of the GALE program until the recent re-test. The talk will cover topics from two recent papers presented at the 2007 NAACL-HLT and ACL conferences as well as the latest improvements developed for the GALE Phase 2 re-test. Related papers: http://acl.ldc.upenn.edu/N/N07/N07-1029.pdf http://acl.ldc.upenn.edu/P/P07/P07-1040.pdf
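As a rough illustration of output-level combination (a ROVER-style voting sketch, not the AGILE team's actual method, which builds confusion networks with more careful alignment), each hypothesis can be aligned to a skeleton and each aligned position decided by majority vote:

```python
from collections import Counter
from difflib import SequenceMatcher

def combine(hypotheses):
    """Align each hypothesis to the first (the skeleton) and take a
    majority vote per aligned position; deletions vote for an empty
    token, and insertions relative to the skeleton are dropped."""
    skeleton = hypotheses[0]
    votes = [Counter([tok]) for tok in skeleton]
    for hyp in hypotheses[1:]:
        sm = SequenceMatcher(a=skeleton, b=hyp)
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op == "equal" or (op == "replace" and i2 - i1 == j2 - j1):
                for k in range(i2 - i1):
                    votes[i1 + k][hyp[j1 + k]] += 1
            elif op == "delete":
                for i in range(i1, i2):
                    votes[i][""] += 1
            # unequal-length replacements are skipped in this toy version
    out = [v.most_common(1)[0][0] for v in votes]
    return [tok for tok in out if tok]

h1 = "the cat sat on the mat".split()
h2 = "a cat sat on the mat".split()
h3 = "the cat sat in the mat".split()
print(" ".join(combine([h1, h2, h3])))  # -> the cat sat on the mat
```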
Speaker Biography: Antti-Veikko Rosti received his MSc in information technology from Tampere University of Technology, Finland, and PhD in information engineering from Cambridge University, UK. He joined IBM Research as a postdoctoral researcher in Yorktown Heights, NY in 2004. Since 2005 he has been a scientist at BBN Technologies in Cambridge, MA. His research interests are in statistical signal processing and machine learning with a particular emphasis on their application to audio, speech, and language processing.
February 19, 2008
Ani Nenkova, University of Pennsylvania
Abstract: The ability to automatically predict appropriate prominence patterns is considered a key factor for improving the naturalness of text-to-speech synthesis systems. I will present results from a large human preference experiment showing that indeed even simple models of pitch accent and contrast/focus in a TTS system lead to measurable and significant improvements in concatenative synthesis. I will also present a study of prominence in conversational speech based on the Switchboard corpus. The corpus has been richly annotated for binary pitch accent information, as well as for semantically motivated distinctions such as contrast (narrow focus) and givenness (given/new distinctions), allowing for in-depth analysis of the factors involved in prominence assignment. This is joint work with Dan Jurafsky and other colleagues and parts of it have been presented at NAACL-HLT'07, Interspeech'07 and ASRU'07.
Speaker Biography: Ani Nenkova is an assistant professor of computer and information science at the University of Pennsylvania. Prior to this appointment she worked as a postdoctoral fellow with Dan Jurafsky at Stanford University. She holds a Ph.D. degree from Columbia University, where she worked on different aspects of multi-document summarization of news.
February 26, 2008
Smaranda Muresan, UMD
Abstract: Traditional natural language processing systems have focused on modeling the deep, human-like level of text understanding by integrating syntax and semantics. However, they overlooked a key requirement for scalability: learning. Modern natural language systems, on the other hand, have embraced learning methods to ensure scalability, but they remain at a shallow level of text understanding owing to their inability to successfully model semantics. In this talk I will present a computationally efficient model for deep language understanding that brings together syntax, semantics and learning. I will present a new grammar formalism, Lexicalized Well-Founded Grammar, which integrates syntax and semantics and is learnable from a small set of representative annotated examples, where importance to the model is defined linguistically and not simply by frequency, as in most previous work. The grammar rules have compositional and ontology constraints that provide access to meaning during parsing. The semantic representation is an ontology query language which allows deep-level text-to-knowledge acquisition. I have proven that under appropriate assumptions the search space for grammar learning is a complete grammar lattice, which guarantees the uniqueness of the solution. I will show the linguistic relevance of a practical LWFG learning framework and its utility for populating terminological knowledge bases from text in the medical domain.
Speaker Biography: Smaranda Muresan received her PhD degree in Computer Science from Columbia University. She is currently a Postdoctoral Research Associate at the Institute for Advanced Computer Studies at the University of Maryland. Her research interests include language learning and understanding, machine translation and relational learning. Her work unifies two separate but central themes in human language technologies: computational formalisms to express language phenomena and induction of knowledge from data.
March 4, 2008
Jim Glass, MIT
Abstract: The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter. While this approach has been effective for problems when there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language where learning is an intrinsic capability. From a machine learning perspective, a complementary alternative is to discover unit inventories in an unsupervised manner by exploiting the structure of repeating acoustic patterns within the speech signal. In this work we use pattern discovery methods to automatically acquire lexical entities, as well as speaker and topic segmentations directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. On a corpus of lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream. We have applied the acoustic pattern matching and clustering methods to several important problems in speech and language processing. In addition to showing how this methodology applies across different languages, we demonstrate two methods to automatically determine the identity of speech clusters. Finally, we also show how it can be used to provide an unsupervised segmentation of speakers and topics. Joint work with Alex Park, Igor Malioutov, and Regina Barzilay.
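The matching step rests on dynamic-programming alignment of acoustic feature sequences. For reference, here is the plain (global) DTW recurrence in a short sketch; the work described above uses a segmental variant that finds matching subsequences rather than aligning whole utterances.

```python
import numpy as np

def dtw(x, y):
    """Plain DTW between two feature sequences (frames x dims),
    using Euclidean frame distances; returns a length-normalized
    alignment cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.linalg.norm(x[i - 1] - y[j - 1])   # frame distance
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Toy usage: two random "MFCC-like" sequences of different lengths.
rng = np.random.default_rng(0)
print(dtw(rng.normal(size=(30, 13)), rng.normal(size=(40, 13))))
```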
Speaker Biography: James R. Glass obtained his S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. He is currently a Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory where he heads the Spoken Language Systems Group. He is also a Lecturer in the Harvard-MIT Division of Health Sciences and Technology. His primary research interests are in the area of speech communication and human-computer interaction, centered on automatic speech recognition and spoken language understanding.
March 18, 2008
March 25, 2008
Peter Anick, Yahoo
Abstract: Web search engines sift through billions of documents to identify those most likely to be relevant to a short natural language query. This functionality can be exploited to relate queries not just to documents but also to other concepts and queries. In this talk, we describe several applications of this principle, including the generation of query refinement suggestions for interactive search assistance and the discovery of alternative descriptors for an advertiser's product space.
Speaker Biography: Peter Anick is a member of the Applied Sciences group at Yahoo! where he currently works on developing infrastructure and tools for supporting online query assistance, such as Yahoo's recently released "Search Assist" product. He received his PhD in computer science from Brandeis University in 1999. Prior to that, he worked for many years in Digital Equipment Corporation's Artificial Intelligence Technology groups on applications of computational linguistics, including online text search for customer support and natural language interfaces for expert and database systems, and subsequently at AltaVista and Overture. His research interests include intelligent information retrieval, user interfaces for exploratory search, text data mining and lexical semantics of nouns and noun compounds. He is a member of ACM SIGIR, former editor of SIGIR Forum and current workshops program chair for SIGIR'08.
April 1, 2008
Tong Zhang, Rutgers
Abstract: We present two algorithms for sparse learning, where our goal is to estimate a target function that is a sparse linear combination of a set of basis functions. The first method is an online learning algorithm that focuses on scalability and can solve problems with large numbers of features and training data. We propose a general method called truncated gradient that can induce sparsity in the weights of online learning algorithms with convex loss functions. The approach is theoretically motivated, and can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online learning guarantees. Empirical experiments show that the approach works well. The second method is a batch learning algorithm that focuses on effective feature selection. Since this problem is NP-hard in the general setting, approximate solutions are necessary. Two methods that are widely used to solve this problem are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination called FoBa that simultaneously incorporates forward and backward steps in a specific way, and show that the resulting procedure can effectively solve this NP-hard problem under quite reasonable conditions. The first part is joint work with John Langford, Lihong Li and Alexander Strehl.
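A minimal sketch of the truncated-gradient idea, based on my reading of it rather than the authors' reference implementation: after every K stochastic-gradient steps, coordinates with magnitude at most a threshold theta are shrunk toward zero by K*eta*g, where g is a "gravity" parameter controlling the sparsification rate.

```python
import numpy as np

def truncate(w, alpha, theta):
    """T(w, alpha, theta): shrink coordinates in (0, theta] (and the
    mirror-image negative range) by alpha toward zero."""
    out = w.copy()
    small_pos = (w > 0) & (w <= theta)
    small_neg = (w < 0) & (w >= -theta)
    out[small_pos] = np.maximum(0.0, w[small_pos] - alpha)
    out[small_neg] = np.minimum(0.0, w[small_neg] + alpha)
    return out

def sgd_truncated(data, dim, eta=0.1, g=0.05, theta=1.0, K=10):
    """Online SGD on squared loss with truncation every K updates."""
    w = np.zeros(dim)
    for t, (x, y) in enumerate(data, 1):
        grad = (w @ x - y) * x          # gradient of 0.5 * (w.x - y)^2
        w -= eta * grad
        if t % K == 0:                  # periodic truncation step
            w = truncate(w, K * eta * g, theta)
    return w
```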
Speaker Biography: Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After graduation, he worked at IBM T.J. Watson Research Center in Yorktown Heights, New York, and Yahoo Research in New York City. He is currently an associate professor of statistics at Rutgers University. His research interests include machine learning, algorithms for statistical computation, their mathematical analysis and applications.
April 8, 2008
Vito Pirrelli, CNR
Abstract: AXOMs are hierarchically-arranged Self-Organizing Maps (SOMs) in an asynchronous feed-forward relation. In AXOMs, an incoming input word is sampled on a short time scale and recoded through the topological activation state of a first-level SOM, called the phonotactic layer, placed at the bottom of the hierarchy. The activation state is eventually projected upwards to the second-level map in the hierarchy (or lexical layer) on a longer time scale. In the talk, we shall provide the formal underpinnings of AXOMs, together with a concrete illustration of their behaviour through two language learning sessions, simulating the acquisition of Italian and English verb forms respectively. The architecture is capable of mimicking two levels of long-term memory chunking: low-level segmentation of phonotactic patterns and higher-level morphemic chunking, together with their feeding relation. It turns out that the topology of second-level maps mirrors a meta-paradigmatic organization of the inflection lexicon, clustering verb paradigms sharing the same conjugation class, based on the principle of formal contrast. Examples of Vito's recent work are available here. These papers may be of particular interest: Calderone, B., I. Herreros, and V. Pirrelli. 2007. Learning Inflection: The Importance of Starting Big. Lingue e Linguaggio, vol. 2. Pirrelli, Vito, and Ivan Herreros. 2007. Learning Morphology by Itself. In Proceedings of the Fifth Mediterranean Morphology Meeting.
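For readers unfamiliar with the underlying machinery, here is a generic single SOM update step (a sketch of the standard algorithm, not the AXOM hierarchy itself; all hyperparameters are illustrative): find the best-matching unit on the grid, then pull it and its neighbours toward the input.

```python
import numpy as np

def som_step(weights, x, t, t_max, lr0=0.5, sigma0=2.0):
    """One self-organizing-map update on a 2-D grid of units.
    weights: float array of shape (rows, cols, dim); x: input vector."""
    rows, cols, _ = weights.shape
    lr = lr0 * (1 - t / t_max)                   # decaying learning rate
    sigma = sigma0 * (1 - t / t_max) + 0.5       # decaying neighbourhood
    dists = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), (rows, cols))  # best unit
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_d2 = (ii - bi) ** 2 + (jj - bj) ** 2
    h = np.exp(-grid_d2 / (2 * sigma ** 2))      # neighbourhood kernel
    weights += lr * h[..., None] * (x - weights)
    return weights
```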
Speaker Biography: Vito Pirrelli received a laurea degree in the Humanities from the Linguistics Department of Pisa University (Italy) and a PhD in Computational Linguistics from Salford University (UK) with a dissertation on "Morphology, Analogy and Machine Translation". Currently he is Research Director at the CNR Institute for Computational Linguistics in Pisa and teaches "Computer for Humanities" at the Department of Linguistics of Pavia University. Author of two books and several journal and conference articles in Computational and Theoretical Linguistics, his main research interests include: machine language learning; computer models of the mental lexicon; psycho-computational models of morphology learning and processing; hybrid models of language processing; information extraction; and theoretical morphology.
April 15, 2008
Daniel Marcu, Language Weaver / ISI
Abstract: As the Natural Language Processing (NLP) and Machine Learning fields mature, the gap between the mathematical equations we write when we model a problem statistically and the manner in which we implement these equations in NLP applications is widening. In this talk, I first review some of the challenges that we face when searching for best solutions in large-scale statistical applications, such as machine translation, and the effect that ignoring these challenges has on end-to-end results. I also present recent developments that have the potential to positively impact a wide range of applications where parameter estimation and search are critical.
Speaker Biography: Daniel Marcu is the Chief Technology Officer of Language Weaver Inc. and an Associate Professor and Project Leader at the Information Sciences Institute, University of Southern California. His published work includes an MIT Press book, "The Theory and Practice of Discourse Parsing and Summarization", and best paper awards, with his ISI colleagues, at AAAI-2000 and ACL-2001 for research on statistical-based summarization and translation. His research has influenced a diverse range of natural language processing fields from discourse parsing to summarization, machine translation, and question answering. His current focus is on efficient learning and decoding/search for statistical machine translation applications.
April 22, 2008
Graham Katz, Georgetown
Abstract: Extracting relational information about times and events referred to in a document has a wide range of applications, from information retrieval to document summarization. While there has been a long history of work on temporal interpretation in computational linguistics, this has been primarily in terms of formal theories of interpretation. The advent of the TimeML language (and the creation of the TIMEBANK resource) has made this area more accessible to empirical methods in NLP and has standardized the task of temporal interpretation. In this talk I will overview the TimeML language, discuss some of its properties, and review the recent TempEval competition. In addition, I will present three sets of experiments in which we apply machine learning techniques to the problem of determining the temporal relations that hold among the events and times in a text.
Speaker Biography: Graham Katz is an assistant professor of computational linguistics in the Linguistics Department of Georgetown University. He received his Ph.D. in Linguistics and Cognitive Science from the University of Rochester and spent a number of years as a researcher and lecturer in Germany, at the Universities of Tuebingen, Stuttgart and Osnabrueck. Dr. Katz's research area is computational and theoretical semantics, with a focus on issues in temporal interpretation.
April 29, 2008
Liang Huang, University of Pennsylvania
Abstract: Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates, especially in parsing and machine translation. In these cases, the concept of a "packed forest" provides a compact representation of the huge search space, within which efficient inference algorithms based on Dynamic Programming (DP) are possible. In this talk we address two important problems within this framework: exact k-best inference, which is widely used in NLP pipelines such as parse reranking and MT rescoring, and approximate inference when the search space is too big for exact search. We first present a series of fast and exact k-best algorithms on forests, which are orders of magnitude faster than previously used methods on state-of-the-art parsers such as Collins (1999). We then extend these algorithms to approximate search when the forests are too big for exact inference. We will discuss two particular instances of this new method: forest rescoring for MT decoding with integrated language models, and forest reranking for discriminative parsing. In the former, our methods perform orders of magnitude faster than conventional beam search on both state-of-the-art phrase-based and syntax-based systems, with the same level of search error or translation quality. In the latter, faster search also leads to better learning, where our approximate decoding makes whole-Treebank discriminative training practical and results in the best accuracy to date for parsers trained on the Treebank. This talk includes joint work with David Chiang (USC Information Sciences Institute).
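The forest algorithms in the talk operate on hypergraphs with lazy evaluation; as a much-simplified analogue (a toy graph with made-up costs, not the talk's method), here is heap-based k-best path enumeration in a DAG, which conveys the basic best-first flavor of k-best extraction.

```python
import heapq

def k_best_paths(graph, src, dst, k):
    """Enumerate the k lowest-cost src->dst paths in a DAG using a
    priority queue of partial paths (best-first expansion)."""
    heap = [(0.0, [src])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return found

g = {"s": [("a", 1.0), ("b", 2.0)],
     "a": [("t", 2.0), ("b", 0.5)],
     "b": [("t", 1.0)]}
print(k_best_paths(g, "s", "t", 3))  # s-a-b-t (2.5), then the two 3.0 paths
```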
Speaker Biography: Liang Huang will shortly finish his PhD at Penn, and is looking for a postdoctoral position.
May 6, 2008
Chris Bartels, University of Washington
Abstract: Dynamic Bayesian networks (DBNs) are a class of directed graphical models for use on variable length sequences. DBNs have been applied to a number of tasks including automatic speech recognition, language processing, and DNA trace alignment. This talk will begin with a description of my recent work on reducing errors from burst noise in speech recognition using a DBN that combines a conventional phone-based speech recognizer with a classifier that detects syllable locations. The second portion of the talk will introduce several innovations for reducing the computational requirements of probabilistic inference on these types of models.
Speaker Biography: Chris Bartels is a Ph.D. candidate in the Department of Electrical Engineering at the University of Washington. He received his M.S. degree from the University of Washington in 2004 and his B.S. degree in computer engineering from the University of Kansas in 1999. Prior to his graduate studies he developed embedded software for GPS and sonar systems at GARMIN International. His research interests include graphical models in automatic speech recognition and inference in graphical models.
July 16, 2008
Yann LeCun, Computational and Biological Learning Lab, Courant Institute of Mathematical Sciences, New York University
Abstract: A long-term goal of Machine Learning research is to solve highly complex "intelligent" tasks, such as visual perception, auditory perception, and language understanding. To reach that goal, the ML community must solve two problems: the Partition Function Problem and the Deep Learning Problem. The partition function (normalization) problem is related to the difficulty of training probabilistic models over large spaces while keeping them properly normalized. In recent years, the ML and Natural Language communities have devoted considerable efforts to circumventing this problem by developing "un-normalized" learning models for tasks in which the output is highly structured (e.g. English sentences). This class of models was in fact originally developed during the early 90's in the speech and handwriting recognition communities, and resulted in highly successful commercial systems for automatically reading bank checks and other documents. The Deep Learning Problem is related to the issue of training all the levels of a recognition system (e.g. segmentation, feature extraction, recognition, etc.) in an integrated fashion. We first consider "traditional" methods for deep supervised learning, such as multi-layer neural networks and convolutional networks, a learning architecture for image recognition loosely modeled after the architecture of the visual cortex. Several practical applications of convolutional nets will be demonstrated with videos and live demos, including a handwriting recognition system, a real-time human face detector that also estimates the pose of the face, a real-time system that can detect and recognize objects such as airplanes, cars, animals and people in images, and a vision-based navigation system for off-road mobile robots that trains itself on-line to avoid obstacles. Although these methods produce excellent performance, they require many training samples. The next challenge is to devise unsupervised learning methods for deep networks. Inspired by some recent work by Hinton on "deep belief networks", we devised energy-based unsupervised algorithms that can learn deep hierarchies of invariant features for image recognition. We show how such algorithms can dramatically reduce the required number of training samples, particularly for such tasks as the recognition of everyday objects at the category level.
Speaker Biography: Yann LeCun received an Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, NJ, in 1988. Following AT&T's split with Lucent Technologies in 1996, he joined AT&T Labs-Research as head of the Image Processing Research Department. In 2002 he became a Fellow at the NEC Research Institute in Princeton. He has been a professor of computer science at NYU's Courant Institute of Mathematical Sciences since 2003. Yann's research interests include computational and biological models of learning and perception, computer vision, mobile robotics, data compression, digital libraries, and the physical basis of computation. He has published over 130 papers in these areas. His image compression technology, called DjVu, is used by numerous digital libraries and publishers to distribute scanned documents on-line, and his handwriting recognition technology is used to process a large percentage of bank checks in the US. He has been general chair of the annual "Learning at Snowbird" workshop since 1997, and program chair of CVPR 2006.
July 22, 2008
Gary Marcus, New York University
Abstract: In fields ranging from reasoning to linguistics, the idea of humans as perfect, rational, optimal creatures is making a comeback - but should it be? Hamlet's musings that the mind was "noble in reason ... infinite in faculty" have their counterparts in recent scholarly claims that the mind consists of an "accumulation of superlatively well-engineered designs" shaped by the process of natural selection (Tooby and Cosmides, 1995), in the 2006 suggestion of Bayesian cognitive scientists Chater, Tenenbaum and Yuille that "it seems increasingly plausible that human cognition may be explicable in rational probabilistic terms and that, in core domains, human cognition approaches an optimal level of performance", and in Chomsky's recent suggestions that language is close "to what some super-engineer would construct, given the conditions that the language faculty must satisfy". In this talk, I will argue that this resurgent enthusiasm for rationality (in cognition) and optimality (in language) is misplaced, and that the assumption that evolution tends creatures towards "superlative adaptation" ought to be considerably tempered by recognition of what Stephen Jay Gould called "remnants of history", or what I call evolutionary inertia. The thrust of my argument is that the mind in general, and language in particular, might be better seen as what engineers call a kluge: clumsy and inelegant, yet remarkably effective.
July 30, 2008
Barbara Shinn-Cunningham, Boston University
Abstract: In most social settings, competing speech sounds mask one another, causing us to hear only portions of the signal we are trying to understand. Moreover, multiple signals vie for our attention, causing central interference that can also limit what we perceive. Despite such interruptions and interference, we are incredibly adept at communicating in everyday settings. This talk will review recent studies of how it is that we manage to selectively attend to and understand speech despite interruptions and perceptual competition from other sources. Evidence supports the idea that selective attention depends on the formation of auditory objects, and that the processes of forming and attending to objects evolve over time. In addition, top-down knowledge is critical for enabling us to fill in missing information in the signals we attend to, which helps explain why we are so successful at listening in everyday settings. These results have important implications for listeners with hearing impairment or who are aging, who are likely to experience difficulties with selectively attending in complex settings.
Speaker Biography: Barbara Shinn-Cunningham received her training in electrical engineering at Brown University (Sc.B., 1986) and the Massachusetts Institute of Technology (M.S., 1989; Ph.D., 1994). She joined the faculty of Boston University (BU) in 1996, where she is Director of Graduate Studies and Associate Professor of Cognitive and Neural Systems. She also holds faculty appointments in BU Biomedical Engineering, the BU Program in Neuroscience, the Harvard/MIT Health Sciences and Technology Program, the Harvard/MIT Speech and Hearing Program, and the Naval Post-Graduate School. She serves on the Governing Board of the Boston University Center for Neuroscience and the Board of Directors for the CELEST NSF Science of Learning Center, as well as various committees of professional organizations such as the Acoustical Society of America, and the Association for Research in Otolaryngology. She has received research fellowships from the Alfred P. Sloan Foundation, the Whitaker Foundation, and the National Security Science and Engineering Faculty Fellows program. Her research includes studies of auditory attention, sound source separation, spatial hearing, and perceptual plasticity.
August 6, 2008
Mark Johnson, Brown University
Abstract: Nonparametric Bayesian methods are interesting because they may provide a way of learning the appropriate units of generalization as well as the generalization's probability or weight. Adaptor Grammars are a framework for stating a variety of hierarchical nonparametric Bayesian models, where the units of generalization can be viewed as kinds of PCFG rules. This talk describes the mathematical and computational properties of Adaptor Grammars and linguistic applications such as word segmentation and syllabification, and describes the MCMC algorithms we use to sample them. Joint work with Sharon Goldwater and Tom Griffiths.
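Adaptor grammars are typically built on Pitman-Yor processes. As a small, self-contained piece of that machinery (a sketch of the standard Chinese-restaurant construction, not the talk's samplers), here is the table choice for one new customer:

```python
import random

def pyp_sample_table(counts, a, b, base_p):
    """Sample a table for the next customer in a Pitman-Yor CRP with
    discount a and concentration b. counts = customers per existing
    table serving the observed label; base_p = base-distribution
    probability of that label at a fresh table."""
    weights = [c - a for c in counts]                 # reuse a table
    weights.append((b + a * len(counts)) * base_p)    # open a new table
    r = random.uniform(0, sum(weights))
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i          # i == len(counts) means "new table"
    return len(weights) - 1

print(pyp_sample_table([5, 2], a=0.5, b=1.0, base_p=0.1))
```

The rich-get-richer reuse of tables is what lets an adapted nonterminal memoize whole subtrees (e.g., discovered words) rather than regenerating them rule by rule.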
September 9, 2008
CoE Quarterly Technical Exchange
September 16, 2008
David Chiang, Information Sciences Institute
Abstract: Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation (MT) because it has difficulty estimating more than a dozen or two parameters. I will present two classes of features that address deficiencies in the Hiero hierarchical phrase-based translation model but cannot practically be trained using MERT. Instead, we use the MIRA algorithm, introduced by Crammer et al. and previously applied to MT by Watanabe et al. Building on their work, we show that by parallel processing and utilizing more of the parse forest, we can obtain results using MIRA that match those of MERT in terms of both translation quality and computational requirements. We then test the method on the new features: first, simultaneously training a large number of Marton and Resnik's soft syntactic constraints, and, second, introducing a novel structural distortion model based on a large number of features. In both cases we obtain significant improvements in translation performance over the baseline. This talk represents joint work with Yuval Marton and Philip Resnik of the University of Maryland.
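For orientation, here is the single-constraint MIRA update in schematic form, a sketch of the core step rather than the talk's full forest-based, parallelized training: make the smallest change to the weights that lets the oracle outscore the hypothesis by a margin equal to its loss (e.g., a BLEU-based loss).

```python
import numpy as np

def mira_update(w, f_gold, f_hyp, loss, C=1.0):
    """Single-constraint MIRA step. f_gold/f_hyp are feature vectors of
    the oracle and the model's hypothesis; loss is the hypothesis's
    error; C caps the step size."""
    delta = f_gold - f_hyp
    margin = w @ delta                 # current score difference
    violation = loss - margin
    if violation <= 0:
        return w                       # margin constraint already holds
    tau = min(C, violation / (delta @ delta))
    return w + tau * delta

w = np.zeros(4)
w = mira_update(w, np.array([1., 0., 2., 0.]), np.array([0., 1., 1., 1.]), loss=0.4)
print(w)
```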
Speaker Biography: David Chiang is a Research Assistant Professor at the University of Southern California and a Computer Scientist at the USC Information Sciences Institute. He received an AB/SM in Computer Science from Harvard University in 1997, and a PhD in Computer and Information Science from the University of Pennsylvania in 2004. After a research fellowship at the University of Maryland Institute for Advanced Computer Studies, he joined the USC Information Sciences Institute in 2006, where he currently works on formal grammars for statistical machine translation.
September 19, 2008
Malcolm Slaney, Yahoo! Research Laboratory
Abstract: The world we live in is not nearly as clean and orderly as the training data sets of yesteryear. Our (acoustic) world is noisy, filled with unknown people and events. Mark Tilden, lead robotic designer for WowWee/Hasbro toys, said last July, "The cocktail party effect is costing me money." In this talk I would like to discuss the need for context and top-down considerations in auditory processing and models of auditory perception. I will demonstrate the need with many examples from visual and auditory perception, and show some directions for future research. I'll conclude with a short discussion of why Yahoo cares (basically because the Internet is full of really noisy data and we want to help people find and understand it).
Speaker Biography: Malcolm Slaney is a principal scientist at Yahoo! Research Laboratory. He received his PhD from Purdue University for his work on computed imaging. He is a coauthor, with A. C. Kak, of the IEEE book "Principles of Computerized Tomographic Imaging." This book was recently republished by SIAM in their "Classics in Applied Mathematics" series. He is coeditor, with Steven Greenberg, of the book "Computational Models of Auditory Function." Before Yahoo!, Dr. Slaney worked at Bell Laboratory, Schlumberger Palo Alto Research, Apple Computer, Interval Research and IBM's Almaden Research Center. He is also a (consulting) Professor at Stanford's CCRMA, where he organizes and teaches the Hearing Seminar. His research interests include auditory modeling and perception, multimedia analysis and synthesis, compressed-domain processing, music similarity and audio search, and machine learning. For the last several years he has led the auditory group at the Telluride Neuromorphic Workshop.
September 23, 2008
Maureen Stone, University of Maryland
Abstract: This talk will review our work using several instrumental techniques that image the tongue. These techniques include ultrasound, cine-MRI, tagged-MRI, and DTI. The tongue is of interest because it is the major articulator in the production of speech; it has the most degrees of freedom. In addition it is an unusual structure as it is composed entirely of soft tissue and must move without benefit of bones or joints. This talk will present an overview of work done by us and our colleagues toward the understanding of tongue motor control, and applications of tongue imaging data to the development of a silent speech interface, a finite element model (FEM) of tongue motion, a study of aging in tongue motion, and a study of tongue motion after removal of cancerous tumors.
Speaker Biography: Dr. Maureen Stone measures and models tongue biomechanics and motor control using data from ultrasound and MRI. Dr. Stone is a Professor at the University of Maryland Dental School, and Director of the Vocal Tract Visualization Laboratory. She has written numerous articles on the multi-instrumental approach to studying vocal tract function. She is a Fellow of the Acoustical Society of America.
September 30, 2008
Fernando Pineda, JHU School of Public Health
Abstract: Structural ribonucleic acid (RNA) molecules play an important role in regulating gene expression in organisms throughout the tree of life. The number of different classes of structural RNA, their possible mechanisms of action, their interaction partners, etc. are poorly understood. Here we consider the challenging computational problem of ab initio detection of novel structural RNA. Ab initio approaches are especially useful for RNA that is not well conserved across species. We focus on the genome of Plasmodium falciparum, where there is evidence that structural RNA plays a dominant role in regulating gene expression. P. falciparum is an important organism to understand since every year it is responsible for 300-500 million clinical cases of malaria and around a million deaths, of which over 75% occur in African children under 5 years of age. The genome of an organism codes for its biochemical building blocks as well as its regulatory elements. The "language" used to represent information in the genomic sequence is about as "natural" as it gets, and it is not clear what appropriate features one should use to detect novel structural RNA. After a brief introduction to the salient biology, we will describe a pragmatic and computationally intensive approach based on methods originally developed by others for detecting structural RNAs in very short viral genomes. We describe a pilot study demonstrating the feasibility of the approach, which also highlighted computational limitations, as well as the fact that the signals are deeply buried in noise. We will describe new algorithms that have allowed us to reduce the computational complexity, and probably increase the signal-to-noise ratio, thereby allowing us to scale this approach up to a truly genome-wide level.
Speaker Biography: Dr. Fernando Pineda is Associate Professor of Molecular Microbiology and Immunology at the Johns Hopkins Bloomberg School of Public Health, where he collaborates with laboratory-based colleagues to model biological systems. He also directs the High Performance Scientific Computing Core. He received his PhD in Theoretical Physics from the University of Maryland, College Park. He has served on the editorial boards of several journals including Neural Computation and IEEE Transactions on Neural Networks. Prior to joining the faculty at the School of Public Health, he was on the Principal Staff at the Johns Hopkins Applied Physics Laboratory. He has also worked at the Jet Propulsion Laboratory and the Harvard-Smithsonian Center for Astrophysics.
October 7, 2008
“Predicting Syntax: Processing Dative Constructions in American and Australian Varieties of English”
Joan Bresnan, Stanford University
Abstract: Traditionally, linguistic variation within different time scales has been the province of different disciplines, each with a distinctive suite of techniques for obtaining and analyzing data. For example, historical linguistics, sociolinguistics and corpus linguistics study variation between different speaker groups over historical time and across space, while psycholinguistics, phonetics, and computational speech recognition and synthesis study the dynamics of producing and comprehending language in the individual on a scale of milliseconds. Yet there is evidence that linguistic variation at these different time scales is linked, even in the domain of higher-level syntactic choices. This is a primary finding in the present study of dative constructions, illustrated by (1a,b), in Australian and American English. (1a) Who gave you that wonderful watch? (V NP NP) (1b) Who gave that wonderful watch to you? (V NP PP) We use a very accurate multilevel probabilistic model of corpus dative productions (Bresnan, Cueni, Nikitina, and Baayen 2007) to measure the predictive capacities of both American and Australian subjects in three pairs of parallel psycholinguistic experiments involving sentence ratings (Bresnan 2007), decision latencies during reading (Ford 1983), and sentence completion. The experimental items were all sampled together with their contexts from the database of corpus datives, stratified by corpus model probabilities. We find that the Australian subjects share with the American subjects a sensitivity to corpus probabilities. But they also show covarying differences, notably a stronger end-weight effect of the recipient in the ratings task and the absence of a dependency-length effect of the theme argument in the decision latency task (cf. Grodner and Gibson 2005). A unifying explanation for these differences is that decision latencies for 'to' are reduced and naturalness ratings are increased when a PP is consistent with expectation. The Australian group would then be predicted to have a higher expectation of PP than the US group. This prediction is borne out by the sentence completion tasks, which showed that the Australians produced NP PP completions more than the American subjects in the same contexts. These findings suggest that subtle variations in the experiences of the dative construction by historically and spatially divergent speaker groups can create measurable differences in internalized expectations in individuals at the millisecond level. References: Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen. 2007. Predicting the dative alternation. In Cognitive Foundations of Interpretation, ed. by G. Boume, I. Kraemer, and J. Zwarts. Amsterdam: Royal Netherlands Academy of Science, pp. 69-94. Bresnan, Joan. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Roots: Linguistics in Search of Its Evidential Base (Studies in Generative Grammar), ed. by Sam Featherston and Wolfgang Sternefeld, pp. 75-96. Berlin: Mouton de Gruyter. Ford, Marilyn. 1983. A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior, 22: 203-218.
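The corpus model behind these experiments is a multilevel logistic regression over many predictors. As a toy analogue only (two hypothetical features and invented data, not the authors' model), a logistic regression over the alternation might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per token of "give":
#   [log(len(theme)) - log(len(recipient)), recipient is a pronoun (0/1)]
X = np.array([[1.4, 1], [0.7, 1], [-0.3, 0], [-1.1, 0], [0.9, 1], [-0.5, 0]])
y = np.array([1, 1, 0, 0, 1, 0])   # 1 = V NP NP (double object), 0 = V NP PP

model = LogisticRegression().fit(X, y)
# Probability of each construction for a long theme + pronominal recipient.
print(model.predict_proba([[1.0, 1]]))
```

Stratifying experimental items by the probabilities such a model assigns is what lets the study compare human expectations against corpus-derived ones.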
Speaker Biography: Joan Bresnan, Ph.D. MIT 1972, is Sadie Dernham Patek Professor in Humanities Emerita at Stanford University and a senior researcher at Stanford's Center for the Study of Language and Information, where she has established her Spoken Syntax Lab. A Fellow of the American Academy of Arts and Sciences and a Fellow and past president of the Linguistic Society of America, she is currently PI of the project "The Dynamics of Probabilistic Grammar" funded by the NSF program in Human Social Dynamics.
October 14, 2008
Herb Gish, BBN Technologies
Abstract: We address the problem of performing topic classification of speech when no transcriptions from the speech corpus of interest are available. The approach we take is one of incremental learning about the speech corpus starting with adaptive segmentation of the speech, leading to the generation of discovered acoustic units and a segmental recognizer for these units, and finally to an initial tokenization of the speech for the training of a HMM speech recognizer. The recognizer trained is BBN's Byblos system. We discuss the performance of this system and also consider the case when a small amount of transcribed data is available.
Speaker Biography: Dr. Herbert Gish received a Ph.D. in Applied Mathematics from Harvard University in 1967. He is a Principal Scientist at BBN Technologies in Cambridge, Massachusetts in the Speech and Language Processing Department. His most recent work deals with information extraction from speech and text with a focus on problems that have very limited amounts of training data available.
October 21, 2008
Colin Wilson, Johns Hopkins University
Abstract: Generative linguistics studies the variation across languages and the laws (or universals) that limit cross-linguistic variation. The OCP-Place constraint, which is violated by sequences of consonants that have different tokens of the same place of articulation, is a good candidate for a linguistic law (Konstantin & Segerer 2007). However, beginning with the introduction of OCP-Place by McCarthy (1988) (building on observations due to Greenberg 1950), many researchers have claimed that the specific form of the constraint varies considerably across languages. In essence, the purported variation centers on how similar two consonants of the same place must be with respect to other features in order for the constraint to register a violation. I argue in this talk that a single definition of similarity --- the natural classes similarity metric introduced by Frisch et al. (2004) --- is consistent with the effects of OCP-Place in the languages that have been studied, and possibly in all languages. Apparent counterexamples, and particularly the recent case study of Muna (Austronesian) by Coetzee & Pater (2008), are shown to be artifacts of an inconsistent statistical method. A multiplicative, or log-linear, model of constraint interaction is able to maintain a universal formulation of OCP-Place and derive apparent variation from independent constraints.
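The similarity measure at issue is Frisch et al.'s natural classes metric, which (as I understand their 2004 formulation) scores two segments by the natural classes they do and do not share:

```latex
\mathrm{sim}(a,b) \;=\;
  \frac{\left|\text{natural classes shared by } a \text{ and } b\right|}
       {\left|\text{shared natural classes}\right|
        + \left|\text{non-shared natural classes}\right|}
```

Two consonants that fall together in many feature-defined classes (e.g., two voiced labial stops) thus count as highly similar and incur a strong OCP-Place penalty, while consonants sharing only place remain weakly penalized.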
Speaker Biography: Colin Wilson received a Ph.D. in Cognitive Science from Johns Hopkins in 2000. He was a member of the Linguistics department at UCLA from 2000 to 2007, and rejoined the Cognitive Science department as Associate Professor this semester. His most recent work focuses on the typology, gradient interaction, and learning of natural language phonotactics.
October 28, 2008
Abe Ittycheriah, IBM
Abstract: I'll present our work at IBM on word-alignment algorithms trained using supervised corpora. Also, I'll demonstrate how improved alignments required changes in machine translation, and then present the direct translation model. This work is primarily focused on Arabic to English. I'll review some of the changes since our published papers in both word alignment and machine translation.
Speaker Biography: Abraham Ittycheriah works as a Research Staff Member in the Natural Language System group at the IBM T.J. Watson Research Lab in Yorktown Heights, NY. Over the last four years, his primary focus has been on machine translation and word alignment between Arabic and English. He is also responsible for the Statistical Machine Translation engine used in several government projects. Prior to this assignment at IBM, he worked on question answering and telephone speech recognition algorithms and interfaces. He obtained his PhD from Rutgers, The State University of New Jersey in 2001.
October 30, 2008
“Active Learning with SVMs for Imbalanced Datasets and a Stopping Criterion Based on Stabilizing Predictions”
Michael Bloodgood, University of Delaware
Abstract: The use of Active Learning (AL) to reduce NLP annotation costs has recently generated considerable interest. There has also been considerable interest in dealing effectively with the class imbalance that NLP problems so often give rise to. Additionally, the use of Support Vector Machines (SVMs) for NLP has become widespread. After explaining relevant background and motivation, I will discuss how to effectively address class imbalance during AL-SVM (AL with SVMs). In particular, I will discuss how to adapt passive learning techniques in order to effectively use asymmetric costs during AL-SVM. In order to realize the performance gains enabled by a strong AL algorithm, an effective stopping criterion is critical. Therefore, I will also present a new stopping criterion based on stabilizing predictions. An evaluation of the proposed techniques will be reported for several Information Extraction and Text Classification tasks.
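One way to operationalize "stabilizing predictions" (a sketch; the threshold and window values here are my own illustrative choices, not the talk's) is to track chance-corrected agreement between successive models' predictions on a fixed unlabeled stop set, and halt once it stays high:

```python
def cohens_kappa(pred_a, pred_b):
    """Chance-corrected agreement between two binary (0/1) prediction
    vectors from successive active-learning iterations."""
    n = len(pred_a)
    agree = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    pa1, pb1 = sum(pred_a) / n, sum(pred_b) / n
    chance = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if chance == 1.0:          # both models constant: full agreement
        return 1.0
    return (agree - chance) / (1 - chance)

def should_stop(history, threshold=0.99, window=3):
    """history: list of prediction vectors on the stop set, one per
    iteration. Stop once `window` consecutive successive-model kappas
    all exceed the threshold."""
    if len(history) < window + 1:
        return False
    start = len(history) - window - 1
    kappas = [cohens_kappa(history[i], history[i + 1])
              for i in range(start, len(history) - 1)]
    return all(k >= threshold for k in kappas)
```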
Speaker Biography: Michael Bloodgood is a PhD candidate in the Department of Computer and Information Sciences at the University of Delaware. His thesis research deals with Active Learning with Support Vector Machines to reduce NLP annotation costs. More generally, he is interested in reducing training data annotation burdens via active, transfer, semi-supervised, and unsupervised learning techniques. In addition to his thesis work, Michael has worked on anaphora analysis (at U. of Delaware and at Palo Alto Research Center (PARC)), rapidly adapting POS taggers to new domains (at U. of Delaware), and discriminative training for statistical syntax-based machine translation (at USC/ISI). Michael earned his MS in Computer Science from the University of Delaware and a BS in Computer Science and in Information Systems Management from The College of New Jersey.
November 5, 2008
Toby Berger, University of Virginia
Abstract: For a canonical primary cortical neuron, which we call N, we introduce a mathematically tractable and neuroscientifically meaningful model of how N stochastically converts the excitation intensities it receives from the union of all the neurons in its afferent cohort into the durations of the intervals between its efferent spikes. We assume that N operates to maximize the ratio of the information that its interspike interval (ISI) durations convey about the history of its afferent excitation intensity per joule of energy N expends to produce and propagate its spikes. We use calculus of variations and Laplace transforms to determine the probability density functions (pdf's) of said excitation intensities and of said ISI durations. The mathematically derived pdf of the ISI durations is in good agreement with experimental observations. Moreover, the derived pdf of the afferent excitation intensity vanishes below a strictly positive level, which also accords with experimental observations. It is felt that our results argue persuasively that primary cortical neurons employ interspike interval codes (i.e., timing codes as opposed to rate codes).
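In schematic form (my paraphrase of the stated objective, not the paper's notation), with Lambda the afferent excitation-intensity history and T the ISI durations, the neuron is modeled as solving a bits-per-joule problem:

```latex
% Schematic "information per joule" objective for the model neuron N:
\max \;\; \frac{I(\Lambda \,;\, T)}
               {\mathbb{E}\!\left[\,\text{energy expended per spike}\,\right]}
```

The derived pdf's of Lambda and T are then the distributions at which this ratio is maximized.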
Speaker Biography: Toby Berger was born in New York, NY on September 4, 1940. He received the B.E. degree in electrical engineering from Yale University, New Haven, CT in 1962, and the M.S. and Ph.D. degrees in applied mathematics from Harvard University, Cambridge, MA in 1964 and 1966. From 1962 to 1968 he was a Senior Scientist at Raytheon Company, Wayland, MA, specializing in communication theory, information theory, and coherent signal processing. From 1968 through 2005 he was a faculty member at Cornell University, Ithaca, NY where he held the position of Irwin and Joan Jacobs Professor of Engineering. In 2006 he became a professor in the ECE Department of the University of Virginia, Charlottesville, VA. Professor Berger's research interests include information theory, random fields, communication networks, wireless communications, video compression, voice and signature compression and verification, neuroinformation theory, quantum information theory, and coherent signal processing. He is the author of Rate Distortion Theory: A Mathematical Basis for Data Compression and a co-author of Digital Compression for Multimedia: Principles and Standards, and Information Measures for Discrete Random Fields. Berger has served as editor-in-chief of the IEEE Transactions on Information Theory and as president of the IEEE Information Theory Group. He has been a Fellow of the Guggenheim Foundation, the Japan Society for Promotion of Science, the Ministry of Education of the People's Republic of China and the Fulbright Foundation. He received the 1982 Frederick E. Terman Award of the American Society for Engineering Education, the 2002 Shannon Award from the IEEE Information Theory Society and the IEEE 2006 Leon K. Kirchmayer Graduate Teaching Award. Professor Berger is a Fellow and Life Member of the IEEE, a life member of Tau Beta Pi, a member of the National Academy of Engineering, and an avid amateur blues harmonica player.
November 11, 2008
Chris Quirk, Microsoft
Abstract: As we scale statistical machine translation systems to the general domain, we face many challenges. This talk outlines two approaches for building better broad-domain systems. First, progress in data-driven translation is limited by the availability of parallel data. A promising strategy for mitigating data scarcity is to mine parallel data from comparable corpora. Although comparable corpora seldom contain parallel sentences, they often contain parallel words or phrases. Recent fragment extraction approaches have shown that including parallel fragments in SMT training data can significantly improve translation quality. We describe efficient and effective generative models for extracting fragments, and demonstrate that these algorithms produce substantial improvements on out-of-domain test data without suffering in-domain degradation. Second, many modern SMT systems are very heavily lexicalized. While such information excels on in-domain test data, quality falls off as the test data broadens. This next section of the talk describes robust generalized models that leverage lexicalization when available, and back off to linguistic generalizations otherwise. Such an approach results in large improvements over baseline phrasal systems when using broad domain test sets.
November 18, 2008
Les Atlas, University of Washington
Abstract: Be it in a restaurant or other reverberant and noisy environment, normal hearing listeners segregate multiple sources, usually strongly overlapping in frequency, well beyond capabilities expected by current beamforming approaches. What is it that we can learn from this common observation? As is now commonly accepted, the differing dynamical modulation patterns of the sources are key to these powers of separation. But until recently, the theoretical underpinnings for the notion of dynamical modulation patterns have been lacking. We have taken a previously loosely defined concept, called "modulation frequency analysis," and developed a theory which allows for distortion-free separation (filtering) of multiple sound sources with differing dynamics. A key result is that previous assumptions of non-negative and real modulation are not sufficient and, instead, coherent separation approaches are needed to separate different modulation patterns. These results may have an impact in separation and representation of multiple simultaneous sound streams for speech, audio, hearing loss treatment, and underwater acoustic applications. This research also suggests exciting new and potentially important open theoretical questions for general nonstationary signal representations, extending beyond acoustic applications and potentially impacting other areas of engineering and physics.
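To make the coherent/incoherent distinction concrete, here is a small synthetic demonstration (my own toy example, not the talk's modulation-filtering framework): for s(t) = m(t)cos(wt + phi), a coherent detector that multiplies by the carrier and low-passes recovers the modulator m(t) nearly exactly, while magnitude-only (incoherent) detection does not.

```python
import numpy as np

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
m = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)      # slow positive modulator
carrier = np.cos(2 * np.pi * 1000 * t + 0.3)
s = m * carrier

# Incoherent detection: magnitude only, carrier phase discarded.
incoherent = np.abs(s)

# Coherent detection: mix with a carrier estimate (here the true one),
# doubling gives m + m*cos(2wt); a low-pass removes the 2w term.
mixed = 2.0 * s * np.cos(2 * np.pi * 1000 * t + 0.3)
kernel = np.ones(200) / 200.0                   # crude moving-average LPF
coherent = np.convolve(mixed, kernel, mode="same")

print("coherent MSE:  ", np.mean((coherent - m) ** 2))    # near zero
print("incoherent MSE:", np.mean((incoherent - m) ** 2))  # much larger
```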
Speaker Biography: Les Atlas received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1979 and 1984, respectively. He joined the University of Washington in 1984, where he is currently a Professor of Electrical Engineering. His research is in digital signal processing, with specializations in acoustic analysis, time-frequency representations, and signal recognition and coding. Professor Atlas received a National Science Foundation Presidential Young Investigator Award and a 2004 Fulbright Senior Research Scholar Award. He was General Chair of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Chair of the IEEE Signal Processing Society Technical Committee on Theory and Methods, and a member-at-large of the Signal Processing Society's Board of Governors. He is a Fellow of the IEEE "for contributions to time-varying spectral analysis and acoustical signal processing."
November 25, 2008
Fernando Pereira, Google
Abstract: Over the last decade, linear models have become the standard machine learning approach for supervised classification, ranking, and structured prediction in natural language processing. They can handle very high-dimensional problem representations, they are easy to set up and use, and they extend naturally to complex structured problems. But there is something unsatisfying in this work. The geometric intuitions behind linear models were developed with low-dimensional, continuous problems, while natural language problems involve very high-dimensional, discrete representations with long-tailed distributions. Do the original intuitions carry over? In particular, do standard regularization methods make any sense for language problems? I will give recent experimental evidence that there is much to do in making linear model learning more suited to the statistics of language.
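One way to make the question tangible (synthetic data and all settings are illustrative, not from the talk) is to compare how L1 and L2 regularization behave on high-dimensional, sparse, mostly-irrelevant features of the kind language data produces:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 500, 5000
X = (rng.random((n, d)) < 0.01).astype(float)   # sparse binary features
true_w = np.zeros(d)
true_w[:20] = 3.0                               # only a few features matter
y = (X @ true_w + rng.normal(size=n) > 0.15).astype(int)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear",
                             C=1.0).fit(X, y)
    nnz = int(np.sum(np.abs(clf.coef_) > 1e-6))
    print(penalty, "nonzero weights:", nnz, "train acc:", clf.score(X, y))
```

L1 drives most of the 5000 weights to exactly zero while L2 keeps them all small but nonzero; whether either prior actually matches the long-tailed statistics of language features is precisely the question the talk raises.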
December 2, 2008
David Poeppel, University of Maryland