Archived Seminars by Year
January 29, 2002
Sam Gustman, Survivors of the Shoah Visual History Foundation
Abstract: In 1994, after filming Schindler's List, Steven Spielberg established the Survivors of the Shoah Visual History Foundation with an urgent mission: to videotape and preserve the testimonies of Holocaust survivors and witnesses. Today, the Shoah Foundation has collected more than 50,000 eyewitness testimonies in 57 countries and 32 languages, and is committed to ensuring the broad and effective educational use of its archive worldwide. The Foundation has built a 180-terabyte database from the digitized testimonies and is in the process of cataloging those testimonies using analysts with historical and political-science backgrounds. Technologies for disseminating the archive and automating some of the manual processes involved in cataloging the testimonies are among the Shoah Foundation's current efforts and are the topic of this talk.
February 5, 2002
Charles Yang, Yale University - Dept. of Linguistics
Abstract: It is often claimed that irregular verbs in English are learned by memorizing associated pairs of stems and past-tense forms, and hence that the frequency of an irregular verb largely determines the success of its acquisition (Pinker 1999). Yet a careful examination of the acquisition data (Marcus, Pinker, Ullman, Hollander, & Xu 1992) shows that the frequency-acquisition correlation completely breaks down when the phonological regularities in irregular past-tense formation are taken into consideration. The data in fact suggest a view of learning that involves (a) the construction of phonological rules, even among the very unsystematic irregular classes, and (b) probabilistic associations between words and their corresponding rules (e.g., lose -> -t suffixation + vowel shortening). This talk gives acquisition evidence for this approach. Then, based on a model of word learning by Sussman & Yip (1996, 1997), we develop a computational model of sound change, which may explain, inter alia, why irregularity in languages is not an "imperfection" but a necessity.
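A tiny illustration of view (b) in Python (our own toy, with made-up spelling transformations standing in for phonological rules; this is not Yang's actual model): each word accumulates evidence counts over candidate rules, and production applies the rule most strongly associated with the word.

```python
from collections import Counter

def t_suffix_shorten(stem):
    # Toy spelling rule standing in for "-t suffixation + vowel shortening":
    # lose -> lost, feel -> felt.
    return stem.rstrip("e").replace("ee", "e") + "t"

def regular_ed(stem):
    return stem + "d" if stem.endswith("e") else stem + "ed"

RULES = {"t-shorten": t_suffix_shorten, "regular": regular_ed}
assoc = Counter()   # evidence counts for (word, rule) pairs

# Learning: each observed stem/past pair strengthens every compatible rule.
for stem, past in [("lose", "lost"), ("feel", "felt"), ("lose", "lost")]:
    for name, rule in RULES.items():
        if rule(stem) == past:
            assoc[stem, name] += 1

def produce(stem):
    # Production: apply the rule most strongly associated with this word.
    best = max(RULES, key=lambda name: assoc[stem, name])
    return RULES[best](stem)

print(produce("lose"))   # -> 'lost'
```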
Speaker Biography: Charles Yang received his Ph.D. in computer science from MIT, and has since been teaching computational linguistics and child language at Yale. He is the author of "Knowledge and Learning in Natural Language" (Oxford University Press, 2002).
February 12, 2002
Malcah Yaeger-Dror, University of Arizona, Cognitive Science
Abstract: Speech technology (including synthesis, speech recognition, and speaker verification) has made significant advances in recent years in laboratories and in field applications, but speech recognition can still degrade when the test data do not match the training data well -- for example, when the test data include dialects that are not in the original sample, or when the speech collected from certain speakers does not match the way they normally speak. As a result, 'non-mainstream dialects' are under-represented because they are more difficult to collect using 'standard' channels. For those who speak dialects not represented in the training data, this is a serious impediment to the goal of universal access. Such an impediment can have broad-reaching consequences, since it can affect access to education and even telephone information systems. In fact, appropriate corpora for developing more adequate recognition strategies are so sparse that it is difficult even to assess just how bad the current situation is, or to evaluate new modeling techniques. One impediment to devising a corpus which permits better modeling of dialect is the fact that those who understand dialect and 'style' differences are often not versed in speech technology, and vice versa. This paper will address how better understanding between these groups can permit researchers to gather appropriate speech, so that adequate recognition strategies can be devised for all speakers of English -- both by choosing speakers from a broader range of dialects and by collecting the speech in a setting which is appropriate. After a short discussion of 'dialect' and 'style' (Eckert and Rickford 2001, Yaeger-Dror & Hall-Lew 2000(A)), the paper will propose how to take better advantage of a corpus which is already available and which meets the criteria that appear to be needed for better dialect modeling. The paper will propose that, if appropriately labeled and coded with respect to dialect and demographic variables, at least one corpus presently available could be quite helpful in improving dialect recognition. A subset of the phone calls from CallFriend Southern American English appears to meet these criteria, both because the speech style is natural and conversational and because the speakers represent non-mainstream dialects for which there is at present very inadequate recognition. We will conclude that better modeling of dialect effects across age groups, dialect groups, and sexes should greatly enhance the goal of universal speech access.
References:
Eckert, P. and J. Rickford (eds.). 2001. Style and Sociolinguistic Variation. Cambridge University Press.
Yaeger-Dror, M. 2001. Primitives for the analysis of 'style'. In P. Eckert and J. Rickford (eds.), 170-185.
Yaeger-Dror, M. and L. Hall-Lew. 2000. Prosodic prominence on negation in various registers of US English. Journal of the Acoustical Society of America 108:2468(A).
February 19, 2002
Dr. Peder A. Olsen, IBM T.J. Watson Research Center
Abstract: This talk introduces a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set and the corresponding expansion coefficients for the precision matrices of individual Gaussians. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from d to d(d+1)/2, one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model (also known as semi-tied covariance) to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in word error rate over a standard diagonal covariance model, and 30% over a standard MLLT model.
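The core idea can be written compactly; a minimal sketch in our own notation (the exact parameterization and the generalized EM updates are in the talk and the EMLLT papers):

```latex
% Precision (inverse covariance) of Gaussian j, expanded in a basis of
% D rank-one terms shared across all Gaussians; only the coefficients
% \lambda_{jk} are Gaussian-specific.
\[
  \Sigma_j^{-1} \;=\; \sum_{k=1}^{D} \lambda_{jk}\, a_k a_k^{\top},
  \qquad a_k \in \mathbb{R}^{d}.
\]
% D = d recovers MLLT (semi-tied covariance);
% D = d(d+1)/2 spans all symmetric matrices, i.e. full covariance.
```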
February 28, 2002
Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio (LabROSA) at Columbia
March 1, 2002
Ralph Grishman, New York University
Abstract: Event extraction involves automatically finding, within a text, instances of a specified type of event, and filling a database with information about the participants and circumstances (date, place) of the event. These databases can provide an alternative to traditional text search engines for repeated, focused searches on a single topic. Constructing an extraction system for a new event type requires identifying the linguistic patterns and classes of words which express the event. We consider the types of knowledge required and how this knowledge can be learned from text corpora with minimal supervision.
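To make the find-and-fill task concrete, here is a toy in Python (a hand-written pattern and an invented schema, purely illustrative; the talk's point is learning such patterns from corpora with minimal supervision rather than writing them by hand):

```python
import re

# One hypothetical surface pattern for an "appointment" event type.
PATTERN = re.compile(
    r"(?P<org>[A-Z]\w+) appointed (?P<person>[A-Z]\w+ [A-Z]\w+)"
    r" as (?P<post>[\w ]+) on (?P<date>\w+ \d+)")

def extract(text):
    # Each match fills one database record with participants and circumstances.
    return [m.groupdict() for m in PATTERN.finditer(text)]

print(extract("Acme appointed Jane Doe as chief executive on March 3."))
# [{'org': 'Acme', 'person': 'Jane Doe', 'post': 'chief executive',
#   'date': 'March 3'}]
```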
March 12, 2002
Hynek Hermansky, Oregon Graduate Institute
Abstract: The talk describes our work towards data-driven features that could be used with the current HMM system and that would represent transformed posterior probabilities of the sub-word classes. To address steady or slowly-varying artifacts, the probabilities are derived from relatively long time spans of the signal (up to 1 sec). This may also alleviate some dependencies on the phonetic context. To address excessive sensitivity of ASR to changes in short-term spectral profiles, we do the probability estimations in two steps. The first step yields frequency-localized class probability estimates. These estimates are used as inputs to another probability estimator that yields the final class probabilities. These final probabilities are appropriately transformed to yield features for the subsequent HMM classifier. The whole feature module is trained on labeled speech data.
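A schematic of the two-step estimation in Python (our simplification, with hypothetical untrained weights; the real system trains neural-network estimators on labeled speech and uses its own feature transforms):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
n_bands, span, n_classes = 15, 101, 40   # ~1 s of 10 ms frames per band

# Hypothetical weights standing in for trained estimators.
W_band = [rng.normal(0, 0.1, (span, n_classes)) for _ in range(n_bands)]
W_merge = rng.normal(0, 0.1, (n_bands * n_classes, n_classes))

def posteriors(band_trajectories):
    """band_trajectories: (n_bands, span) per-band temporal trajectories."""
    # Step 1: frequency-localized class posteriors, one estimator per band.
    local = [softmax(traj @ W) for traj, W in zip(band_trajectories, W_band)]
    # Step 2: merge the localized estimates into final class posteriors.
    merged = softmax(np.concatenate(local) @ W_merge)
    # Transform (here simply log) before handing features to the HMM.
    return np.log(merged + 1e-10)

feats = posteriors(rng.normal(size=(n_bands, span)))
print(feats.shape)   # (40,) posterior-derived feature vector for one frame
```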
Speaker Biography: Hynek Hermansky is a Professor of Electrical and Computer Engineering and Director of the Center for Information Technology at the OGI School of Oregon Health and Science University in Portland, Oregon, and a Senior Research Scientist at the International Computer Science Institute in Berkeley, California. He has been working in speech processing for over 25 years, previously as a Research Fellow at the University of Tokyo, a Research Engineer at Panasonic Technologies in Santa Barbara, California, and a Senior Member of Research Staff at U S WEST Advanced Technologies. He is a Fellow of the IEEE, a Member of the Board of the International Speech Communication Association, an Editor of the IEEE Transactions on Speech and Audio Processing, and a Member of the Editorial Board of Speech Communication. He holds a Dr.Eng. degree from the University of Tokyo. His main research interests are in acoustic processing for speech and speaker recognition.
April 2, 2002
Mark Liberman, Linguistic Data Consortium, University of Pennsylvania
Abstract: When linguists, psychologists or engineers try to understand, explain or imitate human speech and language, they usually do so by modeling individual speakers, hearers or learners. Nevertheless, language is an emergent property of groups (of humans), and elementary arguments suggest that non-trivial characteristics of speech and language emerge from interactions within groups of individuals over time. We should also expect that we need to look at how variable inherited traits affect such socially-emergent properties, in order to understand the evolved genetic influences on speech and language. After an obligatory but brief discussion of insect communication, this talk will explore the application of these ideas to (pathetically simple) models in two areas: morphosyntactic regularization and categorical perception.
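In the spirit of the "pathetically simple" models promised, here is a toy of socially-emergent regularization in Python (our guess at the flavor of such a model, not the talk's actual one): agents repeatedly learn a form from a random neighbor, and asymmetric mis-learning slowly drives the population toward the regular variant.

```python
import random

random.seed(0)
pop = ["irregular"] * 90 + ["regular"] * 10

for generation in range(200):
    teacher = random.choice(pop)
    learner = random.randrange(len(pop))
    # Asymmetric mis-learning: an irregular form is occasionally
    # regularized, but a regular form is (here) never irregularized.
    if teacher == "irregular" and random.random() < 0.05:
        teacher = "regular"
    pop[learner] = teacher

print(pop.count("regular") / len(pop))   # fraction regular after interaction
```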
April 9, 2002
Gertjan van Noord, University of Groningen
Abstract: Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full parsing of unrestricted text. Alpino is based on a constructionalist HPSG grammar with a large lexical component. Alpino produces dependency structures, as proposed in the CGN (Corpus of Spoken Dutch). Important aspects of wide-coverage parsing are robustness, efficiency, and disambiguation. In the talk we briefly introduce the Alpino system, and then discuss two recent developments. The first is the integration of a log-linear model for disambiguation. It is shown that this model performs well on the task, despite the small size of the training data used to train it. We also describe how we avoid the inherent efficiency problems of using such a log-linear model in parse selection. The second development concerns the implementation of an unsupervised POS tagger. It is shown that a simple POS tagger can be used to filter the results of lexical analysis in a wide-coverage computational grammar. The reduction in the number of lexical categories not only greatly improves parsing efficiency, but in our experiments also gave rise to a mild increase in parsing accuracy, in contrast to results reported in earlier work on supervised tagging. The novel aspect of our approach is that the POS tagger does not require any human-annotated data, but instead uses the parser output obtained on a large training set.
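A minimal sketch of log-linear parse selection in Python (assumed form with hypothetical features and weights; the actual Alpino model and its efficiency tricks are described in the talk):

```python
def score(parse_features, weights):
    """Unnormalized log-linear score: sum of weight * feature value."""
    return sum(weights.get(f, 0.0) * v for f, v in parse_features.items())

def select(parses, weights):
    """Pick the highest-scoring parse. The normalizing constant is the same
    for all parses of a sentence, so it cancels: parse *selection* never
    needs the (expensive) partition function."""
    return max(parses, key=lambda p: score(p, weights))

weights = {"subj-before-obj": 1.2, "long-distance-dep": -0.7}  # hypothetical
parses = [{"subj-before-obj": 1, "long-distance-dep": 0},
          {"subj-before-obj": 0, "long-distance-dep": 1}]
print(select(parses, weights))   # the first parse wins under these weights
```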
April 16, 2002
Ed Stabler, UCLA Dept. of Linguistics
Abstract: The human acquisition of human languages is based on the analysis of signal and context. To study how this might work, a simplified robotic setting is described in which the problem is divided into two basic steps: an analysis of the linguistic events in context that yields dependency structures, and the identification of grammars that generate those structures. A learnability result that generalizes Kanazawa (1994) has been obtained, showing that non-context-free, and even non-TAG, languages can be identified in this setting; more realistic assessments of the learning problem are under study.
Speaker Biography: Stabler is Professor of Linguistics at UCLA. He specializes in theories of human language processing and formal learnability theory, with interests in automated theorem proving, philosophy of logic and language, and artificial life.
April 23, 2002
Rahul Sarpeshkar, Massachusetts Institute of Technology
Abstract: The silicon cochlea implements the biophysics of the human cochlea on an analog electronic chip. I shall demonstrate the operation of a 61 dB, 0.5 mW analog VLSI silicon cochlea. An engineering analysis of this cochlea suggests why the ear is designed as a distributed traveling-wave amplifier rather than as a bank of bandpass filters: such an architecture is a very efficient way of implementing a high-resolution, high-filter-order, wide-dynamic-range frequency analyzer. I shall outline work on constructing low-power cochlear-implant processors that are based on circuits in the silicon cochlea, as well as work on constructing a distributed-gain-control silicon-cochlea-based cochlear-implant processor. These processors have promise for cutting power dissipation by more than an order of magnitude relative to today's implant processors, and for improving patient performance in noise.
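A toy of the efficiency argument in Python (an illustrative digital analogy of ours, not the actual analog circuit): tapping a cascade of cheap low-order stages gives later taps a high cumulative filter order, whereas a bank of independent filters would pay the full filter cost at every output.

```python
import numpy as np

fs = 16000
x = np.random.default_rng(0).normal(size=fs)      # 1 s of white noise

def one_pole_lowpass(x, fc, fs):
    """First-order IIR low-pass: each stage costs one multiply-add/sample."""
    a = np.exp(-2 * np.pi * fc / fs)
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = a * acc + (1 - a) * v
        y[i] = acc
    return y

taps, y = [], x
for fc in np.geomspace(4000, 200, 8):             # exponentially spaced taps
    y = one_pole_lowpass(y, fc, fs)               # signal travels down cascade
    taps.append(y)                                # k-th tap has order ~k

print([round(float(np.std(t)), 4) for t in taps])  # energy falls along cascade
```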
April 30, 2002
David Mumford, Brown University
Abstract: Like pure mathematics, applied mathematics thrives on unexpected links between subfields. The stochastic modeling of perception has created the need for new types of probability models of the patterns of the world. One of these is "shape", the elusive quality that tells, for instance, what is a dog and what is a cat. We will review various models to capture these patterns, especially the wonderful link, based on the work of Arnold and forged by Grenander and Miller, with the Euler equation.
Speaker Biography: Information about David Mumford is available at his website.
July 19, 2002
July 31, 2002
“Facing the Curse of Dimensionality in Statistical Language Modeling using Distributed Representations”
Talk video: http://player.vimeo.com/video/59310531
August 8, 2002
“The Manifold Advantages of Articulatory Representations, Including Microphone and Speaker Normalization”
August 14, 2002
September 12, 2002
James West, The Johns Hopkins University
Abstract: It is well known that condenser microphones are the transducer of choice when accuracy, stability, frequency characteristics, dynamic range, and phase are important. But conventional condenser microphones require critical and costly construction, as well as a high DC bias for linearity. These disadvantages ruled out practical microphone designs such as multi-element arrays and the use of linear microphones in telephony. The combination of our discovery of stable charge storage in thin polymers and the need for improved linearity in communications encouraged the development of modern electret microphones. Modern polymer electret transducers can be constructed in various sizes and shapes, mainly because the transducer is simple and inexpensive. Applications of electret microphones range from very small hearing-aid microphones to very large single-element units for underwater and airborne reception of very low frequencies. Because the frequency and phase response of electret microphones are relatively constant from unit to unit, multiple-element two-dimensional arrays have been constructed using 400 electret elements that cost about $1.00 each. The Internet Protocol (IP) offers the bandwidth needed to further improve audio quality for telephony, but this will require broadband microphones and loudspeakers to provide customers with voice presence and clarity. Directional microphones for both hand-held and hands-free modes are necessary to improve signal-to-noise ratios and to enable automatic speech recognition. Arrays with dynamic beam-forming properties are also necessary for large conference rooms. Signal processing has made possible stereo acoustic echo cancellers and many other signal enhancements that improve audio quality. I will discuss some of the current work on broadband communications at Avaya Labs.
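A minimal sketch of delay-and-sum beamforming, the basic idea behind such multi-element arrays (our illustration under simplified plane-wave assumptions, not any particular Avaya design):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, steer_dir, fs, c=343.0):
    """signals: (n_mics, n_samples); mic_positions: (n_mics, 3) in meters;
    steer_dir: unit vector toward the desired source. Each channel is
    time-shifted so a wavefront from steer_dir adds coherently."""
    delays = mic_positions @ steer_dir / c          # seconds, per mic
    n = signals.shape[1]
    f = np.fft.rfftfreq(n, 1 / fs)
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        # Fractional delay applied as a phase ramp in the frequency domain.
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * f * d), n)
    return out / len(signals)

rng = np.random.default_rng(0)
pos = np.array([[0.00, 0, 0], [0.05, 0, 0], [0.10, 0, 0]])  # 3-mic line array
out = delay_and_sum(rng.normal(size=(3, 1600)), pos,
                    np.array([1.0, 0.0, 0.0]), fs=16000)
print(out.shape)   # (1600,) single enhanced output channel
```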
September 14, 2002
Michelle Effros, Caltech
September 24, 2002
Christopher Manning, Stanford University
Abstract: Probabilistic parsing methods have in recent years transformed our ability to robustly find correct parses for open-domain sentences. But people normally still think of parsers in terms of logical presentations via the notion of "parsing as deduction". I will instead connect stochastic parsing with finding shortest paths in hypergraphs, and show how this approach naturally provides a chart parser for arbitrary probabilistic context-free grammars (finding shortest paths in a hypergraph is easy; the central problem of parsing is that the hypergraph has to be constructed on the fly). Running such a parser exhaustively, I will briefly consider the properties of the Penn Treebank (the most widely used hand-parsed corpus), the vast parsing ambiguity that results from these properties, and how simple models can accurately predict the amount of work a parser does on this corpus. Using the hypergraph viewpoint, a natural approach is to use the A* algorithm to cut down the work in finding the best parse. On unlexicalized grammars, this can reduce the parsing work done dramatically, by at least 97%. This approach is competitive with methods standardly used in statistical parsers, while ensuring optimality, unlike most heuristic approaches to best-first parsing. Finally, I will present a novel modular generative model in which semantic (lexical dependency) and syntactic structures are scored separately. This factored model is conceptually simple, linguistically interesting, and provides straightforward opportunities for separately improving the component models. Further, it provides a level of performance close to that of similar non-factored models. Most importantly, unlike other modern parsing models, the factored model permits the continued use of an extremely effective A* algorithm, which makes efficient, exact inference feasible. This is joint work with Dan Klein.
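Our rendering of the factored model and the A* priority in symbols (notation assumed from the abstract's description, not taken from the talk):

```latex
% Tree T and lexical-dependency structure D get separate component models,
% combined multiplicatively:
\[
  P(T, D) \;=\; P_{\mathrm{syn}}(T)\,\cdot\,P_{\mathrm{sem}}(D).
\]
% A* explores chart items (edges) e in order of
\[
  \mathrm{priority}(e) \;=\; \underbrace{\beta(e)}_{\text{exact inside score}}
    \;+\; \underbrace{\hat{\alpha}(e)}_{\text{admissible outside estimate}},
\]
% and with an admissible (never-pessimistic) \hat{\alpha}, the first complete
% parse popped from the agenda is provably the exact Viterbi parse.
```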
Speaker Biography: Christopher Manning is an Assistant Professor of Computer Science and Linguistics at Stanford University. He received his Ph.D. from Stanford University in 1995, and served on the faculty of the Computational Linguistics Program at Carnegie Mellon University (1994-1996) and the University of Sydney Linguistics Department (1996-1999) before returning to Stanford. His research interests include probabilistic models of language, natural language parsing, constraint-based linguistic theories, syntactic typology, information extraction and text mining, and computational lexicography. He is the author of three books, including Foundations of Statistical Natural Language Processing (MIT Press, 1999, with Hinrich Schuetze).
October 1, 2002
Christoph Tillman, IBM T.J. Watson Research Center
Abstract: This talk is about the use of dynamic programming (DP) techniques for statistical machine translation (SMT). I will present a search procedure for SMT based on dynamic programming. The starting point is a DP solution to the traveling salesman problem. For SMT, the cities correspond to source sentence positions to be translated. Imposing restrictions on the order in which the source positions are translated yields a DP algorithm that carries out the word re-ordering efficiently. A simple data-driven search organization makes it possible to prune unlikely translation hypotheses. Furthermore, I will sketch a DP-based segmentation procedure for SMT. The units of segmentation are blocks - pairs of source and target clumps. Here, the segmentation problem is related to the set cover problem, and an efficient DP segmentation algorithm exists if the blocks are restricted by an underlying word-to-word alignment.
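To make the TSP connection concrete, here is a Held-Karp-style DP over coverage sets in Python (our illustration with a made-up cost matrix; the talk's decoder additionally restricts the re-ordering, which is what shrinks this exponential state space to something practical):

```python
import itertools

def dp_reorder(n, cost):
    """cost is an (n+1) x n matrix: cost[j][i] is the cost of translating
    source position i immediately after position j, with the last row
    holding the cost of translating each position first. State is
    (set of covered positions, last position), exactly as in Held-Karp."""
    best = {(frozenset([i]), i): cost[-1][i] for i in range(n)}
    for size in range(2, n + 1):
        for subset in itertools.combinations(range(n), size):
            s = frozenset(subset)
            for i in subset:
                best[(s, i)] = min(
                    best[(s - {i}, j)] + cost[j][i]
                    for j in subset if j != i)
    return min(best[(frozenset(range(n)), i)] for i in range(n))

cost = [
    [0, 2, 9],     # after position 0
    [1, 0, 6],     # after position 1
    [15, 7, 0],    # after position 2
    [3, 5, 8],     # last row: cost of translating each position first
]
print(dp_reorder(3, cost))   # minimum cost over all visit orders: 11
```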
Speaker Biography: Christoph Tillmann is a Research Staff Member at the IBM T.J. Watson Research Center. He received his Dipl. degree in computer science in 1996 and his Dr. degree in computer science in 2001, both from Aachen University of Technology (RWTH), Germany. Currently, he is working on statistical machine translation. His research interests include probabilistic language modeling and probabilistic parsing.
October 8, 2002
George Zweig, Los Alamos National Laboratory
Abstract: Two contrasting views of cochlear mechanics are compared with each other, and with experiment. The first posits that all qualitative features of the nonlinear cochlear response are those of a simple dynamical system poised at a Hopf bifurcation; the second argues that the cochlear response must be found with 3-D simulations. Hopf bifurcations are explained, and their consequences for cochlear mechanics explored.
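The first view in symbols (the standard forced Hopf normal form; our rendering, not notation from the talk):

```latex
% Near the bifurcation, a forced cochlear channel with complex amplitude z
% obeys
\[
  \dot{z} \;=\; (\mu + i\omega_0)\, z \;-\; |z|^{2} z \;+\; F e^{i\omega t},
\]
% and exactly at the critical point (\mu = 0, \omega = \omega_0) the steady
% response satisfies
\[
  |z|^{3} \;=\; F \quad\Longrightarrow\quad |z| = F^{1/3},
\]
% a compressive one-third-power law: enormous gain for faint sounds and
% strong compression for loud ones, as observed in the cochlea.
```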
October 15, 2002
Xiaoqiang Luo, IBM T.J. Watson Research Center
Abstract: The performance of a statistical parser often improves if it is trained with more labelled data. But acquiring labelled data is often expensive and labor-intensive. We address this problem by proposing to use data annotated for other purposes. Label information from another domain or corpus provides partial constraints for parsing, so the EM algorithm can be employed naturally to infer the missing information. I will present our results on improving a maximum entropy parser using cross-domain and cross-corpus data.
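A minimal illustration of EM under partial label constraints in Python (a categorical toy of ours, far simpler than the maximum entropy parser in the talk): the E-step spreads each example's probability mass only over labels consistent with its constraint, and the M-step re-estimates from that soft assignment.

```python
import numpy as np

labels = 3
# Each example is observed only as a *set* of allowed labels.
data_constraints = [{0}, {0, 1}, {1, 2}, {2}, {0, 1, 2}]
theta = np.full(labels, 1 / labels)      # categorical model parameters

for _ in range(50):
    # E-step: posterior over labels, restricted to each example's allowed set.
    post = np.zeros(labels)
    for allowed in data_constraints:
        mask = np.array([k in allowed for k in range(labels)], float)
        p = theta * mask
        post += p / p.sum()
    # M-step: re-estimate parameters from the expected counts.
    theta = post / post.sum()

print(theta.round(3))
```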
Speaker Biography: Xiaoqiang Luo received his bachelor's degree from the University of Science and Technology of China in 1990, and his Ph.D. from Johns Hopkins University in 1999, both in electrical engineering. Since 1998 he has been working at the IBM T.J. Watson Research Center as a senior software engineer. He was responsible for developing the semantic parser and interpreter used in the IBM DARPA Communicator. His research interests include statistical modeling in natural language processing (NLP), language modeling, speech recognition, and spoken dialog systems.
October 22, 2002
Eric B. Baum, NEC Research
Abstract: We address the problem of how one can do reinforcement learning in ultra-complex environments, with huge state spaces, where one must learn to exploit the compact structure of the problem domain. The approach proposed is to simulate the evolution of an artificial economy of computer programs. We discuss why imposing two simple principles on the economic structure leads to the evolution of a collection of programs that collaborate, thus autonomously dividing the problem and greatly facilitating solution. We have tested this on three game domains and one real-world problem, using two different computational models (Post production systems and S-expressions), for a total of about six tests. We find empirically that we are able in each case to evolve systems from random computer code to solve hard problems. In particular, our economy has learned to solve all Blocks World problems (in a certain infinite class), whereas competing methods solve such problems only up to goal stacks of at most 8 blocks; to unscramble about half of a randomly scrambled Rubik's cube; to solve several among a collection of commercially sold puzzles; and to learn a focused web crawler that outperformed a Bayesian focused crawler in our experiments. The web crawler is supplied a number of sample pages, evolves an economy of agents that recognize sets of keywords in ancestors of these pages, and then uses this knowledge to efficiently crawl to similar pages on the web. Igor Durdanovic, Erik Kruus, and John Hainsworth contributed to this work.
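A toy of the economic mechanism in Python (our reading of the setup, heavily simplified and hypothetical): agents bid for the right to act; the winner pays its bid to the previous property holder and later collects the next winner's bid plus any external reward. Money is conserved and property rights are respected, so an agent can profit only by adding value that a later agent will pay for.

```python
import random

random.seed(0)
wealth = {name: 10.0 for name in ("A", "B", "C")}
owner = None

for step in range(1000):
    bids = {name: random.uniform(0.0, 2.0) for name in wealth}  # toy policies
    winner = max(bids, key=bids.get)
    if owner is not None:
        wealth[owner] += bids[winner]            # resale income to prior owner
    wealth[winner] -= bids[winner]               # winner pays to take over
    wealth[winner] += random.choice([0.0, 0.5])  # reward from the environment
    owner = winner

# Total wealth grows only by external reward; internal payments balance out.
print({k: round(v, 2) for k, v in wealth.items()})
```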
October 29, 2002
Geoffrey Hinton, University of Toronto
Abstract: Many researchers have tried to model perception using belief networks based on directed acyclic graphs. The belief network is viewed as a stochastic generative model of the sensory data, and perception consists of inferring plausible hidden causes for the observed sensory input. I shall argue that this approach is probably misguided because of the difficulty of inferring posterior distributions in densely connected belief networks. An alternative approach is to use layers of hidden units whose activities are a deterministic function of the sensory inputs. The activities of the hidden units provide additive contributions to a global energy, E, and the probability of each sensory data vector is defined to be proportional to exp(-E). The problem of perceptual inference vanishes in deterministic networks, so perception is very fast once the network has been learned. The main difficulty of this approach is that maximum likelihood learning is very inefficient. Maximum likelihood adjusts the parameters to maximize the probability of the observed data given the model, but this requires the derivatives of an intractable normalization term. I shall show how this difficulty can be overcome by using a different objective function for learning. The parameters are adjusted to minimize the extent to which the data distribution is distorted when it is moved towards the distribution that the model believes in. This new objective function makes it possible to learn large energy-based models quickly.
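The objective described in the last paragraph is widely known as contrastive divergence; treating that identification, and every detail below, as our own assumption, here is a tiny numerical sketch: compare the energy gradient on the data against the gradient on data nudged one step toward the model's distribution, sidestepping the intractable normalizer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 5, 8
W = rng.normal(0, 0.1, (h, d))   # one layer of deterministic hidden units

def energy_grad(v):
    """Global energy E(v) = -sum_j softplus(w_j . v); returns dE/dW."""
    s = 1 / (1 + np.exp(-(W @ v)))          # d softplus / d activation
    return -np.outer(s, v)

def one_step_toward_model(v, eps=0.1):
    """One crude noisy step downhill in energy: a stand-in for 'moving the
    data toward the distribution the model believes in'."""
    s = 1 / (1 + np.exp(-(W @ v)))
    dE_dv = -(W.T @ s)
    return v - eps * dE_dv + 0.05 * rng.normal(size=v.shape)

data = rng.normal(size=(100, d))
lr = 0.01
for v in data:
    v_model = one_step_toward_model(v)
    # Contrastive update: lower energy on data, raise it on the nudged points.
    W -= lr * (energy_grad(v) - energy_grad(v_model))
```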
November 5, 2002
Tom Mitchell, Carnegie Mellon University
Abstract: The sciences that study the brain are experiencing a significant revolution, caused mainly by the invention of new instruments for observing and manipulating brain function. For example, functional Magnetic Resonance Imaging (fMRI) now provides a safe, non-invasive tool to observe human brain activity, allowing scientists to capture a 3D image of activity across the entire human brain at a spatial resolution of 1 mm, once per second. Brain probes now allow direct recording simultaneously from hundreds of individual neurons in laboratory animals as they move about their environment, genetic knock-out experiments allow studying lab mice missing specific neurotransmitters, and new dyes provide new ways to study neural pathways and neural metabolism. Brain implants now allow tens of thousands of humans to hear for the first time, and the FDA recently approved the first human retinal implants intended to help blind people. The thesis of my talk is that research over the coming decade in the brain sciences will have a significant impact on Artificial Intelligence research, and that AI will have an even more significant impact on studies of the brain. We'll examine two distinct ways in which this synergy between AI and brain sciences is already beginning to take shape. First, AI architectures and algorithms for specific tasks are providing a basis for interpreting new data on brain activity in animals, in several cases leading to the conclusion that animals may use approaches surprisingly similar to these engineered AI solutions. Second, machine learning methods are providing new ways to discover regularities in the huge volume of new data, for example, automatically discovering the spatial-temporal patterns of brain activity associated with reading a confusing sentence, or determining the semantic category of a word.
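A toy of the second synergy in Python (an illustrative stand-in of ours using simulated data; the talk's actual fMRI analyses are far richer): a simple Gaussian naive Bayes classifier scores each cognitive condition by summed per-voxel log-likelihoods.

```python
import numpy as np

rng = np.random.default_rng(1)
voxels, trials = 500, 40
X = rng.normal(size=(trials, voxels))   # one (simulated) brain image per trial
y = rng.integers(0, 2, trials)          # e.g., two stimulus conditions

# Fit per-class voxel means and a shared per-voxel variance.
mu = np.stack([X[y == c].mean(0) for c in (0, 1)])
var = X.var(0) + 1e-6

def predict(x):
    # Gaussian naive Bayes: independent voxels, pick the likelier class.
    ll = -((x - mu) ** 2 / (2 * var)).sum(1)
    return int(ll[1] > ll[0])

acc = np.mean([predict(x) == c for x, c in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```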
Speaker Biography: Tom M. Mitchell is the Fredkin Professor of Computer Science at Carnegie Mellon University, and Founding Director of CMU's Center for Automated Learning and Discovery, an interdisciplinary research center specializing in statistical machine learning and data mining. He is President of the American Association for Artificial Intelligence (AAAI), author of the textbook "Machine Learning," and a member of the National Research Council's Computer Science and Telecommunications Board. During 1999-2000 he served as Vice President and Chief Scientist at WhizBang! Labs, a company that employs machine learning to extract information from the web. Mitchell's research interests lie in the area of machine learning and data mining. He has developed specific learning algorithms such as inductive inference methods, learning methods that combine data with background knowledge, methods that learn from combinations of labeled and unlabeled training data, and methods for learning probabilistic first-order logic rules from relational data. He has also explored the application of these methods to complex time-series data, from studies of pneumonia mortality and C-section risk in medical records, to studies of brain function from functional MRI time series, to robot learning.
November 12, 2002
John Henderson, MITRE
Abstract: Arabic speech recognizers are frequently designed to produce output without any short vowels, because readers of Arabic do not require the diacritics that indicate short vowels. This design also allows the speech recognizers to utilize the millions of words of available non-diacritized Arabic text for language model training. Unwritten vowels are also left out of the pronunciation models. This forces the acoustic models to capture not only their intended targets, the non-short-vowel phonemes, but also the systematic interference of the unwritten short vowels. I will detail data-driven approaches to Arabic vowel restoration explored during the 2002 Hopkins summer workshop and the effects they have on speech recognition systems for Arabic. Specifically, I will show that an Arabic ASR system that is trained on the output of an automatic vowel restoration system has a lower word error rate than an ASR system trained with implicit disregard for the unwritten portions of the words.
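A minimal baseline in the spirit of data-driven vowel restoration, in Python (our sketch with invented example forms, not one of the workshop systems): restore each undiacritized word to its most frequent vowelled form observed in diacritized training text, backing off to leaving the word unchanged.

```python
from collections import Counter, defaultdict

def train(pairs):
    """pairs: (consonant_skeleton, vowelled_form) from diacritized text."""
    table = defaultdict(Counter)
    for skel, full in pairs:
        table[skel][full] += 1
    # Keep the most frequent vowelled form for each skeleton.
    return {s: c.most_common(1)[0][0] for s, c in table.items()}

def restore(words, table):
    # Back off to the unvowelled word itself when the skeleton is unseen.
    return [table.get(w, w) for w in words]

table = train([("ktb", "kataba"), ("ktb", "kutub"), ("ktb", "kataba")])
print(restore(["ktb", "qlm"], table))   # ['kataba', 'qlm']
```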
Speaker Biography: John Henderson received a B.S. in Math/CS from Carnegie Mellon University in 1994, and a Ph.D. from Johns Hopkins University in 2000, where he studied in the Natural Language Processing Laboratory. Since joining MITRE in 1999, he has been working on diverse topics such as designing annotation standards, named entity recognition, combining question-answering system outputs, recognizing variant forms of transliterated names, and out-of-vocabulary word repair for ASR systems. His current research includes machine translation of fixed-point concepts such as proper names, times, and uniquely specified artifacts, evaluation of MT systems, and other topics that lie in the intersections of MT, NLP, and ASR.
November 19, 2002
Remi Zajac, Systran
Abstract: Building a high-quality general-purpose Machine Translation system is still out of reach in the present state of knowledge. MT has been used mostly to understand the content of foreign texts. However, when the style and the domain are restricted, MT can provide useful results if the system is tuned to these texts. This talk will present linguistic and technical as well as methodological issues arising in the construction of customized MT systems, and will address the following topics:
- Notions of lexical and linguistic closure
- Assessing customization needs
- Customization of dictionaries and grammars
- Manual vs. automatic approaches
- The iterative manual customization process
- MT evaluation issues
- An example of a customization project
December 3, 2002
Dana Boatman, Departments of Neurology and Otolaryngology, Johns Hopkins School of Medicine