Archived Seminars by Year
January 31, 2007
David Jensen, University of Massachusetts Amherst
AbstractNetworks are ubiquitous in computer science and everyday life. We live embedded in social and professional networks, we communicate through telecommunications and computer networks, and we represent information in documents connected by hyperlinks and bibliographic citations. Only recently, however, have researchers developed techniques to analyze and model data about these networks. These techniques build on work in artificial intelligence, statistics, databases, graph theory, and social network analysis, and they are profoundly expanding the phenomena that we can understand and predict. Emerging applications for these new techniques include citation analysis, web mining, bioinformatics, peer-to-peer networking, computer security, epidemiology, and financial fraud detection. This talk will outline the unifying ideas behind three lines of recent work in my research group: 1) methods for learning joint distributions of variables on networks; 2) methods for navigating networks; and 3) methods for indexing network structure. All these methods share a common thread -- representing and exploiting autocorrelation. Autocorrelation (or homophily) is a common feature of many social networks. Two individuals are more likely to share similar occupations, political beliefs, or cultural backgrounds if they are neighbors. In general, a statistical dependence often exists between the values of the same variable on neighboring entities. Much of the work in my group focuses on relational dependency networks and latent group models, two methods for learning statistical dependencies in social networks. The most important discoveries made using these models are often autocorrelation dependencies. We have also developed expected-value navigation, a method that combines information about autocorrelation and degree structure to efficiently discover short paths in networks. 
Finally, we have developed network structure indices, a method of annotating networks with artificially created autocorrelated variables to index graph structures so that short paths can be discovered quickly. Network structure indices, in turn, provide several ways to improve our probabilistic modeling, completing a surprising cycle of research unified by the concept of autocorrelation.
Speaker BiographyDavid Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in 1992. His research focuses on machine learning and knowledge discovery in relational data, with applications to web mining, social network analysis, and fraud detection. He serves on the program committees of the International Conference on Knowledge Discovery and Data Mining and the International Conference on Machine Learning. He is a member of the 2006-2007 Defense Science Study Group.
February 6, 2007
Peter Hoff, University of Washington
AbstractRelational data consist of information that is specific to pairs (triples, etc) of objects. Examples include friendships among people, trade between countries, word counts in documents and interactions among proteins. A recent approach to modeling such data is via the use of latent factor models, in which the relationship between two objects is modeled as a function of some unobserved characteristics of the objects. Such a modeling approach is related to random effects modeling and to matrix decomposition techniques, such as the eigenvalue and singular value decompositions. In the context of several data analysis examples, I will describe and motivate this modeling approach, and show how latent factor models can be used for estimation, prediction and visualization for relational data.
Speaker BiographyPeter Hoff is an associate professor in the departments of Statistics and Biostatistics, and a member of the Center for Statistics and the Social Sciences at the University of Washington in Seattle.
February 13, 2007
Bruno Jedynak, Johns Hopkins University (CIS)
AbstractThe heat equation is a partial differential equation that describes the variation of temperature in a given region over time, subject to boundary conditions. We will define a related equation, which we will also call a heat equation, for the situation where the space variable ranges over the vertices of a graph. We will review examples of graphs where the heat equation can be solved analytically. We will then discuss applications in language modeling and in image processing where solving the heat equation on a well-chosen graph can lead to interesting smoothing and denoising algorithms.
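As a rough illustration of the idea in this abstract, the sketch below assumes the common graph-Laplacian form of the heat equation, du/dt = -L u with L = D - A, which may differ from the definition used in the talk. The function names (graph_laplacian, heat_step, diffuse), the step size, and the example graph are hypothetical choices made for this illustration only.

```python
def graph_laplacian(n, edges):
    """Return the combinatorial Laplacian L = D - A as a list of lists."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def heat_step(u, lap, dt):
    """Take one explicit Euler step of du/dt = -L u."""
    n = len(u)
    return [u[i] - dt * sum(lap[i][j] * u[j] for j in range(n)) for i in range(n)]


def diffuse(u0, edges, dt=0.1, steps=500):
    """Run the graph heat equation forward from the initial values u0."""
    lap = graph_laplacian(len(u0), edges)
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, lap, dt)
    return u


# Path graph on 4 vertices, with all of the "heat" initially on vertex 0.
path_edges = [(0, 1), (1, 2), (2, 3)]
smoothed = diffuse([4.0, 0.0, 0.0, 0.0], path_edges)
```

On a connected graph the total amount of heat is preserved and the values drift toward their average, which is the sense in which running the equation for a short time acts as a smoother. The explicit Euler step is only stable for small dt relative to the largest eigenvalue of L, so the step size here is an assumption tied to this small example.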
February 20, 2007
John Lafferty, Carnegie Mellon University
AbstractWe present new results on sparse estimation in both the parametric setting for graphical models, and in the nonparametric setting for regression in high dimensions. For graphical models, we use l1 regularization to estimate the structure of the underlying graph in the high dimensional setting. In the case of nonparametric regression, we present a method that regularizes the derivatives of an estimator, resulting in a type of nonparametric lasso technique. In addition, we discuss the problem of semi-supervised learning, where unlabeled data is used in an attempt to improve estimation. We analyze some current regularization methods in terms of minimax theory, and develop new methods that lead to improved rates of convergence. Joint work with Han Liu, Pradeep Ravikumar, Martin Wainwright, and Larry Wasserman.
Speaker BiographyJohn Lafferty is a professor in the Computer Science Department and the Machine Learning Department within the School of Computer Science at Carnegie Mellon University. His research interests are in machine learning, statistical learning theory, computational statistics, natural language processing, information theory, and information retrieval. Prof. Lafferty received the Ph.D. in Mathematics from Princeton University, where he was a member of the Program in Applied and Computational Mathematics. Before joining the faculty of CMU, he was a Research Staff Member at the IBM Thomas J. Watson Research Center, working in Frederick Jelinek's group on statistical natural language processing. Prof. Lafferty currently serves as co-Director, with Steve Fienberg, of CMU's Ph.D. Program in Computational and Statistical Learning, and as an associate editor of the Journal of Machine Learning Research. His first glimpse of the power and magic of combining statistics and computation--in the practice of what has come to be called machine learning--was seeing the first decodings emerge from the IBM statistical machine translation system in the late 1980s.
February 27, 2007
Ciprian Chelba, Google Inc.
AbstractEver increasing computing power and connectivity bandwidth together with falling storage costs result in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, search emerges as a key application as more and more data is being saved. Speech search has not received much attention due to the fact that large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents is limited strictly by the lack of adequate technology to exploit them. Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns. This leads us to exploring an automatic approach to searching and navigating spoken document collections. The talk will focus on techniques for the indexing and retrieval of spoken audio files, and results on a corpus (MIT iCampus) containing recorded academic lectures.
Speaker BiographyCiprian Chelba is a Research Scientist with Google. Previously he worked as a Researcher in the Speech Technology Group at Microsoft Research. His core research interests are in statistical modeling of natural language and speech. Recent projects include speech content indexing for search in spoken documents, discriminative language modeling for large vocabulary speech recognition, as well as speech and text classification.
March 6, 2007
Joan E. Forester, U.S. Army Research Laboratory
AbstractIntelligence analysts have the arduous responsibility of processing large amounts of data to determine trends and relationships. Analysts must be able to gather traditional information (signal, human, and measurement and signature intelligence) and nontraditional data (financial and social context) to form actionable intelligence. One source of nontraditional data is Web-based news. The U.S. Army Research Laboratory (ARL) currently has two projects that will jointly meet part of this requirement. The first is the Real Time News Analysis (RTNA) project. RTNA is being developed to harvest real time streaming data from Web-based news sources and pre-process it by information extraction, categorization, message understanding, concept mining, and fusing. This data will then be fed to ARL's Social Network Analysis (SNA) project. This is challenging research but with a potential high payoff of providing non-traditional information quickly to analysts.
Speaker BiographyMs. Forester received a Bachelor of Science in Computer and Information Science from Towson University in January 1987, graduating summa cum laude. She did postgraduate training at the G.W.C. Whiting School of Engineering, The Johns Hopkins University, where her major concentration was in artificial intelligence and computer vision. She is currently an operations research analyst (computer scientist) in the Computational & Information Science Directorate, Tactical Collaboration & Data Fusion Branch of the ARL, where she works on projects dealing with real time news analysis and social networking.
March 20, 2007
John Hale, Michigan State University
AbstractThe relationship between grammar and language behavior is not entirely clear-cut. One classic view (Chomsky 65, Bresnan & Kaplan 82, Stabler 83, Steedman 89) holds that grammars specify a time-independent body of knowledge, one that is deployed on-line by a processing mechanism. Determining the computational properties of this mechanism is thus a central problem in cognitive science. This talk demonstrates an analytical approach to this problem that divides the job up into three parts: parser = control * memory * grammar. Time-dependent sentence processing predictions then follow mechanically from the conjunction of assumptions about each of the three parts (cf. Kaplan 72). Certain combinations accord with known phenomena and suggest new experimental directions. But more broadly the approach offers an explicit, positive proposal about how human sentence comprehension works and the role grammar plays in it.
Speaker BiographyJohn Hale is a cognitive scientist whose research focuses on computational linguistics. His recent projects have addressed human sentence processing, formal language theory and speech disfluency. He received his PhD from Johns Hopkins in 2003 and holds a joint appointment in Linguistics & Languages and Computer Science & Engineering at Michigan State University.
March 27, 2007
Justin Halberda, Johns Hopkins University
AbstractIn this talk I will bring together two literatures: 1) work on word-learning in young children, and 2) work on the developmental origins of logical reasoning. The predominant view in each of these literatures has been that word learning is supported by probabilistic (non-deductive) inference mechanisms, and that children display no abstract logical competence until after 5 or more years of age (after the onset of robust language ability). I will make a case that two-year-old children have access to a particular domain general logical reasoning strategy (Disjunctive Syllogism) and that they bring this strategy to bear on the task of learning new words. This reveals a logical competence that has not been observed before in young children and it begins to reveal the logical computations that support word learning constraints.
April 17, 2007
Julia Hockenmaier, University of Pennsylvania
AbstractWe know that adult speakers of a language have no problem understanding newspapers in that language, and that proteins fold spontaneously into specific three-dimensional structures. However, a sentence in the Wall Street Journal may have millions of possible grammatical analyses, and a protein may have millions of possible structures. As computer scientists who want to design systems that can either parse natural language or predict the folded structure of proteins, we are faced with two very similar search problems: In both cases, we want to find the optimal structure of an input sequence among an exponential number of possible alternatives. In this talk, I will demonstrate how CKY, a standard dynamic programming algorithm that is normally used in natural language parsing, can be adapted to give us novel insights into the protein folding problem. If we assume that folding is a greedy, hierarchical search for lowest-energy structures, CKY provides an efficient way to find all direct folding routes. I will also show that we can extend CKY to construct a Markov chain model of the entire folding process, and that this Markov chain may explain an apparent contradiction between what experimentalists observe in a test tube and what many theorists predict.
Speaker BiographyJulia Hockenmaier is a postdoc with Aravind Joshi at the University of Pennsylvania, and also a frequent visitor to Ken Dill's lab at the University of California at San Francisco. Her research areas are natural language processing (computational linguistics) and computational biology, specifically natural language parsing and protein folding.
May 1, 2007
Karen Livescu, Massachusetts Institute of Technology
AbstractSpoken language technologies, such as automatic speech recognition and synthesis, typically treat speech as a string of "phones". In contrast, humans produce speech through a complex combination of semi-independent articulatory trajectories. Recent theories of phonology acknowledge this, and treat speech as a combination of multiple streams of linguistic "features". In this talk I will present ways in which the factorization of speech into features can be useful in speech recognition, in both audio and visual (lipreading) settings. The main contribution is a feature-based approach to pronunciation modeling, using dynamic Bayesian networks. In this class of models, the great variety of pronunciations seen in conversational speech is explained as the result of asynchrony among feature streams and changes in individual feature values. I will also discuss the use of linguistic features in observation modeling via feature-specific classifiers. I will describe the application of these ideas in experiments with audio and visual speech recognition, and present analyses suggesting additional potential applications in speech science and technology.
Speaker BiographyKaren Livescu is a Luce Post-doctoral Fellow in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and EECS department at MIT. She completed her PhD in EECS, MIT, in 2005, and her BA in Physics at Princeton University in 1996, with a stint in between as a visiting student in EE/CS at the Technion in Israel. In the summer of 2006 she led a team project in JHU's summer workshop series on speech and language engineering. Her main research interests are in speech and language processing, with a slant toward combining statistical modeling techniques with knowledge from linguistics and speech science.
May 3, 2007
“Should Airplanes Flap Wings?
(Should machine processing of sensory data take inspiration from nature?)”
Hynek Hermansky, IDIAP Research Institute
AbstractNature's sensory systems, their corresponding processing modules, and, in some cases (such as speech) also the structure of message-carrying sensory data, have all co-evolved to ensure survival of their respective species, and hence reached a high level of effectiveness. Therefore, we argue that human-like processing often represents the most effective engineering processing for sensory data. However, we also argue that such human-like processing is not (and perhaps should not be) derived by indiscriminate emulation of all mechanisms and properties of biological systems. Rather, we think that our designs should selectively apply key human-like concepts that address the particular weaknesses of artificial algorithms, and that have not yet fully evolved in the course of the historical evolution of speech technology. We also show that these concepts may sometimes directly emerge in the course of optimizing performance of the machine algorithms on target data. The approach will be illustrated with several specific examples of algorithms that are currently being successfully used in mainstream applications.
Speaker BiographyHynek Hermansky is a Director of Research at the IDIAP Research Institute Martigny and a Professor at the Swiss Federal Institute of Technology at Lausanne, Switzerland (among a number of other mostly unpaid affiliations). He has been working in speech processing for over 30 years, previously as a Research Fellow at the University of Tokyo, a Research Engineer at Panasonic Technologies in Santa Barbara, California, a Senior Member of Research Staff at U S WEST Advanced Technologies, and a Professor and Director of the Center for Information Processing at the OGI School of the Oregon Health and Sciences University, Portland, Oregon.
May 8, 2007
Kevin Knight, USC/Information Sciences Institute
AbstractMachine translation (MT) systems have been getting more accurate. One reason is that machines now gather translation knowledge autonomously, combing through large amounts of human-translated material available on the web. Most of these MT systems learn finite-state Markov models -- target strings are substituted for source strings, followed by local word re-ordering. This kind of model can only support very weak linguistic transformations, and the trained models do not yet lead to reliably high-quality MT. Over the past several years, many new probabilistic tree-based models (versus string-based models) have been designed and tested on many natural language applications, including MT. Such models frequently turn out to be instances of tree transducers, a formal automata model first described by W. Rounds and J. Thatcher in the 1960s and 70s. Tree automata open up new opportunities for us to marry deeper representations, mathematical theory, and machine learning. This talk covers novel algorithms and open problems for tree automata, together with experiments in machine translation.
Speaker BiographyKevin Knight is a Senior Research Scientist and Fellow at USC's Information Sciences Institute, a Research Associate Professor in the Computer Science Department at USC, and co-founder of Language Weaver, Inc. He received his Ph.D. from Carnegie Mellon University in 1991 and his BA from Harvard University in 1986. He is co-author (with Elaine Rich) of the textbook Artificial Intelligence (McGraw-Hill, 1991). His research interests are in statistical natural language processing, machine translation, natural language generation, and decipherment.
May 10, 2007
Bhiksha Raj, MERL Research Lab
AbstractThe magnitude spectrum of any signal may be viewed as a density function or (in the case of discrete frequency spectra) a histogram, with the frequency axis as the support. In this talk I will describe how this perspective allows us to perform spectral decompositions through a latent-variable model that enables us to extract underlying, or "latent", spectral structures that additively compose the speech spectrum. I show how such decomposition can be used for varied purposes such as bandwidth expansion of narrow-band speech, component separation from mixed monaural signals, and denoising. I then explain how the basic latent-variable model may be extended to derive sparse overcomplete decompositions of speech spectra. I demonstrate through examples that such decompositions can be utilized not only for improved speaker separation from mixed monaural recordings, but also for extracting the building blocks of other data such as images and text. Finally, I present shift- and transform-independent extensions of the model, through which it becomes possible to automatically extract repeating themes or objects within data such as audio, images or video.
June 1, 2007
Tobias Scheffer, Machine Learning Research Group of the Max Planck Institute for Computer Science
AbstractMost learning algorithms are constructed under the assumption that the training data is governed by the exact same distribution which the model will later be exposed to. In practice, control over the data generation process is often less perfect. Training data may consist of a benchmark corpus (e.g., the Penn Treebank) that does not reflect the distribution of sentences that a parser will later be used for. Spam filters may be used by individuals whose distribution of inbound emails diverges from the distribution reflected in public training corpora (e.g., the TREC spam corpus). In the talk, I will analyze the problem of learning classifiers that perform well under a test distribution that may differ arbitrarily from the training distribution. I will discuss the correct optimization criterion and solutions, including a kernel logistic regression classifier for differing training and test distributions. In filtering spam, phishing and virus emails, distributions vary greatly over users, IP domains, and over time. Taking into account that spam senders change their email templates in response to the filtering mechanisms employed leads to the related but even more challenging problem of adversarial learning.
Speaker BiographyTobias Scheffer is Research Associate Professor and head of the Machine Learning Research Group of the Max Planck Institute for Computer Science. He is an adjunct faculty member of Humboldt-Universitaet zu Berlin. Between 2003 and 2006, he was a Research Assistant Professor at Humboldt-Universitaet zu Berlin. Prior to that, he worked at the University of Magdeburg, at Technische Universitaet Berlin, the University of New South Wales in Sydney and Siemens Corporate Research in Princeton, N.J. He was awarded an Emmy Noether Fellowship of the German Science Foundation DFG in 2003 and an Ernst von Siemens Fellowship by Siemens AG in 1996. He received a Master's Degree in Computer Science (Diplominformatiker) in 1995 and a Ph.D. (Dr. rer. nat.) in 1999 from Technische Universitaet Berlin. Tobias serves on the Editorial Board of the Data Mining and Knowledge Discovery Journal. He served as Program Chair of the European Conference on Machine Learning, and the European Conference on Principles and Practice of Knowledge Discovery in Databases.
July 25, 2007
Rene Vidal, Johns Hopkins University
AbstractOver the past two decades, we have seen tremendous advances on the simultaneous segmentation and estimation of a collection of models from sample data points, without knowing which points correspond to which model. Most existing segmentation methods treat this problem as "chicken-and-egg", and iterate between model estimation and data segmentation. This lecture will show that for a wide variety of data segmentation problems (e.g. mixtures of subspaces), the "chicken-and-egg" dilemma can be tackled using an algebraic geometric technique called Generalized Principal Component Analysis (GPCA). This technique is a natural extension of classical PCA from one to multiple subspaces. The lecture will touch upon a few motivating applications of GPCA in computer vision, such as image/video segmentation, 3-D motion segmentation or dynamic texture segmentation, but will mainly emphasize the basic theory and algorithmic aspects of GPCA.
Speaker BiographyProfessor Vidal received his B.S. degree in Electrical Engineering (highest honors) from the Pontificia Universidad Catolica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. He was a research fellow at the National ICT Australia from September 2003 and joined The Johns Hopkins University in January 2004 as an Assistant Professor in the Department of Biomedical Engineering and the Center for Imaging Science. His areas of research are biomedical imaging (DTI registration and clustering, heart motion analysis), computer vision (segmentation of static and dynamic scenes, multiple view geometry, omnidirectional vision), machine learning (generalized principal component analysis GPCA, kernel GPCA, dynamic GPCA), vision-based coordination and control of unmanned vehicles, and hybrid systems identification and control. Dr. Vidal is recipient of the 2005 NSF CAREER Award and the 2004 Best Paper Award Honorable Mention (with Prof. Yi Ma) for his work on "A Unified Algebraic Approach to 2-D and 3-D Motion Segmentation" presented at the European Conference on Computer Vision. He also received the 2004 Sakrison Memorial Prize for "completing an exceptionally documented piece of research", the 2003 Eli Jury award for "outstanding achievement in the area of Systems, Communications, Control, or Signal Processing", the 2002 Student Continuation Award from NASA Ames, the 1998 Marcos Orrego Puelma Award from the Institute of Engineers of Chile, and the 1997 Award of the School of Engineering of the Pontificia Universidad Catolica de Chile to the best graduating student of the school. He is a program chair for PSIVT 2007 and area chair for CVPR 2005 and ICCV 2007.
August 1, 2007
Kevin Cohen, Center for Computational Pharmacology, University of Colorado, School of Medicine
AbstractSoftware testing is a first-class research object in computer science, but so far has not been studied in the context of natural language processing. Testing of language processing applications is qualitatively different from testing other types of applications, because language itself is qualitatively different from other classes of inputs. Nonetheless, a methodology for testing NLP applications already exists. It is theoretically isomorphic with descriptive and structural linguistics, and its praxis is isomorphic with linguistic field methods. In this talk, I present data on the state of software testing for a popular class of text mining application, show the commonalities between software testing and linguistic field methods, and illustrate a number of benefits that accrue from approaching language processing from a software testing perspective in general, and from a descriptive linguistic perspective in particular.
September 18, 2007
Mari Ostendorf, University of Washington
AbstractWith recent advances in automatic speech recognition, there are growing opportunities for natural language processing of speech, including applications such as information extraction, summarization and translation. As speech processing moves from simple word transcription to document processing and analyses of human interactions, it becomes increasingly important to represent structure in spoken language and incorporate structure in performance optimization. In this talk, we consider two types of structure: segmentation and syntax. Virtually all types of language processing technology, having been developed on written text, assume knowledge of sentence boundaries; hence, sentence segmentation is critical for spoken document processing. Experiments show that sentence segmentation has a significant impact on performance of tasks such as parsing, translation and information extraction. However, optimizing for downstream task performance leads to different operating points for different tasks, which we claim argues for the additional use of subsentence prosodic structure. Parsing itself is an important analysis tool used in many human language technologies, and jointly optimizing speech recognition performance for parse and word error benefits these applications. Moreover, we show that optimizing recognition for parsing performance can benefit subsequent language processing (e.g. translation) even when parse structure is not explicitly used, because of the increased importance placed on constituent headwords. Of course, if parsing is part of the ultimate objective, recognition benefits even more from parsing language models than from simple word error rate criteria. A complication arises in working with conversational speech due to the presence of disfluencies, which reinforces the argument for subsentence prosodic modeling and explicit representation of disfluencies in parsing models.
Speaker BiographyMari Ostendorf received the Ph.D. in electrical engineering from Stanford University in 1985. After working at BBN Laboratories (1985-1986) and Boston University (1987-1999), she joined the University of Washington (UW) in 1999. She has also served as a visiting researcher at the ATR Interpreting Telecommunications Laboratory in Japan in 1995 and at the University of Karlsruhe in 2005-2006. At UW, she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. She teaches undergraduate and graduate courses in signal processing and statistical learning, including a project-oriented freshman course that introduces students to signal processing and communications. Prof. Ostendorf's research interests are in dynamic and linguistically-motivated statistical models for speech and language processing. Her work has resulted in over 160 publications and 2 paper awards. Prof. Ostendorf has served on numerous technical and advisory committees, as co-Editor of Computer Speech and Language (1998-2003), and now as the Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. She is a Fellow of IEEE and a member of ISCA, ACL, SWE and Sigma Xi.
September 27, 2007
Brian Roark, OGI School of Science and Engineering at OHSU
AbstractIn this talk we will present some preliminary experiments on using multi-sequence alignment (MSA) techniques for inducing monolingual finite-state tagging models that capture some global sequence information. Such MSA techniques are popular in bio-sequence processing, where key information about long-distance dependencies and three-dimensional structures of protein or nucleotide sequences can be captured without resorting to polynomial complexity context-free models. In the NLP community, such techniques have been used very little -- most notably for aligning paraphrases (Barzilay and Lee, 2003) -- and not at all for monolingual syntactic processing. We discuss key issues in pursuing this approach: syntactic functional alignment; inducing multi-sequence alignments; and using such alignments in tagging. Experiments are preliminary but promising.
Speaker BiographyBrian Roark is a faculty member in the Center for Spoken Language Understanding (CSLU) and Department of Computer Science and Electrical Engineering (CSEE) of the OGI School of Science and Engineering at OHSU. He was in the Speech Algorithms Department at AT&T Labs from 2001-2004. He finished his Ph.D. in the Department of Cognitive and Linguistic Sciences at Brown University in 2001. At Brown he was part of the Brown Laboratory for Linguistic Information Processing.
October 2, 2007
Julia Hirschberg, Columbia University
AbstractThis talk will discuss production and perception studies of deceptive speech and the acoustic/prosodic and lexical cues associated with deception. Experiments in which we collected a large corpus of deceptive and non-deceptive speech from naive subjects in the laboratory are described, together with perception experiments on this corpus. Features extracted from this corpus have been used in Machine Learning experiments to predict deception with classification accuracy from 64.0% to 66.4%, depending upon feature-set and learning algorithm. This performance compares favorably with the performance of human judges on the same data and task, which averaged 58.2%. We also discuss current findings on the role of personality factors in deception detection, speaker-dependent models of deception, and future research. This work was done in collaboration with Frank Enos, Columbia University; Elizabeth Shriberg, Andreas Stolcke, and Martin Graciarena, SRI/ICSI; Stefan Benus, Brown University; and more.
Speaker BiographyJulia Hirschberg is Professor of Computer Science at Columbia University. From 1985-2003 she worked at Bell Labs and AT&T Labs, first as a member of Technical Staff working on intonation assignment in text-to-speech synthesis and then as Head of the Human Computer Interaction Research Department. Her research focuses on prosody in speech generation and understanding. She currently works on speech summarization, emotional speech, charismatic speech, deceptive speech, and dialogue prosody. Hirschberg was President of the International Speech Communication Association from 2005-2007 and co-editor-in-chief of Speech Communication from 2003-2006. She was editor-in-chief of Computational Linguistics and on the board of the Association for Computational Linguistics from 1993-2003. She has been a fellow of the American Association for Artificial Intelligence since 1994.
October 9, 2007
“New Methods to Capture and Exploit Multiscale Speech Dynamics: From Mathematical Models to Forensic Tools”
Patrick Wolfe, Statistics and Information Sciences Laboratory (SISL), Harvard University
AbstractThe variability inherent in speech waveforms gives rise to powerful temporal and spectral dynamics that evolve across multiple scales, and in this talk we describe new methods to capture and exploit these multiscale dynamics. First we consider the canonical task of formant estimation, formulated as a statistical model-based tracking problem. We extend a recent model of Deng et al. both to account for the uncertainty of speech presence by way of a censored likelihood formulation and to explicitly model formant cross-correlation via a vector autoregression. Our results indicate an improvement of 20-30% relative to benchmark formant analysis tools. In the second part of the talk we present a new adaptive short-time Fourier analysis-synthesis scheme for signal analysis, and demonstrate its efficacy in speech enhancement. While a number of adaptive analyses have previously been proposed to overcome the limitations of fixed time-frequency resolution schemes, we derive here a modified overlap-add procedure that enables efficient resynthesis of the speech waveform. Measurements and listening tests alike indicate the potential of this approach to yield a clear improvement over fixed-resolution enhancement systems currently used in practice.
Speaker BiographyPatrick J. Wolfe is currently Assistant Professor of Electrical Engineering in the School of Engineering and Applied Sciences at Harvard, with appointments in the Department of Statistics and the Harvard-MIT Program in Speech and Hearing Biosciences and Technology. He received a B.S. in Electrical Engineering and a B.Mus. concurrently from the University of Illinois at Urbana-Champaign in 1998, both with honors. He then earned his Ph.D. in Engineering from the University of Cambridge (UK) as an NSF Graduate Research Fellow, working on the application of perceptual criteria to statistical audio signal processing. Prior to founding the Statistics and Information Sciences Laboratory at Harvard in 2004, Professor Wolfe held a Fellowship and College Lectureship jointly in Engineering and Computer Science at New Hall, a University of Cambridge constituent college where he also served as Dean. He has also taught in the Department of Statistical Science at University College, London, and continues to act as a consultant to the professional audio community in government and industry. At Harvard he teaches a variety of courses on advanced topics in inference, information, and statistical signal processing, as well as applied mathematics and statistics at the undergraduate level. In addition to his diverse teaching activities, Professor Wolfe has published in the literatures of engineering, computer science, and statistics, and has received honors from the IEEE, the Acoustical Society of America, and the International Society for Bayesian Analysis. His research group focuses on statistical signal processing for modern high-dimensional data sets such as speech waveforms and color images, and is supported by a number of grants and partnerships, including sponsored projects with NSF, DARPA, and Sony Electronics, Inc.
Recent research highlights include a paper award at the 2007 IEEE International Conference on Image Processing for work in color image acquisition, a new approach to speech formant tracking that yields up to 30% improvement relative to benchmark methods, and a set of matrix approximation techniques for spectral methods in machine learning, with error bounds that improve significantly upon known results.
October 16, 2007
Michael Riley, Google
AbstractWe describe OpenFst, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed both to be very efficient in time and space and to scale to very large problems. This library has key applications in speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, including an outline of some key algorithms, examples of its use, details of its design that allow customizing the labels, states, and weights, and the lazy evaluation of many of its operations. Further information and a download of the OpenFst library can be obtained from the OpenFst web site. Joint work with: Cyril Allauzen, Johan Schalkwyk, Wojtek Skut and Mehryar Mohri.
Speaker BiographyMichael Riley received his B.S., M.S. and Ph.D. in computer science from MIT. He joined Bell Labs in Murray Hill, NJ in 1987 and moved to AT&T Labs in Florham Park, NJ in 1996. He is currently a member of the research staff at Google, Inc. in New York City. His interests include speech and natural language processing, text analysis, information retrieval, and machine learning.
October 23, 2007
Matthias Buch-Kromann, Center for Computational Modelling of Language, Department of Computational Linguistics, Copenhagen Business School
AbstractProbabilistic dependency grammars have played an important role in computational linguistics since they were introduced by Collins (1996) and Eisner (1996). In most computational formulations of dependency grammar, a dependency grammar can be viewed as a projective context-free grammar in which all phrases have a lexical head. However, there are many linguistic phenomena that a context-free dependency grammar cannot properly account for, such as non-projective word order (in topicalizations, scramblings, and extrapositions), secondary dependencies (in complex VPs, control constructions, relative clauses, elliptic coordinations and parasitic gaps), and punctuation (which is highly context-sensitive). In the talk, I will present a generative dependency model that can account for these phenomena and others. Although exact probabilistic parsing is NP-hard in this model, heuristic parsing need not be, and I will briefly describe a family of error-driven incremental parsing algorithms with repair that have time complexity O(n log^k(n)) given realistic assumptions about island constraints. In this parsing framework, the dependency model must assign probabilities to partial dependency analyses. I will show one way of doing this and outline how it introduces the need for adding time-dependence into the model in order to support the left-right incremental processing of the text.
Speaker BiographyMatthias Buch-Kromann is head of the Computational Linguistics Group at the Copenhagen Business School (CBS). He is also a member of the Center for Computational Modelling of Language and the Center for Research in Translation and Translation Technology at CBS. His current research interests include dependency treebanks, probabilistic dependency models of texts and translations, and computational models of human parsing and translation. His dr.ling.merc. dissertation (habilitation) from 2006 proposes a dependency-based model of human parsing and language learning. He has been the driving force behind the 100,000 word Danish Dependency Treebank (used in the CoNLL 2006 shared task) and the Copenhagen Danish-English Parallel Dependency Treebank.
October 30, 2007
Rob Schapire, Princeton
AbstractModeling the geographic distribution of a plant or animal species is a critical problem in conservation biology: to save a threatened species, one first needs to know where it prefers to live, and what its requirements are for survival. From a machine-learning perspective, this is an especially challenging problem in which the learner is presented with no negative examples and often only a tiny number of positive examples. In this talk, I will describe the application of maximum-entropy methods to this problem, a set of decades-old techniques that happen to fit the problem very cleanly and effectively. I will describe a version of maxent that we have shown enjoys strong theoretical performance guarantees that enable it to perform effectively even with a very large number of features. I will also describe some extensive experimental tests of the method, as well as some surprising applications. This talk includes joint work with Miroslav Dudík and Steven Phillips.
Speaker BiographyRobert Schapire received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991 where he remained for eleven years. At the end of 2002, he became a Professor of Computer Science at Princeton University. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize and the 2004 Kanellakis Theory and Practice Award (both of the last two with Yoav Freund). His main research interest is in theoretical and applied machine learning.
November 6, 2007
Richard Rose, McGill University
AbstractThere are a variety of modeling techniques used in automatic speech recognition that have been developed with the goal of representing potential sources of intrinsic speech variability in a low dimensional subspace. The focus of much of the research in this area has been on "speaker space" based approaches where it is assumed that statistical models for an unknown speaker lie in a space whose basis vectors represent relevant variation among a set of reference speakers. As an alternative to these largely data driven approaches, more structured feature and model representations have been developed that are based on theories of speech production and acoustic phonetics. The performance improvements obtained by speaker space approaches like eigenvoice modeling, cluster adaptive training, and several others have been reported for speaker adaptation in many ASR task domains where only small amounts of adaptation data are available. The potential of systems based on phonological distinctive features has also been demonstrated on far more constrained task domains. This talk presents discussion and experimental results that attempt to explore the potential advantages of both classes of techniques. We will also focus on the limitations of these techniques in addressing some of the basic problems that still exist in state of the art ASR systems.
Speaker BiographyRichard Rose was a member of the Speech Systems Technology Group at MIT Lincoln Laboratory working on speech recognition and speaker recognition from 1988 to 1992. He was with AT&T from 1992 to 2003, specifically in the Speech and Image Processing Services Laboratory at AT&T Labs – Research in Florham Park, NJ after 1996. Currently, he is an associate professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. Professor Rose served as a member of the IEEE Signal Processing Society (SPS) Technical Committee on Digital Signal Processing, as a member of the SPS Board of Governors, as associate editor for the IEEE Transactions on Speech and Audio Processing, as a member of the IEEE SPS Speech Technical Committee, on the editorial board for the Speech Communication Journal, and was founding editor of the STC Newsletter. He was also recently the Co-chair of the IEEE 2005 Workshop on Automatic Speech Recognition and Understanding.
November 13, 2007
Jerry Hobbs, University of Southern California
AbstractIn this talk I will examine problems encountered in coming to some kind of understanding of one sonnet by Shakespeare (his 64th), ask what it would take to solve these problems computationally, and suggest routes to the solution. The general conclusion is that we are closer to this goal than one might think. Or are we?
November 27, 2007
Tommi Jaakkola, MIT
AbstractMost engineering and science problems involve modeling. We need inference calculations to draw predictions from the models or to estimate them from available measurements. In many cases the inference calculations can be done only approximately as in decoding, sensor networks, or in modeling biological systems. At the core, inference tasks tie together three types of problems: counting (partition function), geometry (valid marginals), and uncertainty (entropy). Most approximate inference methods can be viewed as different ways of simplifying this three-way combination. Much of recent effort has been spent on developing and understanding distributed approximation algorithms that reduce to local operations in an effort to solve a global problem. In this talk I will provide an optimization view of approximate inference algorithms, exemplify recent advances, and outline some of the many open problems and connections that are emerging due to modern applications.
Speaker BiographyTommi Jaakkola received the M.Sc. degree in theoretical physics from Helsinki University of Technology, Finland, and the Ph.D. from MIT in computational neuroscience. Following a postdoctoral position in computational molecular biology (DOE/Sloan fellow, UCSC) he joined the MIT EECS faculty in 1998. He received the Sloan Research Fellowship in 2002. His research interests include many aspects of machine learning, statistical inference and estimation, and algorithms for various modern estimation problems such as those involving multiple predominantly incomplete data sources. His applied research focuses on problems in computational biology such as transcriptional regulation.
December 4, 2007
Lillian Lee, Cornell