Archived Seminars by Year
2009
January 27, 2009
“Modeling Bottom-Up and Top-Down Visual Attention in Humans and Monkeys” 
Laurent Itti, Department of Computer Science and Neuroscience Graduate Program, University of Southern California
Abstract
Visual processing of complex natural environments requires animals to combine, in a highly dynamic and adaptive manner, sensory signals that originate from the environment (bottom-up) with behavioral goals and priorities dictated by the task at hand (top-down). Together, bottom-up and top-down influences serve the many tasks that require us to direct attention to the most "relevant" entities in our visual environment. While much progress has been made in investigating experimentally how humans and other primates may operate such goal-based attentional selection, very little is understood of the general mathematical principles and neuro-computational architectures that subserve the observed behavior. I will describe recent computational work that attacks the problem of developing models of visual attentional selection that are more flexible and can be strongly modulated by the task at hand. I will support the proposed architectures by comparing their predictions to behavioral recordings from humans and monkeys. I will show examples of applications of these models to real-world vision challenges, using complex stimuli from television programs or modern immersive video games.
Speaker Biography
Dr. Laurent Itti received his M.S. in Electrical Engineering with a specialization in Image Processing from the Ecole Nationale Superieure des Telecommunications, Paris, France, in 1994. He received his Ph.D. in Computation and Neural Systems from the California Institute of Technology, Pasadena, California, in 2000. Since then he has been an Assistant (2000-2006) and Associate (2006-present) Professor of Computer Science and a voting faculty member of the cross-disciplinary Neuroscience Graduate Program at the University of Southern California (USC), Los Angeles, California. Dr. Itti has authored over 90 peer-reviewed publications in journals, books, and top-ranked conferences. Dr. Itti teaches Artificial Intelligence, Brain Theory and Neural Networks, Introduction to Robotics, Visual Processing, Neuroscience Core Course, Neural Basis for Visually Guided Behavior, and Computational Architectures in Biological Vision. Dr. Itti's laboratory comprises 15 students, postdocs and engineers, and has received grants from the National Science Foundation, DARPA, the National Geospatial-Intelligence Agency, the Human Frontier Science Program (HFSP), the Office of Naval Research, the Army Research Office, and the National Institutes of Health. Dr. Itti has received a number of distinctions, including the 2008 Okawa Foundation Research Award, being one of 16 nationally selected speakers at the 2007 National Academy of Engineering's Frontiers of Engineering Symposium, and serving on Program Committees for several IEEE conferences.
February 3, 2009
“When is a Translation not a Translation?”
Martin Kay, Stanford University
Abstract
A translation is generally taken to be a text that expresses the same meaning as another text in a different language. But the products of the best translators reflect a different, if more elusive, goal. I will seek a somewhat more adequate characterization of translation as it is actually practiced and discuss its consequences for machine translation.
Speaker Biography
Martin Kay is a professor of linguistics and computer science at Stanford University. For many years, he was also a research fellow at the Xerox Palo Alto Research Center. He made a number of fundamental contributions to computational linguistics, including chart parsing, unification grammar, and applications of finite-state technology, notably in phonology. He has been an intermittent worker on, and skeptical observer of, machine translation since 1958.
February 10, 2009
“From Text to Knowledge via Markov Logic”
Pedro Domingos, University of Washington
Abstract
Language understanding is hard because it requires a lot of knowledge. However, the only cost-effective way to acquire a lot of knowledge is by extracting it from text. The best (only?) hope for solving this "chicken and egg" problem is bootstrapping: start with a small knowledge base, use it to process some text, add the extracted knowledge to the KB, process more text, etc. Doing this requires a modeling language that can incorporate noisy knowledge and seamlessly combine it with statistical NLP algorithms. Markov logic accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov random fields. In this talk, I will describe some of the main inference and learning algorithms for Markov logic, and the progress we have made so far in applying them to NLP. For example, we have developed a system for unsupervised coreference resolution that outperforms state-of-the-art supervised ones on MUC and ACE benchmarks.
Speaker Biography
Pedro Domingos is Associate Professor of Computer Science and Engineering at the University of Washington. His research interests are in artificial intelligence, machine learning and data mining. He received a PhD in Information and Computer Science from the University of California at Irvine, and is the author or co-author of over 150 technical publications. He is a member of the advisory board of JAIR, a member of the editorial board of the Machine Learning journal, and a co-founder of the International Machine Learning Society. He was program co-chair of KDD-2003, and has served on numerous program committees. He has received several awards, including a Sloan Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM Faculty Award, and best paper awards at KDD-98, KDD-99 and PKDD-2005.
February 17, 2009
“Coarse-To-Fine Models for Natural Language Processing” 
Dan Klein, University of California, Berkeley
Abstract
State-of-the-art NLP models are anything but compact. Parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. Such complexity brings two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them? Hierarchical coarse-to-fine (CTF) methods address both questions. CTF approaches exploit sequences of models which introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done in an incremental fashion. In this talk, I describe several coarse-to-fine NLP systems. In the domain of syntactic parsing, complexity comes from the grammar. I present a latent-variable approach which begins with an X-bar grammar and learns by iteratively splitting grammar symbols. For example, noun phrases might be split into subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme which reduces parsing times by orders of magnitude. I also present a multiscale variant which splits grammar rules rather than grammar symbols. In the multiscale approach, complexity need not be uniform across the entire grammar, which yields space savings of orders of magnitude. These approaches produce the best parsing accuracies in a variety of languages, in a fully language-general fashion. In the domain of syntactic machine translation, complexity arises from both the translation model and the language model. In short, there are too many transfer rules and too many target-language word types. To manage the translation model, we compute minimizations which drop rules that have high computational cost but low importance.
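A minimal Python sketch of the symbol-splitting and incremental-inference idea in the parsing part of this abstract, for illustration only. It assumes (hypothetically) that split symbols are named like NP-0 and NP-1, that posteriors for the coarse chart items have already been computed by an earlier pass, and that a fixed posterior threshold decides which items survive. It is not the speaker's parser, just one simple way to realize "the finer model only considers what the coarser model kept".

```python
from typing import Dict, List, Tuple

# A chart item: (start index, end index, grammar symbol).
Item = Tuple[int, int, str]


def split_symbols(symbols: List[str]) -> List[str]:
    """One round of symbol splitting: every symbol X becomes X-0 and X-1.

    The '<parent>-<index>' naming is a hypothetical convention for this sketch.
    """
    return [f"{s}-{i}" for s in symbols for i in (0, 1)]


def project(fine_symbol: str) -> str:
    """Map a split symbol to its parent one level coarser ('NP-1' -> 'NP')."""
    return fine_symbol.rsplit("-", 1)[0]


def prune_fine_items(coarse_posteriors: Dict[Item, float],
                     fine_symbols: List[str],
                     threshold: float) -> List[Item]:
    """Keep a fine item (i, j, X-k) only if its projection (i, j, X)
    reached the posterior threshold under the coarser model."""
    kept: List[Item] = []
    for (i, j, coarse), posterior in coarse_posteriors.items():
        if posterior < threshold:
            continue  # the coarse model rules this span/symbol pair out
        for fine in fine_symbols:
            if project(fine) == coarse:
                kept.append((i, j, fine))
    return kept


if __name__ == "__main__":
    # Hypothetical coarse posteriors for a three-word sentence.
    coarse = {(0, 2, "NP"): 0.9, (0, 2, "VP"): 0.0001, (2, 3, "VP"): 0.8}
    fine = split_symbols(["NP", "VP"])
    print(prune_fine_items(coarse, fine, threshold=1e-3))
```

Because the finer model only ever sees items whose coarse projection survived, the work at each level shrinks with the pruning threshold. Raising the threshold trades some accuracy for speed, and the same projection step can be repeated at every split level.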
To manage the language model, we translate into target-language clusterings of increasing vocabulary size. These approaches give dramatic speed-ups while actually increasing final translation quality.
Speaker Biography
Dan Klein is an assistant professor of computer science at the University of California, Berkeley (PhD Stanford, MSt Oxford, BA Cornell). His research focuses on statistical natural language processing, including unsupervised learning methods, syntactic parsing, information extraction, and machine translation. Academic honors include a Marshall Fellowship, a Microsoft New Faculty Fellowship, the ACM Grace Murray Hopper award, and best paper awards at the ACL, NAACL, and EMNLP conferences.
February 24, 2009
“Music Fingerprinting Using Finite State Transducers, a Novel Application of FST's” 
Pedro Moreno, Google
Abstract
In recent years, finite-state transducer (FST) technology has found its way into many speech applications, from text processing for synthesizers to the core search algorithms used in speech recognition systems. In this talk we present a novel application of finite-state transducers and acoustic modeling techniques to the problem of music fingerprinting*. We will show how the power of FSTs can be applied to this problem with great results. In the talk I will also give an overview of current activities in the Google speech team.
* This is joint work with Prof. Mehryar Mohri and Eugene Weinstein at NYU.
Speaker Biography
Pedro J. Moreno is a research scientist at Google Inc. working in the New York office. His research interests are speech and multimedia indexing and retrieval, speech and speaker recognition and applications of machine learning. He received a Ph.D. in electrical and computer engineering from Carnegie Mellon University.
March 3, 2009
“Domain Adaptation in Natural Language Processing”
Hal Daume, University of Utah
Abstract
Supervised learning technology has led to systems for part-of-speech tagging, parsing, and named entity recognition with accuracies in the high 90% range. Unfortunately, the performance of these systems degrades drastically when they are applied to text outside their training domain (typically, newswire). Machine translation systems work fantastically for translating Parliamentary proceedings, but fall down when applied to other domains. I'll discuss research that aims to understand what goes wrong when models are applied outside their domain, and some (partial) solutions to this problem. I'll focus on named entity recognition and machine translation tasks, where we'll see a range of different sources of error (some of which are quite counter-intuitive!).
Speaker Biography
Hal Daume is an assistant professor in the School of Computing at the University of Utah. His primary research interests are in Bayesian learning, structured prediction and domain adaptation (with a focus on problems in language and biology). He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. He still likes math and doesn't like to use C (instead he uses O'Caml or Haskell). He doesn't like shoes, but does like activities that are hard on your feet: skiing, badminton, Aikido and rock climbing.
March 10, 2009
“EBW as a General, Consistent Framework for Parameter Estimation” 
Dimitri Kanevsky, IBM T.J. Watson Research Center
Abstract
Several optimization techniques are widely used today in the speech and language community for estimating model parameters. The Extended Baum-Welch (EBW) algorithm is one such technique; it is used extensively for estimating the parameters of Gaussian mixture models under a discriminative criterion (such as Maximum Mutual Information). In this talk, we present EBW as a consistent, theoretical framework for parameter estimation and show how other common parameter estimation techniques (for example, those based on Constrained Line Search) belong to this family of model update rules. We introduce a general family of parameter updates that generalizes a Baum-Welch recursive process to an arbitrary objective function of Gaussian Mixture Models or Poisson Processes. In the second part of this talk we introduce an extension of EBW for estimating sparse signals from a sequence of noisy observations. As part of this, the underlying EBW algorithms are compared with recently introduced Kalman filtering-based compressed sensing methods. This is joint work with Avishy Carmi, David Nahamoo, Bhuvana Ramabhadran and Tara Sainath.
Speaker Biography
Dimitri Kanevsky is a research staff member in the Speech and Language Algorithms department at IBM T.J. Watson Research Center. Prior to joining IBM, he worked at a number of prestigious centers for higher mathematics, including the Max Planck Institute in Germany and the Institute for Advanced Study in Princeton. At IBM he has been responsible for developing the first Russian automatic speech recognition system, as well as key projects for embedding speech recognition in automobiles and broadcast transcription systems. He currently holds 110 US patents and has received the title of Master Inventor at IBM. His patent on security based on conversational biometrics was recognized by MIT's Technology Review as one of the five most influential patents of 2003, and his work on the Extended Baum-Welch algorithm in speech was recognized as a 2002 science accomplishment by the Director of Research at IBM.
March 24, 2009
“Values and Patterns” 
Alon Orlitsky, University of California, San Diego
Abstract
Via four applications (distribution modeling, probability estimation, data compression, and classification), we argue that when learning from data, discrete values should be ignored except for their appearance-order pattern. Along the way, we encounter Laplace, Good, Turing, Hardy, Ramanujan, Fisher, Shakespeare, and Shannon. The talk is self-contained and based on work with P. Santhanam, K. Viswanathan, J. Zhang, and others.
Speaker Biography
Alon Orlitsky received B.Sc. degrees in Mathematics and Electrical Engineering from Ben Gurion University in 1980 and 1981, and M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 1982 and 1986. From 1986 to 1996 he was with the Communications Analysis Research Department of Bell Laboratories. He spent the following year as a quantitative analyst at D.E. Shaw and Company, an investment firm in New York City. In 1997 he joined the University of California, San Diego, where he is currently a professor of Electrical and Computer Engineering and of Computer Science and Engineering, and directs the Information Theory and Applications Center. Alon's research concerns information theory, statistical modeling, machine learning, and speech recognition. He is a recipient of the 1981 ITT International Fellowship and the 1992 IEEE W.R.G. Baker Paper Award, a co-recipient of the 2006 Information Theory Society Paper Award, a fellow of the IEEE, and holds the Qualcomm Chair for Information Theory and its Applications at UCSD.
March 31, 2009
“Neural Dynamics of Attentive Object Recognition, Scene Understanding, and Decision Making” 
Stephen Grossberg, Boston University
Abstract
This talk describes three recent models of how the brain visually understands the world. The models use hierarchical and parallel processes within and across the What and Where cortical streams to accumulate information that cannot in principle be fully computed at a single processing stage. The models thereby raise basic questions about the functional brain units that are selected by the evolutionary process, and challenge all models that use non-local information to explain vision. The ARTSCAN model (Fazl, Grossberg, & Mingolla, 2008, Cognitive Psychology) clarifies the following issues: What is an object? How does the brain learn to bind multiple views of an object into a view-invariant object category, during both unsupervised and supervised learning, while scanning its various parts with active eye movements? In particular, how does the brain avoid the problem of erroneously classifying views of different objects as belonging to a single object, and how does the brain direct the eyes to explore an object's surface even before it has a concept of the object? How does the brain coordinate object and spatial attention during object learning and recognition? ARTSCAN proposes an answer to these questions by modeling interactions between cortical areas V1, V2, V3A, V4, ITp, ITa, PPC, LIP, and PFC. The ARTSCENE model (Grossberg & Huang, 2008, Journal of Vision) also uses attentional shrouds. It clarifies the following issues: How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? ARTSCENE classifies natural scene photographs better than competing models by using multiple spatial scales to efficiently accumulate evidence for gist and texture. The model can incrementally learn and rapidly predict scene identity from gist information alone (defining gist computationally along the way), and then accumulate learned evidence from scenic textures to refine this hypothesis.
The MODE model (Grossberg & Pilly, 2008, Vision Research) clarifies the following basic issue: How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." MODE models interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by basal ganglia, to simulate dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model shows how the brain can carry out probabilistic decisions without using Bayesian mechanisms.
April 7, 2009
“The Neural Control of Speech” 
Frank Guenther, Boston University
Abstract
Speech production involves coordinated processing in many regions of the brain. To better understand these processes, our laboratory has designed, tested, and refined a neural network model whose components correspond to brain regions involved in speech. Babbling and imitation phases are used to train neural mappings between phonological, articulatory, auditory, and somatosensory representations. After learning, the model can produce syllables and words it has learned by commanding movements of an articulatory synthesizer. Because the model’s components correspond to neurons and are given precise anatomical locations, activity in the model’s cells can be compared to neuroimaging data. Computer simulations of the model account for a wide range of experimental findings, including data on acquisition of speaking skills, articulatory kinematics, and brain activity during speech. "Impaired" versions of the model are being used to investigate several communication disorders, and the model has been used to guide development of a neural prosthesis aimed at restoring speech output to profoundly paralyzed individuals.
Speaker Biography
Frank Guenther, Professor of Cognitive and Neural Systems at Boston University, is a computational and cognitive neuroscientist specializing in speech and motor control. He received an MS in Electrical Engineering from Princeton University in 1987 and a PhD in Cognitive and Neural Systems from Boston University in 1993. He is also a faculty member in the Harvard University/MIT Speech and Hearing Bioscience and Technology Program and a research affiliate at Massachusetts General Hospital. His research combines theoretical modeling with behavioral and neuroimaging experiments to characterize the neural computations underlying speech and language. He is also involved in the development of speech prostheses that utilize brain-computer interfaces to restore synthetic speech to paralyzed individuals.
April 14, 2009
“Inducing Synchronous Grammars for Machine Translation” 
Phil Blunsom, University of Edinburgh, UK
Abstract
In this talk I'll outline current work at the University of Edinburgh to model machine translation (MT) as a probabilistic machine learning problem. Although MT systems have made large gains in translation quality in recent years, most are currently induced using a hand-engineered pipeline of disparate models linked by heuristics. Although such techniques are effective for translating between related languages, they fail to capture the latent structure necessary to translate between languages that diverge significantly in word order, such as Chinese and English. I'll present a non-parametric Bayesian model for inducing synchronous context-free grammars capable of learning the latent structure of translation equivalence from a corpus of parallel string pairs. I'll discuss the efficacy of both variational Bayes and Gibbs sampling inference procedures for this model and present experiments demonstrating competitive results on full-scale translation evaluations.
Speaker Biography
Phil Blunsom is a Research Fellow in the Institute for Communicating and Collaborative Systems at the University of Edinburgh. He completed his PhD at the University of Melbourne in 2007. His current research interests focus upon the application of machine learning to complex structured problems in language processing, such as machine translation, language modelling, parsing and grammar induction.
April 21, 2009
“Integrating Evidence Over Time: A Look at Conditional Models for Speech and Audio Processing” 
Eric Fosler-Lussier, Ohio State University
Abstract
Many acoustic events, particularly those associated with speech, can be thought of as events in a rich descriptive subspace whose dimensions form a sort of decomposition of the original event space. In phonetic terms, we can think of how phonological features can be integrated to determine phonetic identity; for auditory scene analysis, we can look at how features like harmonic energy and cross-channel correlation come together to determine whether a particular frequency corresponds to target speech or to background noise. Some success has been achieved by thinking of these problems as probabilistic detection of acoustic (sub-)events. However, event detectors are typically local in nature, and need to be smoothed by looking at neighboring events in time. In this talk, I describe current work in the Speech and Language Technologies Lab at OSU, where we are looking at Conditional Random Field models for both automatic speech recognition and computational auditory scene analysis problems. The talk will explore some of the successes and limitations of this log-linear method, which integrates local evidence over time sequences. Joint work with Jeremy Morris, Ilana Heintz, Rohit Prabhavalkar, and Zhaozhang Jin.
Speaker Biography
Eric Fosler-Lussier is currently an Assistant Professor of Computer Science and Engineering, with an adjunct appointment in Linguistics, at the Ohio State University. He received his Ph.D. in 1999 from the University of California, Berkeley, performing his dissertation research at the International Computer Science Institute under the tutelage of Prof. Nelson Morgan. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, and a Visiting Researcher at Columbia University. He is generally interested in integrating linguistic insights as priors in statistical learning systems.
April 28, 2009
“On Representing Acoustics of Speech for Speech Processing” 
Bishnu Atal, University of Washington
Abstract
Proper representation of the acoustic speech signal is crucial for almost every speech processing application. We often use the short-time Fourier transform to convert the time-domain speech waveform into a new signal that is a function of both time and frequency, by applying a moving time window about 20 ms in duration. Many issues, such as the size and shape of the window, remain unresolved. The use of a relatively short window is widespread. In early development of the sound spectrograph, both narrow-band and wide-band analysis were used, but narrow-band analysis faded away. In digital speech coding applications (multipulse and code-excited linear prediction), high-quality speech is produced at low bit rates only when prediction over both short and long intervals is used. Recently, Hermansky and others have argued that the speech window for automatic speech recognition should be long, perhaps extending to as much as 1 s. What issues arise in using a short or a long window? What are their relative advantages and disadvantages? In this talk, we will discuss these topics and present results suggesting that a short-time Fourier transform using long windows has advantages. In most speech representations, the Fourier components are not used directly but converted to their magnitude spectrum; the phase is considered irrelevant. There are open questions regarding the use of phase information, and we will discuss this important issue in the talk.
Speaker Biography
Bishnu S. Atal is an Affiliate Professor in the Electrical Engineering Department at the University of Washington, Seattle, WA. He retired in March 2002 after working for more than 40 years at Lucent Bell Labs and AT&T Labs. He was a Technical Director at the AT&T Shannon Laboratory, Florham Park, New Jersey, from 1997, where he was engaged in research in speech coding and in automatic speech recognition. He joined the technical staff of AT&T Bell Laboratories in 1961, became head of the Acoustics Research Department in 1985, and head of the Speech Research Department in 1990. He is internationally recognized for his many contributions to speech analysis, synthesis, and coding. His pioneering work in linear predictive coding of speech established linear prediction as one of the most important speech analysis techniques, leading to many applications in coding, recognition and synthesis of speech. His research work is documented in over 90 technical papers and he holds 17 U.S. and numerous international patents in speech processing. He was elected to the National Academy of Engineering in 1987 and to the National Academy of Sciences in 1993. He is a Fellow of the Acoustical Society of America and the IEEE. He received the IEEE Morris N. Liebmann Memorial Field Award in 1986, the Thomas Edison Patent Award from the R&D Council of New Jersey in 1994, the New Jersey Inventors Hall of Fame Inventor of the Year Award in 2000 and the Benjamin Franklin Medal in Electrical Engineering in 2003. Bishnu and his wife, Kamla, reside in Mukilteo, Washington. They have two daughters, Alka and Namita, two granddaughters, Jyotica and Sonali, and two grandsons, Ananth and Niguel.
June 24, 2009
“Looking Behind Verb Classes”
Beth Levin, Stanford
Abstract
Fillmore's study "The Grammar of Hitting and Breaking" demonstrated the importance of semantically coherent verb classes as descriptive devices for understanding the organization of the verb lexicon and for capturing patterns of shared verb behavior. Much subsequent work has confirmed and extended the findings of this study. For example, in my 1993 book "English Verb Classes and Alternations", verbs are essentially classified in two ways: according to their semantic content (e.g., verbs of manner of motion, verbs of sound emission) and according to their participation in particular argument alternations (e.g., dative alternation, causative alternation). The first approach yields a fairly fine-grained semantic classification, while the second yields a coarser-grained classification, which appears to have more grammatical relevance than the first. This and other work suggests that verb classes should not be taken as primitive, as they have sometimes been. This position is reinforced by the considerable number of verbs which show complex patterns of behavior that have been handled by positing multiple semantic class membership. Previous studies of verb classes, then, raise important questions: What are the most useful dimensions for classifying verbs? What is the appropriate grain size for the description of verb classes? What determines whether a given verb shows multiple class membership? In this talk I ask what is behind verb classes that makes them so appealing as a research tool, yet explains their limitations. I show that many phenomena falling under the label "verb class" can be understood in the context of three levels of linguistic description and the relations between them: (i) the meaning lexicalized by the verb itself (its "root"), (ii) the set of event schemas, and (iii) the morphosyntactic devices that languages make available for the realization of arguments (e.g., grammatical relations, case markers, serial verb constructions). 
Each provides a way of grouping verbs into classes that can be helpful for certain facets of both language-specific and crosslinguistic studies.
Speaker Biography
Beth Levin is the William H. Bonsall Professor in the Humanities at Stanford University. After receiving her Ph.D. from MIT, she had major responsibility for the MIT Lexicon Project and taught at Northwestern University. Her work investigates the semantic representation of events and the morphosyntactic devices English and other languages use to express events and their participants. Her publications include English Verb Classes and Alternations: A Preliminary Investigation (1993) and, with Malka Rappaport Hovav, Argument Realization (2005) and Unaccusativity: At the Syntax-Lexical Semantics Interface (1995).
July 1, 2009
“How to Make a Billion Dollars: A Guide to Large-Economic-Scale Innovation”
Eric Brill, Microsoft
Abstract
It’s easy to have a $50 idea. Innovation at a scale large enough to be material to a big company like Microsoft is a completely different story. I’ll discuss some of the interesting challenges and opportunities in innovating at the scale of a billion dollars, and how different types of innovation play a role in driving financial value. I’ll also share other things I’ve learned in my decade at Microsoft, including tips for young scientists on how to have a great lifelong career in a corporate setting, and how to make basic research much more valuable, impactful and profitable.
Speaker Biography
Eric Brill has spent the last 10 years working at Microsoft. He spent 9 years in Microsoft Research, running a research lab that focuses primarily on machine learning and data mining techniques for search and online advertising. Last year, he moved to the AdCenter product group to run a multi-national applied research lab called AdCenter Labs. Recently, he moved to the AdCenter Garage, a small group working on creating, prototyping and deploying game-changing innovations. Prior to Microsoft, Eric spent 5 wonderful years as a faculty member at Johns Hopkins, in the Department of Computer Science and Center for Language and Speech Processing.
July 8, 2009
“Sequence Kernels for Speaker and Speech Recognition” 
Mark Gales, University of Cambridge
Abstract
Conceptually, sequence kernels map variable-length sequences into a fixed-dimensional feature space, in which, for example, an inner product can be computed. The ability to handle variable-length sequences makes these kernels suitable for speech signals, which are by nature time-varying. In the speech processing area, sequence kernels have been successfully applied in speaker verification, where they are used in combination with support vector machines (SVMs) for classification. This talk will concentrate on a particular class of sequence kernels, generative kernels, and how they can be used for speaker and speech recognition. Generative kernels, and score-spaces, make use of generative models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). By taking first- and higher-order derivatives of the log likelihood with respect to the model parameters, fixed-dimensional feature vectors can be extracted. An example of this form of kernel is the Fisher kernel, which has been successfully applied to a range of biological sequences. The relationship of this form of kernel to schemes such as the GMM mean-supervector kernel, commonly used in speaker verification, will be discussed. In addition, the talk will examine how these kernels and their associated feature spaces can be used for speech recognition and how they can handle speaker and environment changes.
Speaker Biography
Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985 to 1988. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech, Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, Model-Based Techniques for Robust Speech Recognition, supervised by Professor Steve Young. From 1995 to 1997 he was a Research Fellow at Emmanuel College, Cambridge. He was then a Research Staff Member in the Speech group at the IBM T. J. Watson Research Center until 1999, when he returned to the Cambridge University Engineering Department as a University Lecturer. He is currently a Reader in Information Engineering and a Fellow of Emmanuel College. Mark Gales is a Senior Member of the IEEE and was a member of the Speech Technical Committee from 2001 to 2004. He is currently an associate editor for IEEE Signal Processing Letters. Mark Gales was awarded a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.
July 16, 2009
“Technosocial Predictive Analytics”
Antonio Sanfilippo, Pacific Northwest National Laboratory
Abstract
Events occur daily that challenge the security, health and sustainable growth of our nation, and often find our government agencies unprepared for the catastrophic outcomes. These events involve the interaction of complex processes such as climate change, energy reliability, terrorism, nuclear proliferation, natural and man-made disasters, and social, political and economic vulnerability. If we are to help our nation meet the challenges that emerge from these events, we must develop novel methods for predictive analysis that support a concerted decision-making effort by analysts and policymakers to anticipate and counter strategic surprise. There is now increased awareness among subject-matter experts, analysts, and decision makers that a combined understanding of interacting physical and human factors is essential in addressing strategic surprise proactively. The Technosocial Predictive Analytics (TPA) framework provides an operational advancement of this insight through the development of new methods for anticipatory analysis and response that:
· implement a multi-perspective approach to predictive modeling through the integration of human and physical models;
· facilitate the acquisition of knowledge/evidence inputs to support the modeling task and promote inferential transparency;
· enable analysts and policymakers to stress-test the quality of their intelligence products and planned responses without waiting for history to prove them right or wrong.
Human Language Technologies (HLT) play an important role in the realization of this framework, specifically in evidence extraction, but must be augmented to support TPA’s knowledge requirements properly. In presenting TPA, I will discuss an approach that provides such an extension of HLT through the integration of insights from specific domains of expertise and content analysis processes.
Speaker Biography
Dr. Antonio Sanfilippo is Chief Scientist in the Computational and Statistical Analytics Division at Pacific Northwest National Laboratory (PNNL). His research focus is on Computational Linguistics, Content Analysis, Knowledge Technologies and Predictive Analytics with reference to Cognitive, Social, Behavioral and Biomedical Sciences. Dr. Sanfilippo holds a Laurea degree in Foreign Modern Languages awarded cum laude from the University of Palermo in Italy, M.A. and M.Phil. degrees in Anthropological Linguistics from Columbia University, and a Ph.D. in Cognitive Science from the University of Edinburgh (UK). Dr. Sanfilippo is the recipient of the 2008 Laboratory Director’s Award for Exceptional Scientific Achievement at PNNL. For more about Antonio, please visit: http://www.linkedin.com/in/antoniosanfilippo
July 24, 2009
“Computational Advertising”
Andrei Broder, Yahoo!
Abstract
Computational advertising is an emerging scientific sub-discipline at the intersection of large-scale search and text analysis, information retrieval, statistical modeling, machine learning, classification, optimization, and microeconomics. The central challenge of computational advertising is to find the "best match" between a given user in a given context and a suitable advertisement. The context could be a user entering a query in a search engine ("sponsored search"), a user reading a web page ("content match" and "display ads"), a user watching a movie on a portable device, and so on. The information about the user can vary from scarily detailed to practically nil. The number of potential advertisements might be in the billions. Thus, depending on the definition of "best match", this challenge leads to a variety of massive optimization and search problems with complicated constraints. This talk will give an introduction to this area, focusing on the IR and NLP connections.
Speaker Biography
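One simple, hypothetical reading of "best match" in the content-match setting is textual similarity between a page and an ad. The sketch below ranks ads by cosine similarity of bag-of-words term counts. Real systems optimize far richer objectives and constraints, so treat this only as an illustration of the matching step; the function names and ad texts are made up.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lower-cased, whitespace tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_ads(page_text, ads, k=1):
    """Return the ids of the k ads whose text is most similar to the page."""
    page = term_vector(page_text)
    scored = sorted(ads.items(),
                    key=lambda item: cosine(page, term_vector(item[1])),
                    reverse=True)
    return [ad_id for ad_id, _ in scored[:k]]
```

For instance, a page reviewing running shoes is matched to an ad for "running shoes sale" rather than one for "low rate home loans", because only the former shares terms with the page.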
Andrei Broder is a Fellow and Vice President for Computational Advertising in Yahoo! Research. He also serves as Chief Scientist of Yahoo’s Advertising Technology Group. Previously he was an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis in IBM Research. From 1999 until 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. He graduated Summa cum Laude from the Technion, and obtained his M.Sc. and Ph.D. in Computer Science at Stanford University. His current research interests are centered on computational advertising, web search, context-driven information supply, and randomized algorithms. Broder is co-winner of the Best Paper award at WWW6 (for his work on duplicate elimination of web pages) and at WWW9 (for his work on mapping the web). He has authored more than ninety papers and has been awarded twenty-eight patents. He is an ACM Fellow, an IEEE Fellow, and past chair of the IEEE Technical Committee on Mathematical Foundations of Computing.
August 29, 2009
“Geometric and Event-Based Approaches to Speech Representation and Recognition”
Aren Jansen, University of Illinois
Abstract
Anyone who has used an automatic speech recognition (ASR) system, either on a customer support line or on their own personal computer, knows firsthand that there is vast room for improvement. While state-of-the-art commercial systems perform very well in near-ideal environments, system robustness remains far below human levels. The prevailing hidden Markov model (HMM)-based paradigm will undoubtedly see gains in future decades as increased computing capacity admits more complex acoustic models that encompass a range of acoustic environments. In the meantime, there is a wealth of scientific understanding of production and perceptual mechanisms that has yet to be fully exploited by engineers and technologists. In this talk, I will present the main results of a research program that takes insights from linguistics, speech perception, and neuroscience as starting points for alternative directions in automatic speech recognition. First, I consider the implications of speech production for the geometric structure of speech sounds and the role this perspective can play in speech technology. Second, I consider the hypothesis that the linguistic content underlying human speech may be more efficiently and robustly coded in the pattern of timings of various acoustic events (landmarks) present in the speech signal. I will present a point process-based statistical framework for phonetic recognition and keyword spotting that matches the performance of equivalent frame-based systems. This approach suggests a new unsupervised adaptation strategy for improving recognizer robustness that outperforms maximum likelihood linear regression adaptation of a continuous-density keyword-filler HMM system.
Speaker Biography
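The point-process framework described above treats the times of detected acoustic landmarks as the observations. A minimal sketch of the general idea follows, assuming an inhomogeneous Poisson process whose rate is piecewise constant over equal-width segments of a candidate keyword window, scored against a constant background rate. The rates, times and function names are hypothetical, and this is a generic detector rather than the speaker's actual model.

```python
import math


def poisson_process_loglik(event_times, rates, duration):
    """Log-likelihood of event times under a piecewise-constant Poisson rate.

    rates[d] is the rate (events per second) in the d-th of len(rates)
    equal-width segments of [0, duration). The formula is
    sum_i log(rate(t_i)) - integral_0^duration rate(t) dt.
    """
    width = duration / len(rates)
    loglik = -sum(rate * width for rate in rates)        # minus the integral
    for t in event_times:
        segment = min(int(t / width), len(rates) - 1)   # segment containing t
        loglik += math.log(rates[segment])
    return loglik


def keyword_score(event_times, keyword_rates, background_rate, duration):
    """Log-likelihood ratio of a keyword rate profile vs. a constant background."""
    background = [background_rate] * len(keyword_rates)
    return (poisson_process_loglik(event_times, keyword_rates, duration)
            - poisson_process_loglik(event_times, background, duration))
```

A positive score means the landmark timings fit the keyword's rate profile better than the background; in practice a detection threshold would be tuned on held-out data.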
Aren Jansen has accepted a position as Senior Research Scientist at the Center of Excellence in Human Language Technology at JHU and is a candidate for a Research Assistant Professor position in the ECE department at JHU. He received the B.A. degree in physics from Cornell University in 2001. He received the M.S. degree in physics as well as the M.S. and Ph.D. degrees in computer science from the University of Chicago in 2003, 2005, and 2008, respectively, and has undertaken postdoctoral work at the University of Chicago. His research centers on exploring the interface between knowledge-based and statistical approaches to speech representation and recognition.
September 6, 2009
“Geometric and Event-Based Approaches to Speech Representation and Recognition”
Aren Jansen, University of Illinois
Abstract
Anyone who has used an automatic speech recognition (ASR) system, either on a customer support line or on their own personal computer, knows firsthand that there is vast room for improvement. While state-of-the-art commercial systems perform very well in near-ideal environments, system robustness remains far below human levels. The prevailing hidden Markov model (HMM)-based paradigm will undoubtedly see gains in future decades as increased computing capacity admits more complex acoustic models that encompass a range of acoustic environments. In the meantime, there is a wealth of scientific understanding of production and perceptual mechanisms that has yet to be fully exploited by engineers and technologists. In this talk, I will present the main results of a research program that takes insights from linguistics, speech perception, and neuroscience as starting points for alternative directions in automatic speech recognition. First, I consider the implications of speech production for the geometric structure of speech sounds and the role this perspective can play in speech technology. Second, I consider the hypothesis that the linguistic content underlying human speech may be more efficiently and robustly coded in the pattern of timings of various acoustic events (landmarks) present in the speech signal. I will present a point process-based statistical framework for phonetic recognition and keyword spotting that matches the performance of equivalent frame-based systems. This approach suggests a new unsupervised adaptation strategy for improving recognizer robustness that outperforms maximum likelihood linear regression adaptation of a continuous-density keyword-filler HMM system.
Speaker Biography
Aren Jansen has accepted a position as Senior Research Scientist at the Center of Excellence in Human Language Technology at JHU and is a candidate for a Research Assistant Professor position in the ECE department at JHU. He received the B.A. degree in physics from Cornell University in 2001. He received the M.S. degree in physics as well as the M.S. and Ph.D. degrees in computer science from the University of Chicago in 2003, 2005, and 2008, respectively, and has undertaken postdoctoral work at the University of Chicago. His research centers on exploring the interface between knowledge-based and statistical approaches to speech representation and recognition.
September 8, 2009
“Improving Machine Translation by Propagating Uncertainty” 
Chris Dyer, University of Maryland
Abstract
NLP systems typically consist of a series of components in which the output of one module (e.g., a word segmenter) serves as input to another (e.g., a translator). Integration between the components is often achieved by passing only the 1-best analysis from an upstream component to a downstream component. Unfortunately, this naive integration strategy causes errors to propagate and compound (cf. Finkel et al. 2006, Dyer et al. 2008). In this talk, I briefly review the effects of this problem in machine translation, where examples of upstream uncertainty include not only the noisy outputs of statistical preprocessors (such as word segmenters and STT systems), but also "development-time" decisions (such as determining the appropriate granularity of the lexical units or how much text normalization to do). I show that by encoding input alternatives in a word lattice, translation quality can be improved over a 1-best baseline, with only a slight runtime performance cost. I then explore in more detail the implications of modeling development-time uncertainty jointly with translation, focusing on the problem of source-language word segmentation. I tackle this problem in two ways. First, I present a Markov random field model of word segmentation and describe how to use it to generate lattices appropriate for translation by training it to maximize the (conditional) probability of a collection of segmentation alternatives, rather than the probability of a single correct analysis. Second, I describe generalized alignment models that align lattices in one language to strings in another, enabling the joint modeling of segmentation (or other noisy processes) and translation. Since lattice inputs break the Markov assumptions that enable efficient inference in many common word alignment models, I also present novel Monte Carlo techniques for performing word and lattice alignment.
Speaker Biography
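The word lattice in the abstract above is a directed acyclic graph whose paths are alternative analyses of the input. As a minimal sketch, assuming a hypothetical unigram lexicon with made-up probabilities (a much simpler stand-in for the speaker's Markov random field segmentation model), the code builds a segmentation lattice over a string, counts the alternatives it encodes, and extracts the single best path, which is all a 1-best pipeline would pass downstream.

```python
import math


def build_segmentation_lattice(chars, lexicon):
    """Edges (start, end, word, log_prob) for every lexicon word chars[start:end].

    Nodes are the character positions 0..len(chars); each path from node 0
    to node len(chars) is one segmentation of the input.
    """
    edges = []
    for start in range(len(chars)):
        for end in range(start + 1, len(chars) + 1):
            word = chars[start:end]
            if word in lexicon:
                edges.append((start, end, word, math.log(lexicon[word])))
    return edges


def count_paths(num_nodes, edges):
    """Number of distinct segmentations the lattice encodes."""
    counts = [0] * num_nodes
    counts[0] = 1
    for start, end, _, _ in sorted(edges):   # sorting by start node is a topological order
        counts[end] += counts[start]
    return counts[-1]


def best_path(num_nodes, edges):
    """Viterbi (max log-prob) path; assumes at least one full segmentation exists."""
    best = [-math.inf] * num_nodes
    back = [None] * num_nodes
    best[0] = 0.0
    for start, end, word, score in sorted(edges):
        if best[start] + score > best[end]:
            best[end] = best[start] + score
            back[end] = (start, word)
    words, node = [], num_nodes - 1
    while node != 0:
        start, word = back[node]
        words.append(word)
        node = start
    return list(reversed(words))
```

With a toy lexicon such as {"the", "cat", "t", "he", "c", "at"}, the string "thecat" yields a lattice encoding four segmentations; translating the whole lattice keeps all of them available instead of committing to the best one.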
Chris Dyer is a Ph.D. candidate at the University of Maryland, College Park, in the Department of Linguistics under the supervision of Philip Resnik. His research interests include statistical machine translation, computational morphology and phonology, unsupervised learning, and scaling NLP models to deal with larger data sets using the MapReduce programming paradigm. He is graduating this spring and will be joining Noah Smith's lab as a postdoc.
September 15, 2009
“EM Works for Pronoun-Anaphora Resolution” 
Eugene Charniak, Brown University, Department of Computer Science
Abstract
EM (the Expectation Maximization algorithm) is a well-known technique for unsupervised learning, where no hand-labeled solutions are available and one must instead learn from raw text. Unfortunately, EM is known to fail to find good solutions in many (most?) of the applications to which it is applied. In this talk we present some recent work on using EM to learn how to resolve pronoun-anaphora, e.g., determining that "the dog" is the antecedent of "he" and "his" in "When Sally fed the dog he wagged his tail". For this application EM works strikingly well, determining tens of thousands of parameters and resulting in a program that produces state-of-the-art performance.
Speaker Biography
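To illustrate the kind of unsupervised learning described above, here is a toy EM sketch in which each pronoun comes with a list of candidate antecedent heads and the correct antecedent is a hidden variable. The model (a single table of p(pronoun | antecedent head), in the style of IBM Model 1) and the data are hypothetical simplifications; the program in the talk estimates tens of thousands of parameters.

```python
from collections import defaultdict


def em_pronoun_model(instances, iterations=10):
    """Estimate p(pronoun | antecedent head) by EM.

    instances: list of (pronoun, [candidate antecedent heads]); which candidate
    is the true antecedent is hidden.
    """
    pronouns = {p for p, _ in instances}
    heads = {h for _, cands in instances for h in cands}
    prob = {(p, h): 1.0 / len(pronouns) for p in pronouns for h in heads}
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for pronoun, cands in instances:
            # E-step: posterior over which candidate is the antecedent.
            z = sum(prob[(pronoun, h)] for h in cands)
            for h in cands:
                posterior = prob[(pronoun, h)] / z
                counts[(pronoun, h)] += posterior
                totals[h] += posterior
        # M-step: renormalize expected counts for each antecedent head.
        prob = {(p, h): counts[(p, h)] / totals[h] if totals[h] else 0.0
                for p in pronouns for h in heads}
    return prob


def resolve(prob, pronoun, cands):
    """Pick the candidate antecedent with the highest learned probability."""
    return max(cands, key=lambda h: prob[(pronoun, h)])
```

Starting from uniform probabilities, co-occurrence alone lets EM learn that "he" prefers "dog" over "Sally" when both are candidates, without any labeled antecedents.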
Eugene Charniak is University Professor of Computer Science at Brown University and past chair of the department. He received his A.B. degree in Physics from the University of Chicago and a Ph.D. in Computer Science from M.I.T. He has published four books, the most recent being Statistical Language Learning. He is a Fellow of the American Association for Artificial Intelligence and was previously a Councilor of the organization. His research has always been in the area of language understanding or technologies which relate to it. Over the last 15 years he has been interested in statistical techniques for many areas of language processing, including parsing, discourse and anaphora.
September 22, 2009
“Embracing Language Diversity: Unsupervised Multilingual Learning” 
Regina Barzilay, MIT
Abstract
For centuries, the deep connection between human languages has fascinated scholars, and driven many important discoveries in linguistics and anthropology. In this talk, I will show that this connection can empower unsupervised methods for language analysis. The key insight is that joint learning from several languages reduces uncertainty about the linguistic structure of each individual language. I will present multilingual generative unsupervised models for morphological segmentation, part-of-speech tagging, and parsing. In all of these instances we model the multilingual data as arising through a combination of language-independent and language-specific probabilistic processes. This feature allows the model to identify and learn from recurring cross-lingual patterns to improve prediction accuracy in each language. I will also discuss ongoing work on unsupervised decoding of ancient Ugaritic tablets using data from related Semitic languages. This is joint work with Benjamin Snyder, Tahira Naseem and Jacob Eisenstein.
Speaker Biography
Regina Barzilay is an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory at MIT. Her research interests are in natural language processing. She is a recipient of the NSF Career Award and a Microsoft Faculty Fellowship, and has been named one of the "Top 35 Innovators Under 35" by Technology Review Magazine. She received her Ph.D. in Computer Science from Columbia University in 2003 and spent a year as a postdoc at Cornell University. Regina received her M.S. in 1998 and B.A. in 1992, both from Ben-Gurion University, Israel.
September 29, 2009
“Repetition and Language Models and Comparable Corpora”
Ken Church, Johns Hopkins University
Abstract
I will discuss a couple of non-standard features that I believe could be useful for working with comparable corpora. Dotplots have been used in biology to find interesting DNA sequences. Biology is interested in ordered matches, which show up as (possibly broken) diagonals in dotplots. Information Retrieval is more interested in unordered matches (e.g., cosine similarity), which show up as squares in dotplots. Parallel corpora have both squares and diagonals multiplexed together. The diagonals tell us what is a translation of what, and the squares tell us what is in the same language. There is also an opportunity to take advantage of repetition in comparable corpora. Repetition is very common. Standard bag-of-words models in Information Retrieval do not attempt to model discourse structure such as given/new. The first mention in a news article (e.g., "Manuel Noriega, former President of Panama") is different from subsequent mentions (e.g., "Noriega"). Adaptive language models were introduced in Speech Recognition to capture the fact that probabilities change or adapt. After we see the first mention, we should expect a subsequent mention. If the first mention has probability p, then under standard (bag-of-words) independence assumptions, two mentions ought to have probability p^2, but we find the probability is actually closer to p/2. Adaptation matters more for meaningful units of text. In Japanese, words (meaningful sequences of characters) are more likely to be repeated than fragments (meaningless sequences of characters from words that happen to be adjacent). In newswire, we find more adaptation for content words (proper nouns, technical terminology, out-of-vocabulary (OOV) words and good keywords for information retrieval), and less adaptation for function words, clichés and ordinary first names. There is more to meaning than frequency. Content words are not only low-frequency but also likely to be repeated.
Speaker Biography
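The repetition statistics discussed above can be estimated from document frequencies. A minimal sketch follows, assuming whitespace tokenization and a tiny hypothetical corpus: df1 counts documents with at least one mention of a word, df2 those with at least two, and df2/df1 estimates the chance of a second mention given a first. The function and key names are illustrative.

```python
def adaptation_stats(documents, word):
    """Document-frequency estimates of how often a word repeats within a document."""
    n = len(documents)
    mention_counts = [doc.lower().split().count(word) for doc in documents]
    df1 = sum(1 for c in mention_counts if c >= 1)   # docs with at least one mention
    df2 = sum(1 for c in mention_counts if c >= 2)   # docs with at least two mentions
    p = df1 / n
    return {
        "p": p,                                   # Pr(at least one mention)
        "p_squared": p * p,                       # independence prediction for two
        "p_two": df2 / n,                         # observed Pr(at least two)
        "adaptation": df2 / df1 if df1 else 0.0,  # Pr(second mention | first)
    }
```

In a four-document toy corpus where "noriega" appears twice in each of two documents, the observed probability of two mentions is 0.5, well above the 0.25 that independence (p^2) predicts; real corpora show the less extreme effect described in the abstract.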
MIT undergrad (1978) and grad (1983), followed by 20 years at AT&T Bell Labs (1983-2003) and 6 years at Microsoft Research (2003-2009). Currently, at Hopkins as Chief Scientist of the Human Language Technology Center of Excellence as well as Research Professor in Computer Science. Honors: AT&T Fellow. I have worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition and synthesis), OCR, as well as applications that go well beyond computational linguistics such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don't talk together as well as they could such as billing and customer care). When we were reviving empirical methods in the 1990s, we thought the AP News was big (1 million words per week), but since then I have had the opportunity to work with much larger data sets such as telephone call detail (1-10 billion records per month) and web logs (even bigger).
October 6, 2009
“Generic knowledge: acquisition and representation” 
Lenhart Schubert, University of Rochester
Abstract
AI is beginning to make some dents in the "knowledge acquisition bottleneck", the problem of acquiring large amounts of general world knowledge to support language understanding and commonsense reasoning. Two text-based approaches to the problem are (1) to abstract such knowledge from patterns of predication and modification in miscellaneous texts, and (2) to derive such knowledge by direct interpretation of general statements in ordinary language, such as those found in lexicons and resources like Open Mind. I will discuss the status of our efforts in these directions (currently centered on the KNEXT system) and the problems that are encountered. Among these problems are what exactly is meant by generalities such as "Cats land on their feet", and how this meaning should be formalized. One particular difficulty is that such statements typically involve "donkey anaphora". I will suggest a "dynamic Skolemization" approach that leads naturally to script- or frame-like representations, of the sort that have been developed in AI independently of linguistic considerations.
Speaker Biography
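As background for the Skolemization idea mentioned above, the following shows ordinary (static) Skolemization applied to a naive universal reading of "Cats land on their feet". It is illustration only, not the speaker's dynamic Skolemization, and the predicate and function names are hypothetical. The existentially quantified feet are replaced by a Skolem function f(x), read as "the feet of x", which is suggestive of a slot in a frame-like representation.

```latex
\forall x\,\bigl[\mathit{cat}(x) \rightarrow \exists y\,(\mathit{feetOf}(y, x) \land \mathit{landOn}(x, y))\bigr]
\quad\Longrightarrow\quad
\forall x\,\bigl[\mathit{cat}(x) \rightarrow (\mathit{feetOf}(f(x), x) \land \mathit{landOn}(x, f(x)))\bigr]
```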
Lenhart Schubert is a professor of computer science at the University of Rochester, with primary interests in natural language understanding, knowledge representation and acquisition, reasoning, and self-awareness. He is a fellow of the AAAI, has served as program chair for several AI/KR/CL conferences, and has published over a hundred articles, including ones in philosophical and linguistic handbooks and encyclopedias.
October 13, 2009
“Using speech models for separation in monaural and binaural contexts”
Dan Ellis, Columbia University
Abstract
When the number of sources exceeds the number of microphones, acoustic source separation is an underconstrained problem that must rely on additional constraints for a solution. In a single-channel environment, the expected behavior of the source (i.e., an acoustic model) is the only feasible basis for separation. I will describe our recent work in monaural speech separation based on fitting parametric "eigenvoice" speaker-adapted models to both voices in a mixture. In a binaural, reverberant environment, the interaural characteristics of an acoustic source exhibit structure that can be used to separate sources, even without prior knowledge of location or room characteristics. I will present MESSL, our EM-based system for source separation and localization. MESSL's probabilistic foundation facilitates the incorporation of more specific source models; I will also describe MESSL-EV, which incorporates the eigenvoice speech models for improved binaural separation in reverberant environments. Joint work with Ron Weiss and Mike Mandel.
Speaker Biography
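In the binaural case described above, the structure that allows separation comes from interaural cues. A minimal sketch of two such cues follows, computed per time-frequency cell from complex STFT values of the left and right channels. EM-based systems like MESSL model cues of this kind probabilistically and cluster them by source, which is not shown here; the function name and the eps guard are illustrative assumptions.

```python
import cmath
import math


def interaural_features(left_bins, right_bins, eps=1e-12):
    """Interaural phase difference (radians) and level difference (dB) per cell.

    left_bins / right_bins hold complex STFT values for the same
    time-frequency cells of the left and right channels.
    """
    features = []
    for left, right in zip(left_bins, right_bins):
        ipd = cmath.phase(left * right.conjugate())   # phase of L/R, in (-pi, pi]
        ild = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
        features.append((ipd, ild))
    return features
```

Cells dominated by the same source tend to share similar phase and level differences, which is what makes clustering them a route to separation without knowing the room or source location in advance.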
Daniel P. W. Ellis received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, where he was a Research Assistant in the Machine Listening Group of the Media Lab. He spent several years as a Research Scientist at the International Computer Science Institute, Berkeley, CA. Currently, he is an Associate Professor with the Electrical Engineering Department, Columbia University, New York. His Laboratory for Recognition and Organization of Speech and Audio (LabROSA) is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing. He also runs the AUDITORY email list of 1,700 researchers worldwide in the perception and cognition of sound.
October 20, 2009
“Predicting Language Change” 
Charles Yang, University of Pennsylvania
Abstract
The parallels between language change and biological change were noted by none other than Darwin himself. However, the study of language change has not developed a mathematical foundation comparable to the one developed for evolution, even though tools from quantitative genetics have seen applications in the linguistic arena. This work attempts to develop a series of models of language change, drawing insights from population genetics on the one hand, and from modern theories of linguistic structure, language acquisition and language processing on the other. The dynamics of language learning over generations turn out to bear a strong resemblance to the process of natural selection. In some cases, this allows one to quantitatively measure the "fitness" of grammatical hypotheses and thus predict the direction of language change. I will discuss the general use of population models in language and present two specific case studies: the word order change from Old French to Modern French, and the cot-caught merger recently documented at the Massachusetts and Rhode Island border. The outcome of both changes is shown to be entirely predictable from the statistical composition of linguistic data in the environment.
Speaker Biography
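The population-genetics analogy above can be illustrated with the textbook haploid selection recurrence p' = p*w1 / (p*w1 + (1 - p)*w2). The sketch below is a generic stand-in, not the speaker's model: it assumes, hypothetically, that a grammar's fitness is 1 minus its penalty (the fraction of input it fails to analyze) and tracks the share of learners adopting grammar 1 across generations.

```python
def simulate_grammar_competition(p0, penalty_1, penalty_2, generations):
    """Share of learners using grammar 1 under haploid selection.

    Assumption (hypothetical): fitness w_i = 1 - penalty_i, where penalty_i is
    the fraction of input sentences grammar i fails to analyze.
    Recurrence: p' = p * w1 / (p * w1 + (1 - p) * w2).
    """
    w1, w2 = 1.0 - penalty_1, 1.0 - penalty_2
    p = p0
    history = [p]
    for _ in range(generations):
        p = p * w1 / (p * w1 + (1.0 - p) * w2)
        history.append(p)
    return history
```

Under this recurrence the odds p/(1 - p) are multiplied by w1/w2 each generation, so the grammar with the lower penalty always spreads and the direction of change is fixed by the penalties alone; with equal penalties the proportions stay put.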
Charles Yang teaches linguistics and computer science at the University of Pennsylvania, where he works on language learning, language change, and computational linguistics. He is the author of three books, and is currently finishing a monograph on the computational properties of words.
October 27, 2009
“A new Golden Age of phonetics?” 
Mark Liberman, University of Pennsylvania
Abstract
From the perspective of a linguist, today's vast archives of digital text and speech, along with new analysis techniques from language engineering, look like a wonderful new scientific instrument, a modern equivalent of the 17th-century invention of the telescope and microscope. We can now observe linguistic patterns in space, time, and cultural context, on a scale three to five orders of magnitude greater than in the past, and simultaneously in much greater detail than before. Scientific use of these new instruments remains mainly potential, especially in phonetics and related disciplines, but the next decade is likely to be a new "golden age" of research. This talk will discuss some of the barriers to be overcome, present some successful examples, and speculate about future directions.
Speaker Biography
Biographical information for Mark Liberman is available from http://ling.upenn.edu/~myl
November 3, 2009
“Vector-based Models of Semantic Composition”
Mirella Lapata, University of Edinburgh
Abstract
Vector-based models of word meaning have become increasingly popular in natural language processing and cognitive science. The appeal of these models lies in their ability to represent meaning simply by using distributional information under the assumption that words occurring within similar contexts are semantically similar. Despite their widespread use, vector-based models are typically directed at representing words in isolation, and methods for constructing representations for phrases or sentences have received little attention in the literature. In this talk we propose a framework for representing the meaning of word combinations in vector space. Central to our approach is vector composition, which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a phrase similarity task. We also propose a novel statistical language model that is based on vector composition and can capture long-range semantic dependencies. Joint work with Jeff Mitchell.
Speaker Biography
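The two composition functions named in the abstract can be sketched directly: additive composition sums the word vectors, and multiplicative composition multiplies them element-wise; phrases are then compared with cosine similarity. The co-occurrence vectors below are hypothetical toy values, and the talk evaluates a wider range of composition models than these two.

```python
import math

# Hypothetical co-occurrence counts over four context words.
VECTORS = {
    "old": [4.0, 1.0, 0.0, 2.0],
    "man": [3.0, 0.0, 1.0, 3.0],
    "elderly": [5.0, 0.0, 0.0, 2.0],
    "gentleman": [2.0, 0.0, 1.0, 4.0],
    "new": [0.0, 4.0, 3.0, 0.0],
    "car": [0.0, 3.0, 4.0, 1.0],
}


def additive(u, v):
    """Additive composition: p = u + v."""
    return [a + b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Multiplicative composition: p_i = u_i * v_i (element-wise)."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_similarity(phrase_a, phrase_b, compose):
    """Compose each two-word phrase, then compare the results with cosine."""
    w1, w2 = phrase_a.split()
    w3, w4 = phrase_b.split()
    return cosine(compose(VECTORS[w1], VECTORS[w2]),
                  compose(VECTORS[w3], VECTORS[w4]))
```

With either function, "old man" comes out far closer to "elderly gentleman" than to "new car"; multiplicative composition sharpens the contrast by zeroing contexts that the two words do not share.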
Mirella Lapata is a reader (US equivalent: associate professor) in the School of Informatics at the University of Edinburgh. Her research interests are in natural language processing, focusing on semantic interpretation and generation. She obtained a PhD in Informatics from the University of Edinburgh in 2001 and spent two years as a faculty member in the Department of Computer Science at the University of Sheffield. She received a B.A. in computer science from the University of Athens in 1994 and an M.Sc. from Carnegie Mellon University in 1998.
November 10, 2009
“We KnowItAll: Lessons from a Quarter Century of Web Extraction Research”
Oren Etzioni, University of Washington
Abstract
For the last quarter century (measured in person-years), the KnowItAll project has investigated information extraction at Web scale. If successful, this effort will begin to address the long-standing "Knowledge Acquisition Bottleneck" in Artificial Intelligence, and will enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. To date, we have generalized information extraction methods to process arbitrary Web text, to handle unanticipated concepts, and to leverage the redundancy inherent in the Web corpus, but many challenges remain. One of the most formidable challenges is moving from extracting isolated nuggets of information to capturing a coherent body of knowledge that can support automatic inference. My talk will describe the lessons we have learned and identify directions for future work.
Speaker Biography
Oren Etzioni is the Washington Research Foundation Entrepreneurship Professor at the University of Washington's Computer Science Department. He received his bachelor's degree in Computer Science from Harvard University in June 1986, where he was the first Harvard student to "major" in Computer Science. Etzioni received his Ph.D. from Carnegie Mellon University in January 1991 and joined the University of Washington's faculty in February 1991, where he is now a Professor of Computer Science. Etzioni received a National Young Investigator Award in 1993 and was selected as an AAAI Fellow a decade later. He is the founder and director of the University of Washington's Turing Center. Etzioni is also a Venture Partner at Madrona Venture Group, where he chairs the Technology Advisory Board. He was the founder of Farecast, a company that utilizes data mining techniques to anticipate airfare fluctuations; Microsoft acquired Farecast in 2008. He was a co-founder of Clearforest, a text-mining startup, which was acquired by Reuters in 2007. He was the Chief Technology Officer and a board member of Go2net, which was acquired by Infospace in 2000. He also co-founded Netbot, acquired by Excite in 1997. At Netbot, he helped to conceive and design the web's first major comparison-shopping agent. In 1995, Etzioni and his student Erik Selberg developed MetaCrawler, for several years the web’s premier meta-search engine, now run by Infospace. Finally, he has served on the board of Performant (acquired by Mercury Interactive in 2003) and has been a consultant or advisor to Ask Jeeves, Excite, Infospace, Google, Microsoft, Northern Telecom, SAIC, Vivisimo, and Zillow, among others.
November 17, 2009
“Graph Identification”
Lise Getoor, University of Maryland
Abstract
Within the machine learning and data mining communities, there has been growing interest in learning structured models from input data that is itself structured or semi-structured. Graph identification refers to methods that transform observational data described as a noisy input graph into an inferred, "clean" information graph. Examples include inferring social networks from noisy online communication data, identifying gene regulatory networks from protein-protein interactions, and extracting semantic graphs from noisy and ambiguous co-occurrence information. Some of the key processes in graph identification are entity resolution, collective classification, and link prediction. I will give an overview of algorithms for these tasks and discuss the need to integrate the methods in order to solve the overall problem jointly. Time permitting, I will also give quick overviews of some of the other research projects in my group.
Speaker Biography
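Collective classification, one of the graph identification processes listed above, labels nodes using the labels of their neighbors. A minimal sketch of the simplest variant, iterative majority voting over neighbors, follows; the graph, labels and function name are hypothetical, and the methods in the talk, which also integrate entity resolution and link prediction, are considerably richer.

```python
from collections import Counter


def iterative_classification(neighbors, known_labels, max_iterations=10):
    """Label unknown nodes by repeated majority vote over labeled neighbors.

    neighbors: dict node -> list of adjacent nodes (undirected graph).
    known_labels: dict node -> label for the observed nodes; these never change.
    """
    labels = dict(known_labels)
    for _ in range(max_iterations):
        updated = dict(labels)
        for node, adjacent in neighbors.items():
            if node in known_labels:
                continue
            votes = Counter(labels[n] for n in adjacent if n in labels)
            if votes:
                updated[node] = votes.most_common(1)[0][0]
        if updated == labels:      # converged: no label changed
            break
        labels = updated
    return labels
```

Labels spread outward one hop per iteration: a hub whose observed neighbors are mostly "A" is labeled "A" first, and a node attached only to that hub inherits "A" on the next pass. This propagation of inferred labels is what distinguishes collective classification from classifying each node independently.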
Lise Getoor is an associate professor in the Computer Science Department at the University of Maryland, College Park. She received her PhD from Stanford University in 2001. Her current work includes research on link mining, statistical relational learning and representing uncertainty in structured and semi-structured data. She has also done work on social network analysis and visual analytics. She has published numerous articles in machine learning, data mining, database, and artificial intelligence forums. She was awarded an NSF Career Award, is an action editor for the Machine Learning Journal, is a JAIR associate editor, has been a member of the AAAI Executive Council, and has served on a variety of program committees including AAAI, ICML, IJCAI, ISWC, KDD, SIGMOD, UAI, VLDB, and WWW.
November 24, 2009
“Hierarchical Phrase-based Translation with Weighted Finite State Transducers”
Bill Byrne, University of Cambridge
Abstract
HiFST is a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding.
Speaker Biography
Bill Byrne is a Reader in Information Engineering at the University of Cambridge.
December 1, 2009
“Communication Disorders and Speech Technology”
Elmar Noeth, Friedrich-Alexander University Erlangen-Nuremberg
