Archived Seminars by Year

2004

February 10, 2004

“Name That Tune: Finding a song from a sung query”

Bryan Pardo, University of Michigan

Abstract

Music Information Retrieval has become an active area of research motivated by the increasing importance of internet-based music distribution. Online catalogs are already approaching one million songs, so it is important to study new techniques for searching these vast stores of audio. One approach to finding music that has received much attention is "Query by Humming" (QBH). This approach enables users to retrieve songs and information about them by singing, humming, or whistling a melodic fragment. In QBH systems, the query is a digital audio recording of a melodic fragment, and the ultimate target is a complete digital audio recording of a piece. We have created a QBH system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe two approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using probabilistic string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. Experiments show that while no approach is clearly superior in retrieval ability, string matching often has a significant speed advantage. Moreover, neither approach surpasses human performance.
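As a rough illustration of the first approach in the abstract, the sketch below converts note sequences into pitch intervals and ranks themes by edit cost. It is not the speaker's system: it uses fixed, hand-picked costs rather than a probabilistic alignment, ignores rhythm, and assumes the sung query has already been transcribed to MIDI note numbers. The function names, the cost values, and the two-entry `THEMES` database are all hypothetical.

```python
def pitch_intervals(midi_notes):
    """Convert absolute MIDI pitches into successive semitone intervals.

    Working with intervals makes the comparison independent of the key
    in which the user happens to sing.
    """
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]


def edit_cost(query, target, indel=1.0):
    """Edit distance between two interval sequences (lower = more similar).

    Assumed cost model: insertions and deletions cost `indel`; a
    substitution costs half the interval mismatch in semitones, capped at 1.
    """
    rows, cols = len(query) + 1, len(target) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * indel
    for j in range(1, cols):
        cost[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            sub = min(abs(query[i - 1] - target[j - 1]) / 2.0, 1.0)
            cost[i][j] = min(
                cost[i - 1][j] + indel,    # skip a query interval
                cost[i][j - 1] + indel,    # skip a target interval
                cost[i - 1][j - 1] + sub,  # align the two intervals
            )
    return cost[-1][-1]


def rank_themes(query_notes, database):
    """Return theme titles ordered from most to least similar to the query."""
    query = pitch_intervals(query_notes)
    scored = [
        (edit_cost(query, pitch_intervals(notes)), title)
        for title, notes in database.items()
    ]
    return [title for _, title in sorted(scored)]


# Hypothetical two-theme database (MIDI note numbers).
THEMES = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
}

# A query sung a whole tone high with one wrong note.
print(rank_themes([66, 66, 67, 69, 69, 68, 66, 64], THEMES))
```

Because the comparison is made on intervals, an exact transposition of a stored theme has edit cost zero under this cost model.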

Speaker Biography

Bryan Pardo is a doctoral candidate in the Electrical Engineering and Computer Science department of the University of Michigan, with a specialization in Intelligent Systems. He applies machine learning, probabilistic natural language processing, and database search techniques to auditory user interfaces for human-computer interaction. Bryan takes a broader view of natural language than is traditional in computational linguistics, including timbre and prosody (timing, pitch contour, loudness), with an emphasis on music. In addition to his research activities, Bryan is an adjunct professor of Music at Madonna University in Livonia, Michigan, where he teaches a course in music technology. He also performs regularly throughout Michigan on saxophone and clarinet with his band, Into the Freylakh.

February 17, 2004

“A Bayesian view of inductive learning in humans and machines”   Video Available

Josh Tenenbaum, MIT

Abstract

In everyday learning and reasoning, people routinely draw successful generalizations from very limited evidence. Even young children can infer the meanings of words or the existence of hidden biological properties or causal relations from just one or a few relevant observations -- far outstripping the capabilities of conventional learning machines. How do they do it? I will argue that the success of people's everyday inductive leaps can be understood as the product of domain-general rational Bayesian inferences constrained by people's implicit theories of the structure of specific domains. This talk will explore the interactions between people's domain theories and their everyday inductive leaps in several different task domains, such as generalizing biological properties and learning word meanings. I will illustrate how domain theories generate the hypothesis spaces necessary for Bayesian generalization, and how these theories may themselves be acquired as the products of higher-order statistical inferences. I will also show how our approach to modeling human learning motivates new machine learning techniques for semi-supervised learning: generalizing from very few labeled examples with the aid of a large sample of unlabeled data.

February 24, 2004

“Norms and Exploitations: Mapping Meaning onto Use”   Video Available

Patrick Hanks, Berlin-Brandenburg Academy of Sciences and Brandeis University

Abstract

Words in isolation have innumerable potential meanings. When they are used, the lexical entropy is greatly reduced. Corpus Pattern Analysis has shown that, while the number of possible contexts for each word is very great (infinite?), the number of typical contexts is small and manageable. Corpus Pattern Analysis (CPA) aims to account for all uses of each word by grouping its collocations into semantically motivated syntagmatic patterns. The patterns are then linked to meanings or other applications such as synonym sets or foreign translations. Noun patterns arrange statistically significant collocates in sets of prototypical statements (e.g. "A storm may be gathering, brewing, impending, …; storms lash coastlines,…; people and ships get caught in a storm, weather a storm, ride out a storm, …; storms are violent, severe, raging, howling, …;" and so on). Verb patterns are built in the SPOCA framework. Pattern elements consist of lexical sets of nouns and other elements, grouped by their clause roles in relation to the target verb. Subvalency features such as determiners can also be relevant ("took place" vs. "took his place" vs. "took someone else's place" vs. "took third place"). Because the normal meaning of a word can be not only activated but also exploited for rhetorical effect, the empirical linguistic theory arising from this work is known as the Theory of Norms and Exploitations. Typical exploitations include ad-hoc metaphors, ellipsis, and other figures of speech.

Speaker Biography

Patrick Hanks is a lexicographer and corpus linguist. As chief editor of English dictionaries at Collins (1970-90) and subsequently chief editor of current English dictionaries at Oxford University Press (1990-2000), he created some of the world's most successful English dictionaries, including the New Oxford Dictionary of English (NODE) and the highly innovative Cobuild project (based on corpus research at the University of Birmingham), described by the philosopher David Wiggins as "the first significant development in the study of word meaning since the 18th century". In the late 1980s, he was a visiting scientist at AT&T Bell Laboratories in New Jersey, where he co-authored a series of influential and widely cited papers on statistical approaches to lexical analysis. He has also pioneered practical advances in computational onomastics and is the editor in chief of the 3-volume Dictionary of American Family Names (New York: Oxford University Press 2003). He is a Consultant (Berater) in lexical semantics and corpus linguistics to the Digitales Wörterbuch der deutschen Sprache at the Berlin-Brandenburg Academy of Sciences. He has been an invited keynote speaker at many conferences on lexicography, lexicology, and computational linguistics throughout the world.

March 2, 2004

“Semantic Lexicons and Semantic Tagging: towards content interoperability”   Video Available

Nicoletta Calzolari, Istituto Di Linguistica Computazionale

Abstract

Large-scale language resources are unanimously recognised as the necessary infrastructure underlying language technology. Discussing a few major European initiatives for building harmonised lexicons, we will highlight how computational lexicons and textual corpora should be considered as complementary views on the lexical space. A ‘complete’ computational lexicon should incorporate our ‘knowledge of the world’, and represent it in an explicit and formal way. We claim that it is theoretically not possible to achieve completeness within any ‘static’ lexicon. A sound language infrastructure must encompass both ‘static’ lexicons, like the traditional ones, and ‘dynamic’ systems able to enrich the lexicon with information acquired on-line from large corpora, thus capturing the ‘actually realised’ potentialities, the large range of variation, and the flexibility inherent in the language as it is used. These are the challenges for semantic tagging. Part of the talk will point at problems that have arisen in different semantic annotation exercises. Broadening our perspective into the future, the need for ever-growing language resources for effective content processing requires a change in the paradigm, and the design of a new generation of language resources, based on open content interoperability standards. The Semantic Web notion is going to crucially determine the shape of the language resources of the future, consistent with the vision of an open space of sharable knowledge available on the Web for processing.

Speaker Biography

Nicoletta Calzolari, who graduated in Philosophy at the University of Bologna, is Director of Research at CNR and now Director of the Istituto di Linguistica Computazionale of the CNR in Pisa, Italy. She has worked in the field of Computational Linguistics since 1972. Her main fields of interest are computational lexicology and lexicography; text corpora; standardisation and evaluation of language resources; lexical semantics; and knowledge acquisition from multiple (lexical and textual) sources, integration and representation. She has co-ordinated many international/European and national projects. She is a member and general secretary of ICCL, a member of the ELRA Board, and a member of many International Committees and Advisory Boards. She is conference chair of LREC’04 and has been an invited speaker, program committee member, or organiser for numerous international scientific conferences, workshops, etc.

March 2, 2004

“Semantic Lexicons & Semantic Tagging”   Video Available

Nicoletta Calzolari

March 9, 2004

“Automatic Speech Processing by Inference in Generative Models”

Sam Roweis, University of Toronto

Abstract

Say you want to perform some complex speech processing task. How should you develop the algorithm that you eventually use? Traditionally, you combine inspiration, careful examination of previous work, and arduous trial-and-error to invent a sequence of operations to apply to the waveform. But there is another approach: dream up a "generative model" -- a probabilistic machine which outputs data in the same form as your data -- in which the key quantities that you would eventually like to compute appear as hidden (latent) variables. Now perform inference in this model, estimating the hidden quantities. In doing so, the rules of probability will derive for you, automatically, a signal processing algorithm. While inference is well known to the speech community as a decoding step for HMMs, exactly the same type of calculation can be performed in many other models not related to recognition. In this talk, I will give several examples of this paradigm, showing how inference in very simple models can be used to perform surprisingly complex speech processing tasks including denoising, source separation, pitch tracking, timescale modification and estimation of articulatory movements from audio. In particular, I will introduce the factorial-max vector quantization (MAXVQ) model, motivated by the astonishing max approximation to log spectrograms of mixtures, and show that it can be used with an efficient branch-and-bound technique for exact inference to perform both additive denoising and monaural separation. I will also describe a purely time domain approach to pitch processing which identifies waveform samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries between unvoiced segments. An efficient algorithm for inferring these boundaries is derived from a simple probabilistic generative model for segments, which gives excellent results on pitch tracking, voiced/unvoiced detection and timescale modification.
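The "max approximation" mentioned in the abstract can be checked numerically with a small sketch. It assumes, as a simplification not stated in the abstract, that the powers of two sources add in each time-frequency cell; under that assumption the log of the mixture differs from the larger of the two log powers by at most log 2. The function names below are illustrative only and are not taken from the MAXVQ work.

```python
import math


def log_sum(log_a, log_b):
    """Exact log(a + b), computed stably from log a and log b."""
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    return hi + math.log1p(math.exp(lo - hi))


def max_approx(log_a, log_b):
    """Max approximation: log(a + b) is roughly max(log a, log b)."""
    return max(log_a, log_b)


def approx_error(log_a, log_b):
    """Gap between the exact log-sum and the max approximation.

    The gap equals log(1 + smaller/larger), so it always lies in [0, log 2].
    """
    return log_sum(log_a, log_b) - max_approx(log_a, log_b)


def mix_log_spectra(frame_a, frame_b):
    """Approximate one log-spectrum frame of a mixture, bin by bin."""
    return [max_approx(a, b) for a, b in zip(frame_a, frame_b)]


# Worst case: equal energy in both sources gives an error of log 2.
print(approx_error(0.0, 0.0))
# Typical case: one source dominates and the error is tiny.
print(approx_error(10.0, 0.0))
```

The error is largest when both sources carry equal energy in a cell and shrinks quickly as one source dominates, which is the common situation for speech mixtures.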

March 23, 2004

“Structuring Semantic Representations”

Beth Levin, Stanford

Abstract

Over the years predicate decompositions --- representations of verb meaning that take the form of combinations of primitive predicates, such as the infamous CAUSE TO DIE for _kill_ --- have come in for substantial and sometimes well-merited criticism. Yet, such representations continue to be adopted, suggesting that there is something appealing about them. In this talk, I identify two underappreciated properties of these representations that make them effective semantic representations and present several types of evidence to demonstrate this. First, predicate decompositions can easily capture the "bipartite" nature of a verb's meaning. For instance, a specification of the meaning of _lengthen_ must indicate that it describes a change of state event and that the relevant state involves the length of the changed entity. These two types of meaning components can be represented using a small set of event types defined in terms of combinations of primitive predicates together with "roots" representing a verb's idiosyncratic or core meaning (Grimshaw 1993, Hale & Keyser 2002, Jackendoff 1983, 1990, Mohanan & Mohanan 1999, Pesetsky 1995, Pinker 1989, RH&L 1998). Second, perhaps the most important distinction among event types involves a dichotomy between simple events and complex events --- events composed of simple events. In fact, the notion "complex event" or a comparable notion --- most often "causative event" --- has been invoked since at least the generative semantics era, though its interpretation and role in linguistic explanation have changed over the years. Predicate decompositions can easily capture this fundamental distinction. In support of the importance of these two properties of semantic representations, I review ways in which they gain explanatory power in my joint work with Malka Rappaport Hovav.
First, the distinct argument expression options manifested by two semantic classes of English transitive verbs --- surface contact verbs (e.g., _wipe_, _scrub_, _sweep_) and change of state verbs (e.g., _break_, _dry_, _open_) --- can be tied to differences in the complexity of the events they denote: simple events for surface contact verbs and complex events for change of state verbs; these differences, in turn, reflect differences in the nature of the roots of these two types of verbs. Second, the distribution of fake reflexives in resultative constructions (_Sally sang herself hoarse/*Sally sang hoarse_) is sensitive to event complexity. Third, event complexity illuminates crosslinguistic variation in the transitive verb class and leads to a natural differentiation among transitive verb objects, providing insight into the repeated observations that not all objects are equal, observations that have previously been attributed to slippery notions such as "affectedness." Finally, the interaction of event complexity and the bipartite nature of verb meaning provides the key to understanding the origins and properties of English object alternations (e.g., the locative alternation: _stuff groceries into a bag/stuff a bag with groceries_).

Speaker Biography

Beth Levin is the William H. Bonsall Professor in the Humanities and the Chair of the Department of Linguistics at Stanford University. After receiving her Ph.D. from MIT in 1983, she spent four years at the MIT Center for Cognitive Science, where she had major responsibility for the Lexicon Project. She joined Stanford's Department of Linguistics in 1999, after twelve years at Northwestern University. Her research focuses on the lexicon --- the component of the language system that serves as a repository for information on the words of a language. She has conducted extensive breadth- and depth-first studies of the English verb lexicon, which have provided the foundation for her theoretical research. Her recent work investigates the linguistic representation of events and the ways in which events and their participants are expressed in English and other languages.

March 30, 2004

“Scaling of Information in Natural Language”   Video Available

Naftali Tishby, The Hebrew University of Jerusalem

Abstract

The idea that the observed semantic structure of human language is a result of an adaptive competition between accuracy of expression and efficient communication is not new. It has been suggested in various forms by Zipf, Shannon, and Mandelbrot, among many others. In this talk I will discuss a novel technique for studying such a competition between accuracy and efficiency of communication, solely from the statistics of large linguistic corpora. By exploiting the deep and intriguing duality between source and channel coding in Shannon's information theory we can explore directly the relationship between the semantic accuracy and the complexity of the representation in a large corpus of English documents. We do this by evaluating the accuracy in identifying the topic of a document as a function of the complexity of the semantic representation, as captured by relevant hierarchical clustering of words via the information bottleneck method, which can be viewed as a combination of perfectly matched source and channel. What we obtain is a scaling relation (a power-law) that, unlike the famous Zipf's law, quantifies directly the statistical way words are semantically refined in human language. It may therefore reveal some quantitative properties of human cognition which can now be explored experimentally in other languages or other complex cognitive modalities such as music and mathematics. This work is partly based on joint work with Noam Slonim. See also: http://www.cs.huji.ac.il/labs/learning/Theses/Noam_phd1.ps.gz

Speaker Biography

Dr. Naftali Tishby is currently on sabbatical at the CIS department at UPenn. Until last summer he served as the founding chair of the new computer engineering program at the School of Computer Science and Engineering at the Hebrew University. He is a founding member of the Interdisciplinary Center for Neural Computation (ICNC) and one of the key teachers of the well-known computational neuroscience graduate program of the ICNC. He received his PhD in theoretical physics from the Hebrew University in 1985 and has since been a research staff member at MIT, Bell Labs, AT&T, and NECI. His current research is on the interface between computer science, statistical physics, and computational biology. He introduced various methods from statistical mechanics into computational learning theory and machine learning and is interested in particular in the role of phase transitions in learning and cognitive phenomena. More recently he has been working on the foundation of biological information processing and has developed novel conceptual frameworks for relevant data representation and learning algorithms based on information theory, such as the Information Bottleneck method and Sufficient Dimensionality Reduction.

April 6, 2004

“Interactivity in Written Language Processing: Evidence from Impairments”   Video Available

Michael McCloskey, JHU Department of Cognitive Science

Abstract

Via two studies of cognitively impaired individuals, I consider reciprocal interactions between levels of representation in written language processing. The first study involves a severely dysgraphic stroke patient, and provides evidence of feedback from grapheme to lexeme levels of representation in written word production. The second study involves a young woman with a remarkable deficit in visual perception, and concerns top-down influences on reading comprehension for words and sentences.

April 13, 2004

“Discriminative Language Modeling for LVCSR”   Video Available

Murat Saraclar, AT&T Labs - Research

Abstract

This talk describes a discriminative language modeling technique for large vocabulary speech recognition. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite-state automata, and are applied by intersecting the automata with word-lattices that are output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We present results for various perceptron training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. Using the feature set selected by the perceptron algorithm, CRF training provides an additional 0.5 percent reduction in word error rate, for a total of 1.8 percent absolute WER reduction from the baseline of 39.2 percent.
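The perceptron training described in the abstract can be illustrated with a toy reranker. This is a sketch under strong simplifications, not the system from the talk: it reranks short n-best lists instead of intersecting automata with word lattices, stores weights in a plain dictionary rather than a weighted finite-state automaton, and assumes the lowest-error hypothesis in each list (the "oracle") is already known. All names and the example data are hypothetical.

```python
from collections import Counter


def ngram_features(words, max_order=2):
    """Count the n-grams (up to max_order) in a hypothesis, with sentence padding."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats


def score(weights, hyp, baseline_score):
    """Baseline recognizer score plus the learned n-gram correction."""
    correction = sum(weights.get(f, 0.0) * c for f, c in ngram_features(hyp).items())
    return baseline_score + correction


def best_hypothesis(weights, nbest):
    """Pick the highest-scoring hypothesis from a list of (words, baseline_score)."""
    return max(nbest, key=lambda item: score(weights, item[0], item[1]))[0]


def perceptron_train(data, epochs=2):
    """Perceptron updates: reward oracle features, penalize the wrong guess.

    Only features that appear in some update ever receive a weight, which is
    why the learned feature set stays small.
    """
    weights = {}
    for _ in range(epochs):
        for nbest, oracle in data:
            guess = best_hypothesis(weights, nbest)
            if guess != oracle:
                for f, c in ngram_features(oracle).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(guess).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights


# Hypothetical training example: the baseline prefers the wrong hypothesis.
NBEST = [(("recognize", "speech"), -5.0), (("wreck", "a", "nice", "beach"), -4.0)]
ORACLE = ("recognize", "speech")

weights = perceptron_train([(NBEST, ORACLE)])
print(best_hypothesis(weights, NBEST))
```

After a single update the learned correction outweighs the baseline's preference, so the second pass over the data makes no further changes.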

April 20, 2004

“Wiretapping the Brain”   Video Available

Terry Sejnowski, Salk Institute

Abstract

Blind source separation -- also called the cocktail party problem -- has recently been solved using Independent Component Analysis. This new signal processing technique has allowed us to eavesdrop on the brain's internal communication systems.

Speaker Biography

Terrence Sejnowski is an Investigator with the Howard Hughes Medical Institute and a Professor at The Salk Institute for Biological Studies where he directs the Computational Neurobiology Laboratory. He is also Professor of Biological Sciences and Adjunct Professor in the Departments of Physics, Neurosciences, Psychology, Cognitive Science, and Computer Science and Engineering at the University of California, San Diego, where he is Director of the Institute for Neural Computation. Dr. Sejnowski received a B.S. in physics from Case Western Reserve University, an M.A. in physics from Princeton University, and a Ph.D. in physics from Princeton University in 1978. From 1978 to 1979 Dr. Sejnowski was a postdoctoral fellow in the Department of Biology at Princeton University and from 1979 to 1982 he was a postdoctoral fellow in the Department of Neurobiology at Harvard Medical School. In 1982 he joined the faculty of the Department of Biophysics at the Johns Hopkins University, where he achieved the rank of Professor before moving to San Diego in 1988. He has had a long-standing affiliation with the California Institute of Technology, as a Wiersma Visiting Professor of Neurobiology in 1987, as a Sherman Fairchild Distinguished Scholar in 1993 and as a part-time Visiting Professor from 1995 to 1998. Dr. Sejnowski received a Presidential Young Investigator Award in 1984. He received the Wright Prize from Harvey Mudd College for excellence in interdisciplinary research in 1996 and the Hebb Prize for his contributions to learning algorithms from the International Neural Network Society in 1999. He became a Fellow of the Institute of Electrical and Electronics Engineers in 2000 and received their Neural Network Pioneer Award in 2002. In 2003 he was elected to the Johns Hopkins Society of Scholars. In 1989, Dr. Sejnowski founded Neural Computation, published by the MIT Press, the leading journal in neural networks and computational neuroscience.
He is also the President of the Neural Information Processing Systems Foundation, a non-profit organization that oversees the annual NIPS Conference. This interdisciplinary meeting brings together researchers from many disciplines, including biology, physics, mathematics and engineering. The long-range goal of Dr. Sejnowski's research is to build linking principles from brain to behavior using computational models. This goal is being pursued with a combination of theoretical and experimental approaches at several levels of investigation ranging from the biophysical level to the systems level. Hippocampal and cortical slice preparations are being used to explore the properties of single neurons and synapses. Biophysical models of electrical and chemical signal processing within neurons are used as an adjunct to physiological experiments. The dynamics of network models are studied to explore how populations of neurons interact during states of alertness and sleep. His laboratory has developed new methods for analyzing the sources for electrical and magnetic signals recorded from the scalp and hemodynamic signals from functional brain imaging.

April 27, 2004

“Discriminative Estimation of Mixtures of Exponential Distributions”

Vaibhava Goel, IBM

Abstract

An auxiliary function based approach for estimation of exponential model parameters under a maximum conditional likelihood (MCL) objective was recently proposed by Gunawardana and Byrne. While for Gaussian mixture models it leads to parameter updates that were known previously, it is a very useful method in that it is applicable to arbitrarily constrained exponential models and the resulting auxiliary function is similar to the EM auxiliary function, thus eliminating the need for two separate optimization procedures. It is also easily extensible to other utility functions that are similar to MCL, such as sum-of-posteriors and maximum mutual information. One shortcoming of this approach, however, is that the validity of the auxiliary function is not rigorously established. In this talk I will present our work on discriminative estimation using the auxiliary function approach. I'll first discuss our recent proof of validity of the auxiliary function, and then present application of this approach for discriminative estimation of subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common subspace. SCGMMs have been shown to generalize and yield significant error rate reductions over previously considered model classes such as diagonal models, models with semi-tied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We find that MMI estimation of SCGMMs (tried on a digit task so far) results in more than 20% relative reduction in word error rate over maximum likelihood estimation. Time permitting, I'll also discuss MCL estimation of language models that combine N-grams and stochastic finite state grammars. This work was done in collaboration with Scott Axelrod, Ramesh Gopinath, Peder Olsen, and Karthik Visweswariah.

May 11, 2004

“Network Models for Game Theory and Economics”   Video Available

Michael Kearns, University of Pennsylvania

Abstract

Over the last several years, a number of authors have developed graph-theoretic or network models for large-population game theory and economics. In such models, each player or organization is represented by a vertex in a graph, and payoffs and transactions are restricted to obey the topology of the graph. This allows the detailed specification of rich structure (social, technological, organizational, political, regulatory) in strategic and economic systems. In this talk, I will survey these models and the attendant algorithms for certain basic computations, including Nash, correlated, and Arrow-Debreu equilibria. Connections to related topics, such as Bayesian and Markov networks for probabilistic modeling and inference, will be discussed. I will also discuss some recent work marrying this general line of thought with topics in social network theory.

July 6, 2004

“Opening Day Presentations”   Video Available

Various

July 14, 2004

“Mary Harper: CDG-Based Language Models”   Video Available

Mary Harper

July 28, 2004

“Applying Speech/Language Technologies to Communication Disorders: New Challenges for Basic Research”   Video Available

Jan van Santen, Center for Spoken Language Understanding, Oregon Graduate Institute

September 14, 2004

“Algorithms and Rate-Distortion Bounds in Data Compression for Multi-User Communications”   Video Available

Michelle Effros, CalTech

Abstract

A network source code is a data compression algorithm designed specifically for the multi-user communication system in which it will be employed. Network source codes achieve better rate-distortion trade-offs and improved functionality over the (better known) "point-to-point" source coding alternatives. Perhaps the simplest example of a network source code is the multiresolution code -- in which a single transmitter describes the same information to a family of receivers, each of whom receives the data at a different rate; the descriptions are embedded, so that all receivers receive the lowest-rate description, and each higher rate is achieved by adding on to the description at the nearest lower rate. In this talk, I will discuss rate-distortion bounds for lossy network source coding and algorithms for designing codes approaching these bounds. I will focus primarily on a survey of multiresolution source coding results but also include a brief discussion of generalizations to other network source coding environments. Results include rate-distortion bounds, rate-loss bounds, properties of optimal codes, and an approximation algorithm for optimal quantizer design. The quantizer design is based on a new approximation algorithm for $\ell_2^2$ data clustering. Parts of the work described in this talk were done in collaboration with Hanying Feng, Dan Muresan, Qian Zhao, and Leonard Schulman.
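The embedded structure of a multiresolution code can be illustrated with a toy scalar quantizer. The sketch assumes a source value in [0, 1) and uses its binary expansion as the embedded description; this is an illustrative choice, not one of the code designs from the talk, and the function names are hypothetical. A receiver at rate r bits reads only the first r bits, and each higher-rate receiver refines the reconstruction by reading further bits.

```python
def encode(x, max_bits):
    """Embedded description of x in [0, 1): binary expansion, most significant bit first."""
    bits = []
    for _ in range(max_bits):
        x *= 2
        bit = int(x)  # 0 or 1: which half of the current interval x lies in
        bits.append(bit)
        x -= bit
    return bits


def decode(prefix):
    """Reconstruct from any prefix of the description.

    The prefix pins x down to an interval of width 2**-len(prefix);
    the midpoint of that interval is returned.
    """
    low = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(prefix))
    return low + 2.0 ** -(len(prefix) + 1)


# One transmitter, several receivers at different rates.
description = encode(0.7, 8)
for rate in (1, 2, 4, 8):
    estimate = decode(description[:rate])
    print(rate, estimate, abs(estimate - 0.7))
```

With r bits the reconstruction error is at most 2^-(r+1), so every additional bit halves the worst-case distortion while all lower-rate receivers remain served by the same description.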

September 21, 2004

“Listener-oriented Phonology”   Video Available

Paul Boersma, University of Amsterdam

Abstract

French has two kinds of vowel-initial words (normal ones and so-called "h aspiré" words), which differ with respect to four phonological processes (enchaînement, liaison, elision, and schwa deletion). I will show that a speaker-based view of phonology can handle at best three of these processes, and that a listener-oriented view can handle all of them. And this is just one example among many others, which I will touch upon briefly. A suitable framework that can be turned listener-oriented is Optimality Theory. Its speaker-based version (Prince & Smolensky 1993) originally recognized two kinds of constraints: faithfulness constraints and constraints against marked structures. However, many ad-hoc constraints that do not fall in either group (namely, what I call "exclamation constraints") have been proposed through the years as well. The authors of such constraints often display a degree of dissatisfaction with their own proposals, usually because these constraints have little applicability outside the language under discussion. This is because the usual distal task of these constraints is to express the maintenance of a language-specific contrast. I will argue that these speaker-based exclamation constraints should be replaced with "listener-oriented faithfulness" constraints. Whereas a speaker-based faithfulness constraint reads "an element X that is present in the underlying form should appear as X in the surface form", a listener-oriented faithfulness constraint reads "an element X that is present in the underlying form should be pronounced as something that will be perceived as X by the listener". By replacing exclamation constraints with such faithfulness constraints, their formulations become unsurprising and non-ad-hoc. The empirical gain is that the limited applicability of these constraints (namely, to cases of the maintenance of contrast) is now directly predicted by their inclusion in the faithfulness group.

September 28, 2004

“Joint discriminative language modeling and utterance classification”   Video Available

Brian Roark, OGI

Abstract

In this talk, I will describe several discriminative language modeling techniques for large vocabulary automatic speech recognition (ASR) tasks. I will initially review recent work on n-gram model estimation using the perceptron algorithm and conditional random fields, with experimental results on Switchboard (joint work with Murat Saraclar, Michael Collins and Mark Johnson). I will then present some new work on a call-classification task, for which training utterance classes are annotated along with the reference transcription. We demonstrate that a joint modeling approach, using utterance-class, n-gram, and class/n-gram features, reduces WER significantly over just using n-gram features, while additionally providing significantly more accurate utterance classification than the baselines. A variety of parameter update approaches will be discussed and evaluated with respect to both WER and classification error rate reduction, including simultaneous and independent optimization. As with the earlier n-gram modeling approaches, the resulting models are encoded as weighted finite-state automata and used by simply intersecting with word-lattices output from the baseline recognizer (joint work with Murat Saraclar).

October 5, 2004

“Unsupervised learning of natural languages”   Video Available

Shimon Edelman, Cornell

Abstract

We describe an unsupervised algorithm capable of finding hierarchical, context-sensitive structure in corpora of raw symbolic sequential data such as text or transcribed speech. In the domain of language, the algorithm handles both artificial stochastic context-free grammar data and real natural-language corpora, including raw transcribed child-directed speech. It identifies candidate structures iteratively as patterns of partially aligned sequences of symbols, accompanied by equivalence classes of symbols that are in complementary distribution in the context of their patterns. Pattern significance is estimated using a context-sensitive probabilistic criterion defined in terms of local flow quantities in a graph whose vertices are the lexicon entries and where the paths correspond, initially, to corpus sentences. New patterns and equivalence classes can incorporate those added previously, leading to the emergence of recursively structured units that also support highly productive and safe generalization, by opening context-dependent paths that do not exist in the original corpus. This is the first time an unsupervised algorithm is shown capable of learning complex, grammar-like linguistic representations that are demonstrably productive, exhibit a range of structure-dependent syntactic phenomena, and score well in standard language proficiency tests.

October 12, 2004

“Towards a Grand Unified Theory of Underspecification”   Video Available

Alexander Koller, Universität des Saarlandes

Abstract

Underspecification is an approach to dealing with scope ambiguities, a certain class of semantic ambiguities in natural language. The basic idea is to derive from a syntactic analysis of a sentence not all the _LP_exponentially many_RP_ semantic representations, but one single compact description of all semantic representations. The actual semantic representations can then be computed from the description on demand. Underspecification has become the standard approach to dealing with scope in large-scale grammars. In my talk, I present one particular scope underspecification formalism, the language of dominance constraints. Dominance constraints have a particularly canonical definition _LP_as a logic interpreted over trees_RP_, and very efficient solvers are available for them. Then I investigate the relationship between dominance constraints and two other popular underspecification formalisms: Hole Semantics and Minimal Recursion Semantics. While the formalisms all look superficially similar, it turns out that there are fundamental differences once we look more closely. However, I show that significant fragments of the three formalisms are indeed equivalent, and present empirical data that suggests that these fragments encompass all descriptions that are used by current grammars. These results bridge the gap between different underspecification formalisms for the first time, which makes resources such as grammars and solvers that were created for one formalism available to the others. On a more general level, they also clarify the expressive power that a formalism actually has to offer in the linguistic application.

Speaker Biography

Alexander Koller is a researcher at Saarland University in Saarbruecken, Germany. He received his MSc degrees in computational linguistics and computer science from Saarland University in 1999, and plans to complete his PhD in computer science by the end of 2004. His research interests include the application of efficient algorithms and logic-based methods to natural language processing, computational semantics, automated text generation, and the language-robotics interface.

October 19, 2004

“Time Independent ICA through a Fisher Game”   Video Available

Dr. Ravi C. Venkatesan, Systems Research Corp

Abstract

Extreme Physical Information _LP_EPI_RP_ [1] is a self-contained theory for eliciting physical laws from a system/process _LP_Nature_RP_ based on a measurement-response framework. A specific form of the Fisher information measure _LP_FIM_RP_ known as the Fisher channel capacity _LP_FCC_RP_ is employed as a measure of uncertainty. The FCC is the trace of the FIM. EPI may be construed as a zero-sum game between a gedanken observer and a system under observation _LP_characterized by a demon, reminiscent of the Maxwell demon, residing in a conjugate space_RP_. The payoff of the competitive game results in a variational principle that defines the physical law that generates the observations made by the gedanken observer, as a consequence of the response of the system to the measurements. A principled formulation for reconstructing pdfs from arbitrary discrete time-independent random sequences, based on an invariance-preserving extension of the Extreme Physical Information _LP_EPI_RP_ theory, is presented [2, 3]. Invariances are incorporated into the invariant EPI _LP_IEPI_RP_ model through a Discrete Variational Complex inspired by the seminal work of T. D. Lee [4]. A quantum mechanical connotation is provided to the Fisher game. This is accomplished through the IEPI Euler-Lagrange equation, which acquires the form of a time-independent Schrödinger-like equation, and the quantum mechanical virial theorem [5]. The concomitant constraints of the IEPI variational principle are consistent with the Heisenberg uncertainty principle. The ansätze describing the state estimators are obtained so as to self-consistently satisfy an analog of the Fisher game corollary [1, 3]. The game corollary permits the demon to make the closing move in the Fisher game, by minimizing the FCC. This corresponds to a state of maximum uncertainty and is in keeping with the demon’s strategy of minimizing the information made available to the observer.
A fundamental tenet of the EPI/IEPI model is the collection of statistically independent data by the observer. A principled IEPI Fisher game formulation guaranteeing the statistical independence of the quantum mechanical observables is presented, utilizing statistical analyses commonly employed in Independent Component Analysis _LP_ICA_RP_ [6]. Specifically, correlations are first eliminated using a whitening process _LP_facilitated by a linear filter or PCA_RP_, in conjunction with a Givens rotation _LP_a unitary transform_RP_. Next, the IEPI Fisher game is played between the gedanken observer and the process inhabiting the conjugate system space. Finally, an inverse whitening filter is applied to the observables corresponding to the reconstructed state vectors obtained from the Fisher game. This yields a novel form of ICA based on minimizing the FCC. The prospect of obtaining an optimal whitening filter based on the Fisher game corollary is investigated. Qualitative analogies and distinctions between the Fisher game ICA model and other prominent ICA theories are briefly discussed. Reconstruction of time-independent random sequences generated from Gaussian mixture models demonstrates the efficacy of the Fisher game/ICA formulation. The utility of the Fisher game ICA formulation for quantum clustering of data when the number of clusters is not known a priori is briefly discussed.

October 21, 2004

“Towards Semi-Supervised Algorithms for Semantic Relation Detection in BioScience Text”

Marti Hearst, Berkeley

Abstract

A crucial step toward the goal of automatic extraction of propositional information from natural language text is the identification of semantic relations between constituents in sentences. In the bioscience text domain, we have developed a simple ontology-based algorithm for determining which semantic relation holds between terms in noun compounds, and a supervised learning algorithm for discovering relations between entities. In this talk, I will first briefly describe these results. A major bottleneck for semantic labeling work is the development of labeled training data. To remedy this, we propose a new approach for creating semantically-labeled data that makes use of what we call *citances*: the text of the sentences surrounding citations to research articles. Citances provide us with differently-worded statements of approximately the same semantic information; by looking at the way that different authors talk about the same facts, we obtain paraphrases nearly for free. We have just begun to assess how well citances work for the creation of labeled training data for the problem of detecting protein-protein interaction relations. We also hypothesize that citances will be useful for synonym creation, document summarization, and database curation. Joint work with Preslav Nakov, Barbara Rosario, Ariel Schwartz, and Janice Hamer. This work is part of the BioText project, supported by NSF DBI-0317510.

Speaker Biography

Dr. Marti Hearst is an associate professor in SIMS, the School of Information Management and Systems at UC Berkeley, with an affiliate appointment in the Computer Science Division. Her primary research interests are user interfaces and visualization for information retrieval, empirical computational linguistics, and text data mining. She received BA, MS, and PhD degrees in Computer Science from the University of California at Berkeley, and she was a Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst is on the editorial boards of ACM Transactions on Information Systems and ACM Transactions on Computer-Human Interaction and was formerly on the boards of Computational Linguistics and IEEE Intelligent Systems, and was the program co-chair of HLT-NAACL

October 26, 2004

“Towards Automatic Acquisition of Ontological Knowledge”   Video Available

Patrick Pantel, ISI USC

Abstract

Recently, many corpus-based and web-based knowledge acquisition systems have been proposed for creating lexical resources. Not many attempts, however, have been made at ontologizing these resources. We present a semi-automatic method for extracting fine-grained semantic relations between verbs. We detect similarity, strength, antonymy, enablement, and temporal happens-before relations between pairs of strongly associated verbs using lexico-syntactic patterns over the Web. We provide the resource, called VerbOcean, for download at http://semantics.isi.edu/ocean/. We will discuss current work on ontologizing lexical resources like VerbOcean. Using an automatic algorithm, we assign a grammatical template to each node of an ontology. The challenge lies in disambiguating these templates. Benefits of this work potentially include the disambiguation of VerbOcean, the disambiguation of new conceptualizations, improved unsupervised word sense disambiguation, and the personalization of ontologies, like WordNet, to a particular domain.
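To make the idea of detecting verb relations with lexico-syntactic patterns more concrete, here is a minimal sketch that matches a few surface patterns against raw text and counts the verb pairs they capture. The three patterns in `PATTERNS`, the sample sentences, and the function name `extract_relations` are hypothetical and chosen only for illustration; they are not the pattern set used to build VerbOcean, and a real system would query the Web, restrict matches to verbs, and apply a statistical association test rather than raw counts.

```python
import re
from collections import Counter

# Hypothetical surface patterns, one per relation (illustration only).
PATTERNS = {
    "happens-before": r"\bto (\w+) and then (\w+)\b",
    "antonymy": r"\beither (\w+) or (\w+)\b",
    "strength": r"\b(\w+) and even (\w+)\b",
}

def extract_relations(text):
    """Count (relation, verb1, verb2) triples matched by the patterns."""
    counts = Counter()
    lowered = text.lower()
    for relation, pattern in PATTERNS.items():
        for v1, v2 in re.findall(pattern, lowered):
            counts[(relation, v1, v2)] += 1
    return counts

# Toy corpus (hypothetical sentences).
corpus = ("You have to marry and then divorce. "
          "Prices will either rise or fall. "
          "Critics may dislike and even hate the film.")
relations = extract_relations(corpus)
for (relation, v1, v2), n in sorted(relations.items()):
    print(f"{relation}: {v1} -> {v2} ({n})")
```

Because the patterns here operate on plain word tokens, they would also match non-verbs in real text; part-of-speech filtering is one way such false positives could be reduced.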

Speaker Biography

Dr. Patrick Pantel is currently a Research Scientist in the Natural Language Group at the USC Information Sciences Institute where he does research in semi-automatic ontology construction, text mining, knowledge acquisition, and machine learning. In 2003, he received a Ph.D. in Computing Science from the University of Alberta in Edmonton, Canada. He is the recipient of various prestigious awards, including the University of Manitoba gold medal for the Faculty of Science, two national scholarships from the Natural Sciences and Engineering Research Council of Canada and the Izaak Walton Killam Memorial scholarship.

November 2, 2004

“Unsupervised Learning of Natural Language Structure”   Video Available

Dan Klein, Berkeley

Abstract

There is precisely one complete language processing system to date: the human brain. Though there is debate on how much built-in bias human learners might have, we definitely acquire language in a primarily unsupervised fashion. On the other hand, computational approaches to language processing are almost exclusively supervised, relying on hand-labeled corpora for training. This reliance is largely due to repeated failures of unsupervised approaches. In particular, the problem of learning syntax _LP_grammar_RP_ from completely unannotated text has received a great deal of attention for well over a decade, with little in the way of positive results. We argue that previous methods for this task have generally failed because of the representations they used. Overly complex models are easily distracted by non-syntactic correlations _LP_such as topical associations_RP_, while overly simple models aren't rich enough to capture important first-order properties of language _LP_such as directionality, adjacency, and valence_RP_. We describe several syntactic representations which are designed to capture the basic character of natural language syntax as directly as possible. With these representations, high-quality parses can be learned from surprisingly little text, with no labeled examples and no language-specific biases. Our results are the first to show above-baseline performance in unsupervised parsing, and far exceed the baseline _LP_in multiple languages_RP_. These specific grammar learning methods are useful since parsed corpora exist for only a small number of languages. More generally, most high-level NLP tasks, such as machine translation and question-answering, lack richly annotated corpora, making unsupervised methods extremely appealing, even for common languages like English.

Speaker Biography

Dan Klein is an assistant professor of computer science at UC Berkeley, having recently completed his doctoral work at Stanford University. He holds a BA from Cornell University _LP_summa cum laude in computer science, linguistics, and math_RP_ and a master's in linguistics from Oxford University. Professor Klein's research focuses on natural language processing, including unsupervised grammar induction, statistical parsing methods, and information extraction. His academic honors include a British Marshall Fellowship, several graduate research fellowships, and best paper awards at the ACL and EMNLP conferences.

November 9, 2004

“Discriminative Learning of Generative Models”   Video Available

Tony Jebara, Columbia

Abstract

Generative models such as Bayesian networks, distributions, and hidden Markov models are elegant formalisms for setting up and specifying prior knowledge about a learning problem. However, the standard estimation methods they rely on, including maximum likelihood and Bayesian integration, do not focus modeling resources on a particular input-output task. They only generically describe the data. In applied settings where models are imperfectly matched to real data, more discriminative learning, as in support vector machines, is crucial for improving performance. In this talk, I show how we can learn generative models optimally for a given task such as classification and obtain large-margin discrimination boundaries. Through maximum entropy discrimination, all exponential family models can be made discriminative via convex programming. Furthermore, the method handles interesting latent models such as mixtures and hidden Markov models. This is done via a variant of the maximum entropy method that uses variational bounding on classification constraints to make computations tractable in the latent case. Interestingly, the method gives rise to Lagrange multipliers that behave like posteriors over hidden variables. Preliminary experiments are shown.

Speaker Biography

Tony Jebara is an Assistant Professor of Computer Science at Columbia University. He is Director of the Columbia Machine Learning Laboratory, whose research focuses on machine learning, computer vision, and related application areas such as human-computer interaction. Jebara is also a Principal Investigator at Columbia's Vision and Graphics Center. He has published over 30 papers in the above areas, as well as the book Machine Learning: Discriminative and Generative _LP_Kluwer_RP_. Jebara is the recipient of the Career award from the National Science Foundation and has also received honors for his papers from the International Conference on Machine Learning and from the Pattern Recognition Society. He has served as chair or program committee member for various conferences including ICDL, ICML, COLT, UAI, IJCAI and on the editorial board of the Machine Learning Journal. Jebara's research has been featured on television _LP_ABC, BBC, New York One, TechTV, etc._RP_ as well as in the popular press _LP_Wired Online, Scientific American, Newsweek, Science Photo Library, etc._RP_. Jebara obtained his Bachelor's from McGill University _LP_at the McGill Center for Intelligent Machines_RP_ in 1996. He obtained his Master's in 1998 and his PhD in 2002, both from the Massachusetts Institute of Technology _LP_at the MIT Media Laboratory_RP_. He is currently a member of the IEEE, ACM and AAAI. Professor Jebara's research and laboratory are supported in part by Microsoft, Alpha Star Corporation and the National Science Foundation.

November 23, 2004

“Coping with Information Overload”

Dr. Allen Gorin, US Department of Defense

Abstract

Coping with information overload is a major challenge of the 21st century. In previous eras, access to information was difficult and often tightly controlled as a source of power. Today, we are overloaded with so much electronic information that it has become an obstacle to effective decision making. Thus, the challenge facing individuals and institutions is how to embrace this information rather than being paralyzed by it. The intelligence community is overloaded with huge volumes of information, moving at large velocities and comprising great variety. Information includes both content and context, which humans deal with as a gestalt but computer systems tend to treat separately. We discuss two complementary approaches to coping with information overload and the open research questions that arise in this emerging discipline. First is value estimation, where humans examine only the golden nuggets of information judged valuable by some process. The second approach is knowledge distillation, where the information is digested and compressed, producing salient knowledge for human consumption. Finally, there are many open questions regarding the symbiosis between people and machines for knowledge discovery.

November 30, 2004

“Towards a Universal Framework for Tree Transduction”

Stuart Shieber, Harvard

Abstract

The typical natural-language pipeline can be thought of as proceeding by successive transformation of various data structures, especially strings and trees. For instance, low-level speech processing can be viewed as transduction of strings of speech samples into phoneme strings, then into triphone strings, finally into word strings. Morphological processes can similarly be modeled as character string transductions. For this reason, weighted finite-state transducers _LP_WFST_RP_, a general formalism for string-to-string transduction, can serve as a kind of universal formalism for representing low-level natural-language processes. Higher-level natural-language processes can also be thought of as transductions, but on more highly structured representations, in particular, trees. Semantic interpretation can be viewed as a transduction from a syntactic parse tree to a tree of semantic operations whose simplification to logical form can be viewed as a further transduction. Machine translation systems have been viewed as tree transductions of various sorts as well. This raises the question as to whether there is a universal formalism for natural-language tree transduction that can play the same role there that WFST plays for string transduction. In this talk, we explore this question, proposing that the characterization of classical tree transducers in terms of bimorphisms, little known outside the formal language theory community, can be used as a unifying framework for a wide variety of tree transduction formalisms, including, for instance, several previously proposed for statistical machine translation and the back-end formalism for Dragon's speech command and control system. The framework also places so-called synchronous grammar formalisms into the tree transducer family for the first time.
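For readers unfamiliar with tree transduction, the following is a minimal sketch of a deterministic top-down tree transducer that maps a toy syntactic parse tree to a tree of semantic operations, in the spirit of the semantic interpretation example in the abstract. The tuple-based tree encoding, the rule table `RULES`, the `apply` label, and the grammar are all assumptions made for illustration. This is not the bimorphism characterization proposed in the talk, only an example of the kind of device that such a framework is meant to cover.

```python
# Trees are tuples: (label, child, child, ...); a leaf is (label,).
# Each rule maps (label, arity) to an output template. An integer i in a
# template stands for the recursively transduced i-th child of the input node.
RULES = {
    ("S", 2): ("apply", 1, 0),   # S(NP, VP)  -> apply(VP', NP')
    ("VP", 2): ("apply", 0, 1),  # VP(V, NP)  -> apply(V', NP')
    ("NP", 1): 0,                # NP(x)      -> x'
    ("V", 1): 0,                 # V(x)       -> x'
}

def transduce(tree):
    """Rewrite the tree top-down according to RULES."""
    label, children = tree[0], tree[1:]
    rule = RULES.get((label, len(children)))
    if rule is None:
        # No rule: copy the label and transduce the children.
        return (label,) + tuple(transduce(c) for c in children)
    return instantiate(rule, children)

def instantiate(template, children):
    """Fill a rule template with the transduced input children."""
    if isinstance(template, int):
        return transduce(children[template])
    return (template[0],) + tuple(instantiate(t, children) for t in template[1:])

# Hypothetical parse tree for "John sees Mary".
parse = ("S",
         ("NP", ("John",)),
         ("VP", ("V", ("sees",)), ("NP", ("Mary",))))
semantics = transduce(parse)
print(semantics)
# -> ('apply', ('apply', ('sees',), ('Mary',)), ('John',))
```

Each rule here uses every child exactly once, so the transducer neither copies nor deletes subtrees; richer formalisms relax such restrictions and may attach weights to rules.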

Speaker Biography

Stuart Shieber is Harvard College Professor and James O. Welch, Jr. and Virginia B. Welch Professor of Computer Science in the Division of Engineering and Applied Sciences at Harvard University. Professor Shieber was awarded a Presidential Young Investigator award in 1991, and was named a Presidential Faculty Fellow in 1993, one of only thirty in the country in all areas of science and engineering. At Harvard, he has been awarded two honorary chairs: the John L. Loeb Associate Professorship in Natural Sciences in 1993 and the Harvard College Professorship in 2001. He was elected a Fellow of the American Association for Artificial Intelligence in 2004. He is the author or editor of five books and numerous articles in computer science. Professor Shieber holds eight patents, and is co-founder of Cartesian Products, Inc., a high-technology research and development company based in Cambridge, Massachusetts, providing advanced software technology to improve worldwide communication and information access. He is also the founder of Microtome Publishing, a company dedicated to publishing services in support of open access to the scholarly literature.

December 7, 2004

“Use of a perturbation-correlation method to measure the relative importance of different frequency bands for speech recognition”   Video Available

Christophe Micheyl, MIT

Abstract

In order to recognize speech, human listeners use cues distributed across different frequencies. Frequency-importance functions, which indicate the relative importance of different frequency bands for speech recognition, are an essential ingredient of predictive models of speech intelligibility, such as the articulation index. They can also be useful for optimizing multi-band speech-processing devices _LP_e.g., current hearing aids_RP_. Traditionally, frequency-importance functions have been assessed using low- and high-pass filtered speech. However, this approach has some limitations. An alternative approach, pioneered by Doherty and Turner _LP_J. Acoust. Soc. Am. 100, 1996_RP_, uses wide-band speech, to which random perturbations _LP_noise_RP_ are added independently in different bands. The importance of each band is then estimated based on the correlation between the signal-to-noise ratios applied successively in that band and the corresponding binary recognition scores, across thousands of trials. In this talk, I will review results obtained with this perturbation-correlation approach. In particular, I will show how the approach may be used to gain insight into the strategies used by listeners to recognize speech in different kinds of acoustic backgrounds _LP_noise versus competing speech_RP_. I will also address the question of inter-listener variability and the influence of hearing loss. Finally, I will describe my recent efforts to better understand the theoretical _LP_mathematical_RP_ basis of the perturbation-correlation method as applied to speech, in an attempt to improve it. [Work done in collaboration with Gaëtan Gilbert, CNRS UMR 5020, Lyon, France]
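The correlation step of the perturbation-correlation approach can be sketched with simulated data. In the example below, each trial draws an independent signal-to-noise ratio for every band, a simulated listener produces a binary recognition score, and each band's importance is estimated from the correlation between its per-trial SNR and the scores. The simulated listener (a linear weighting of band SNRs plus Gaussian noise), the arbitrary SNR units, the normalization of the correlations to sum to one, and all function names are assumptions for illustration, not the procedure or data from the talk.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation; with binary ys this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_trials(true_weights, n_trials, rng):
    """Hypothetical listener: correct when the weighted SNR evidence exceeds zero."""
    snrs, scores = [], []
    for _ in range(n_trials):
        # Independent random perturbation level in each band (arbitrary units).
        snr = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        evidence = sum(w * s for w, s in zip(true_weights, snr)) + rng.gauss(0.0, 0.2)
        snrs.append(snr)
        scores.append(1 if evidence > 0 else 0)
    return snrs, scores

def band_importance(snrs, scores):
    """Correlate each band's SNR with the binary scores, then normalize."""
    n_bands = len(snrs[0])
    corrs = [pearson([trial[b] for trial in snrs], scores) for b in range(n_bands)]
    total = sum(corrs)
    return [c / total for c in corrs]

rng = random.Random(0)
snrs, scores = simulate_trials([0.6, 0.3, 0.1], 5000, rng)
importance = band_importance(snrs, scores)
print([round(w, 2) for w in importance])
```

With several thousand trials, the estimated importances should recover the ordering of the weights used by the simulated listener, which is why the method relies on thousands of trials per listener.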

Speaker Biography

I obtained a PhD in Experimental and Cognitive Psychology from Lumiere University _LP_Lyon, France_RP_ in 1995. From 1996 to 1997 I was a Research Associate in the Department of Experimental Psychology of Cambridge University _LP_Cambridge, UK_RP_, and a Visiting Scientist in the Medical Research Council Cognition and Brain Sciences Unit _LP_MRC-CBU_RP_. I worked there with Bob Carlyon and Brian Moore for a total of three years. After being offered a tenured position by the French Centre National de la Recherche Scientifique _LP_CNRS_RP_, I went back to Lyon for about three years. I came over to the US in 2001. After a short stay in Prof. Rauschecker's lab at the Institute for Cognitive and Computational Sciences, Georgetown University _LP_Washington, DC_RP_, I joined Andrew Oxenham's group in the Research Laboratory of Electronics, Massachusetts Institute of Technology _LP_Cambridge, MA_RP_, where I am currently a Research Scientist.
