Archived Seminars by Year

2005

January 25, 2005

“Reducing Confusions and Ambiguities in Speech Translation”   Video Available

Pascale Fung, Hong Kong University of Science & Technology

Abstract

I will introduce some of our research on improving speech translation by reducing acoustic-phonetic confusions in accented speech, recognizing named entities from speech, and disambiguating translations with semantics. Accent is a major source of acoustic-phonetic confusion in spontaneous Mandarin speech. Named entity extraction is needed to facilitate understanding of the “What, Who, When, Where, Why” contained in the speech. Translation disambiguation is essential for translation accuracy. More importantly, translation disambiguation with frame semantics helps decode the meaning of a spoken query even when recognition is imperfect. We believe our approach will bring marked improvement to speech translation performance.

Speaker Biography

http://www.clsp.jhu.edu/seminars/slides/S2005/Fung.pdf

February 1, 2005

“NLP Research for Commercial Development of Writing Evaluation Capabilities”   Video Available

Jill Burstein, ETS

Abstract

Automated essay scoring was initially motivated by its potential cost savings for large-scale writing assessments. However, as automated essay scoring became more widely available and accepted, teachers and assessment experts realized that the potential of the technology could go well beyond essay scoring. Over the past five years or so, there has been rapid development and commercial deployment of automated essay evaluation for both large-scale assessment and classroom instruction. A number of factors contribute to an essay score, including varied sentence structure, grammatical correctness, appropriate word choice, errors in spelling and punctuation, use of transitional words and phrases, and organization and development. Instructional software now provides essay scores and evaluations of student writing in all of these domains. The foundation of automated essay evaluation software is rooted in NLP research. This talk will walk through the development of Criterion℠, e-rater, and Critique, automated essay evaluation and writing analysis tools developed at Educational Testing Service.
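
For readers unfamiliar with feature-based scoring, the toy sketch below computes a weighted combination of a few surface features of the kind listed above (essay length, sentence-length variety, transition-word usage). The feature set, weights, and scoring function are invented for illustration and are not ETS's e-rater model.

```python
# Toy feature-based essay scorer (illustrative only; not the e-rater model).
import re

TRANSITIONS = {"however", "therefore", "moreover", "furthermore", "consequently"}

def essay_features(text):
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean_len = sum(lengths) / max(len(lengths), 1)
    # standard deviation of sentence lengths as a crude "variety" feature
    variety = (sum((l - mean_len) ** 2 for l in lengths) / max(len(lengths), 1)) ** 0.5
    return {
        "num_words": len(words),
        "sentence_length_variety": variety,
        "transition_rate": sum(w in TRANSITIONS for w in words) / max(len(words), 1),
    }

def score(text, weights):
    feats = essay_features(text)
    return sum(weights[k] * v for k, v in feats.items())

if __name__ == "__main__":
    toy_weights = {"num_words": 0.01, "sentence_length_variety": 0.2, "transition_rate": 5.0}
    print(score("First, we argue X. However, Y also matters. Therefore, Z.", toy_weights))
```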

Speaker Biography

Jill Burstein is a Principal Development Scientist at Educational Testing Service. She received her Ph.D. in Linguistics from the City University of New York, Graduate Center. The focus of her research is on the development of automated writing evaluation technology. She is one of the inventors of e-rater®, an automated essay scoring system developed at Educational Testing Service. She has collaborated on the research and development of capabilities that provide evaluative feedback on student writing for grammar, usage, mechanics, style, and discourse analysis for Criterion℠, a web-based writing instruction application. She is co-editor of the book “Automated Essay Scoring: A Cross-Disciplinary Perspective.”

February 8, 2005

“From Phrase-Based MT towards Syntax-Based MT”   Video Available

David Chiang, University of Maryland

Abstract

We explore two ways of extending phrase-based machine translation to incorporate insights from syntax.

Speaker Biography

David Chiang is a postdoctoral researcher at the University of Maryland Institute for Advanced Computer Studies. He received his PhD in Computer and Information Science from the University of Pennsylvania in 2004, under the supervision of Aravind K. Joshi. His research interests are in applying formal grammars to a variety of areas, including statistical machine translation, statistical natural language parsing, and biological sequence analysis.

February 15, 2005

“Progress toward the LIFEmeter: Epidemiology meets Speech Recognition”   Video Available

Thomas Glass, Johns Hopkins School of Public Health

Speaker Biography

Dr. Glass is Associate Professor of Epidemiology at the Bloomberg School of Public Health. He is broadly trained in social science and holds a Ph.D. in Medical Sociology from Duke University. He completed post-doctoral training in epidemiology at the Yale School of Medicine. He has been on the faculty of the Yale School of Medicine, the Harvard School of Public Health, and the Johns Hopkins Bloomberg School of Public Health. Dr. Glass is primarily interested in understanding the impact of social and behavioral factors on health and functioning in late life. His previous work has explored the role of social support, social networks, and social engagement in outcomes ranging from stroke recovery to alcohol consumption and dementia risk. He teaches, directs graduate students, and conducts research in social epidemiology. In addition to observational studies, he has done intervention studies to improve function in older adults. More recently, his work has centered on unraveling the impact of factors in the built and social environment of urban neighborhoods on functioning. He oversees the Baltimore Neighborhood Research Consortium (BNRC) at Johns Hopkins. Among his current projects, Dr. Glass is leading a team to develop integrated sensor technology that will improve the measurement of social, physical, and cognitive function for use in population studies.

http://www.clsp.jhu.edu/seminars/slides/S2005/Glass.pps

February 22, 2005

“Boundaries to the influence of Animates.”   Video Available

Annie Zaenen, PARC

Abstract

The talk reports on recent work in corpus analysis done at Stanford. The studies aim at determining the weight of various factors that influence the choice among syntactic paraphrases. More specifically, I concentrate on the influence of animacy in the dative alternation and in left-dislocation and topicalization, and I discuss current models of language production in the light of our quantitative results. The talk will end with a short discussion of the relevance of these findings for NL generation.

March 1, 2005

“Multi-Rate and Variable-Rate Acoustic Modeling of Speech at Phone and Syllable Time Scales”

Ozgur Cetin, ICSI/Berkeley

Abstract

In this talk we will describe a multi-rate extension of hidden Markov models (HMMs), multi-rate coupled HMMs, and present their applications to acoustic modeling for speech recognition. Multi-rate HMMs are parsimonious models for stochastic processes that evolve at multiple time scales, using scale-based observation and state spaces. For speech recognition, we use multi-rate HMMs for joint acoustic modeling of speech at multiple time scales, complementing the usual short-term, phone-based representations of speech with wide modeling units and long-term temporal features. We consider two alternatives for the coarse scale in our multi-rate models, representing either phones, or syllable structure and lexical stress. We will also describe a variable-rate sampling extension to the basic multi-rate model, which tailors the analysis towards temporally fast-changing regions and significantly improves over fixed-rate sampling. Experiments on conversational telephone speech will be presented, showing that the proposed multi-rate approaches significantly improve recognition accuracy over HMM- and other coupled HMM-based approaches (e.g. feature concatenation and multi-stream coupled HMMs) for combining short- and long-term acoustic and linguistic information. This is joint work with Mari Ostendorf of the University of Washington.
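
As a rough illustration of the multi-scale idea (not the actual multi-rate coupled HMM parameterization described in the talk), the sketch below samples from a toy two-rate generative model in which a coarse, syllable-rate state conditions a fine, frame-rate state chain and the emitted observations. All state sets, rates, and distributions are made up for illustration.

```python
# Toy two-rate generative model: coarse chain at "syllable" rate, fine chain
# at frame rate, observations depend on both scales. Illustration only.
import random

COARSE_STATES = ["stressed", "unstressed"]
FINE_STATES = ["onset", "nucleus", "coda"]
FRAMES_PER_COARSE = 3  # fixed-rate sampling; the talk also studies variable rates

def sample(num_coarse_steps, seed=0):
    rng = random.Random(seed)
    observations = []
    for _ in range(num_coarse_steps):
        coarse = rng.choice(COARSE_STATES)   # coarse transition (uniform, for illustration)
        fine = "onset"
        for _ in range(FRAMES_PER_COARSE):
            # fine transition conditioned on the current coarse state
            if coarse == "stressed" and fine == "onset":
                fine = rng.choice(["nucleus", "nucleus", "onset"])
            else:
                fine = rng.choice(FINE_STATES)
            # emit a scalar "acoustic" observation whose mean depends on both scales
            mean = (2.0 if coarse == "stressed" else 0.0) + FINE_STATES.index(fine)
            observations.append(rng.gauss(mean, 1.0))
    return observations

if __name__ == "__main__":
    print(sample(4))
```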

Speaker Biography

Ozgur Cetin is a post-doctoral researcher at the International Computer Science Institute, Berkeley. He received his PhD and MS degrees from the University of Washington, Seattle, in 2005 and 2000, respectively, both in electrical engineering, and a BS degree from Bilkent University, Turkey, in 1998 in electrical and electronics engineering. His research interests include machine learning, and speech and language processing.

March 8, 2005

“Interaction: Conjectures, Results, Myths”   Video Available

Dina Goldin, University of Connecticut

Abstract

Computer technology has shifted from mainframes to locally networked workstations and now to mobile wireless devices. Software engineering has evolved from procedure-oriented to object-oriented and component-based systems. AI has refocused from logical reasoning and search algorithms to intelligent agents and partially observable environments. These parallel changes exemplify a conceptual paradigm shift from algorithms to interaction. Interactive computational processes allow for input and output to take place during the computation, in contrast to traditional "algorithmic" computation, which transforms predefined input to output. It had been conjectured (Wegner 1997) that "interaction is more powerful than algorithms". We present Persistent Turing Machines (PTMs), which serve as a model for sequential interactive computation. PTMs are multitape Turing Machines (TMs) with a persistent internal tape and dynamic stream-based semantics. We formulate observation-based notions of system equivalence and computational expressiveness. Among other results, we demonstrate that PTMs are more expressive than TMs, thus proving Wegner's conjecture. As an analogue of the Church-Turing Thesis, which relates Turing machines to algorithmic computation, it is hypothesized that PTMs capture the intuitive notion of sequential interactive computation. We end by considering the historic reasons for the widespread misinterpretation of the Church-Turing Thesis, namely that TMs model all computation. The myth that this interpretation is equivalent to the original thesis is fundamental to the mainstream theory of computation. We show how it can be traced to the establishment of computer science as a discipline in the 1960s.
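
The following sketch illustrates only the persistence idea behind PTMs: each interaction maps (persistent memory, input) to (new memory, output), so behavior on an input stream depends on the history of earlier interactions. It is a conceptual illustration, not a formal PTM construction.

```python
# Conceptual sketch of persistence in interactive computation (not a formal PTM).
class PersistentMachine:
    def __init__(self):
        self.memory = []  # plays the role of the persistent work tape

    def step(self, token: str) -> str:
        # An "answering machine"-style behavior often used to motivate PTMs:
        # the reply to a query depends on everything recorded so far.
        if token == "recall":
            return " ".join(self.memory) or "<empty>"
        self.memory.append(token)
        return "ok"

if __name__ == "__main__":
    m = PersistentMachine()
    for tok in ["hello", "world", "recall"]:
        print(tok, "->", m.step(tok))
```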

Speaker Biography

Dina Q. Goldin is a faculty member in Computer Science & Engineering at the University of Connecticut and an adjunct faculty member in Computer Science at Brown University. Dr. Goldin obtained her B.S. in Mathematics and Computer Science at Yale University, and her M.S. and Ph.D. in Computer Science at Brown University. Her current topics of research are models of interaction and database queries.

March 29, 2005

“Translingual Information Processing”

Salim Roukos, IBM TJ Watson Research

Abstract

Searching unstructured information in the form of (largely) text with increasing image, audio, and video content is fast becoming a daily activity for many people. Increasingly, the content is becoming multilingual (e.g. one such trend is that non-English speakers became the majority of online users in the summer of 2001 and have continued to increase their share, reaching two-thirds today). To help users access answers to their information needs regardless of the original language of the relevant content, we at IBM Research have a number of projects that handle multilingual content, ranging from machine translation and information extraction to topic detection and tracking. In this talk, we will present an overview of our work on statistical machine translation and demonstrate a cross-lingual search engine that searches Arabic content using English queries.

April 5, 2005

“Bipartite Graph Factorization in Static Decoding Graphs with Long-Span Acoustic Context: An Interesting Combinatorial Problem in ASR”   Video Available

Geoffrey Zweig, IBM TJ Watson Research

Abstract

A key problem in large vocabulary speech recognition is how to search for the word sequence with the highest likelihood, given an acoustic model, a language model, and some input audio data. There are two standard approaches to doing this: 1) construct the search space “on demand” so as to represent just the portions that are reasonably likely given the data, or 2) construct ahead of time a full representation of the entire search space. This talk identifies and solves a problem that arises in the construction of a full representation of the search space when long-span acoustic context is used in the acoustic model, specifically when the expected acoustic realization of a word depends on the identity of the preceding word. In this case, a sub-portion of the search space contains a bipartite graph with O(V) vertices and O(V^2) edges, where V is the vocabulary size. For large vocabulary systems, the number of edges is prohibitive, and we tackle the problem of finding an edge-wise minimal representation of this sub-graph. This is done by identifying complete bipartite sub-graphs within the graph, and replacing the edges of each such sub-graph with an extra vertex and edges that connect the left and right sides of the sub-graph to the new vertex. The problem of finding the smallest such representation is NP-hard, and we present a heuristic for finding a reasonable answer. The talk concludes with some experimental results on a large-vocabulary speech recognition system and a discussion of related problems.
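
The sketch below illustrates the edge-reduction idea in its simplest form: when a set A of left vertices shares an identical set B of right neighbors (a complete bipartite sub-graph), its |A|*|B| edges can be replaced by a hub vertex carrying |A|+|B| edges. The greedy grouping used here is an illustrative assumption, not the heuristic presented in the talk.

```python
# Replace complete bipartite sub-graphs (here: left vertices with identical
# right-neighbor sets) by a hub vertex. Illustration of the idea only.
from collections import defaultdict

def factor_bipartite(edges):
    """edges: iterable of (left, right) pairs. Returns (hub_edges, leftover_edges)."""
    right_sets = defaultdict(set)
    for l, r in edges:
        right_sets[l].add(r)
    # Group left vertices that share exactly the same right-neighbor set.
    groups = defaultdict(list)
    for l, rs in right_sets.items():
        groups[frozenset(rs)].append(l)
    hub_edges, leftover = [], []
    for rs, lefts in groups.items():
        if len(lefts) > 1 and len(rs) > 1:  # hub saves edges only in this case
            hub = ("HUB",) + tuple(sorted(lefts))
            hub_edges += [(l, hub) for l in lefts] + [(hub, r) for r in rs]
        else:
            leftover += [(l, r) for l in lefts for r in rs]
    return hub_edges, leftover

if __name__ == "__main__":
    # Two preceding words whose following-word realizations are identical.
    edges = [(l, r) for l in ["the", "a"] for r in ["cat", "dog", "cow"]]
    hubs, rest = factor_bipartite(edges)
    print(len(edges), "edges ->", len(hubs) + len(rest))  # 6 -> 5
```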

Speaker Biography

Geoffrey Zweig received his PhD in Computer Science from the University of California at Berkeley in 1998, after which he joined IBM at the T.J. Watson Research Center. At IBM, Geoffrey manages the advanced large vocabulary speech recognition research group. His responsibilities include the development of improved acoustic modeling techniques and state-of-the-art decoding algorithms. Geoffrey was a PI for the DARPA EARS program and organized IBM’s participation in the 2003 and 2004 evaluations. His research interests revolve around the application of machine learning techniques such as boosting and Bayesian network modeling to speech recognition, as well as highly practical applications such as the automated construction of grammars for directory dialer applications. In addition to speech recognition, Geoffrey has worked on a wide variety of topics, including extremely large scale document clustering for the web, and DNA physical mapping. Geoffrey is a member of the IEEE and an associate editor of the IEEE Transactions on Speech and Audio Processing.

April 12, 2005

“Old and new work in discriminative training of acoustic models”   Video Available

Daniel Povey, IBM TJ Watson Research Center

Abstract

I will give a general review of discriminative training of acoustic models, with special emphasis on MPE (minimum phone error) training. I will then describe fMPE, a more recent discriminative training technique developed at IBM: a feature-space transformation that is trained by maximizing the MPE criterion.

April 12, 2005

“Speech, Language, & Machine Learning”   Video Available

Jeff A. Bilmes, University of Washington, Seattle

April 19, 2005

“New Directions in Robust Automatic Speech Recognition”

Richard Stern, Carnegie Mellon University

Abstract

As speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition is becoming increasingly important. This talk will review and discuss several classical and contemporary approaches to robust speech recognition. The most tractable types of environmental degradation are produced by quasi-stationary additive noise and quasi-stationary linear filtering. These distortions can be largely ameliorated by the "classical" techniques of cepstral high-pass filtering (as exemplified by cepstral mean normalization and RASTA filtering), as well as by techniques that develop statistical models of the distortion (such as codeword-dependent cepstral normalization and vector Taylor series expansion). Nevertheless, these types of approaches fail to provide much useful improvement in accuracy when speech is degraded by transient or non-stationary noise such as background music or speech. We describe and compare the effectiveness of techniques based on missing-feature compensation, multi-band analysis, feature combination, and physiologically-motivated auditory scene analysis toward providing increased recognition accuracy in difficult acoustical environments.
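
As a concrete example of one of the "classical" compensations mentioned above, the sketch below implements cepstral mean normalization: subtracting each utterance's mean cepstral vector removes a stationary linear channel, which shows up as an additive constant in the cepstral domain. The array shapes and the synthetic check are illustrative assumptions.

```python
# Cepstral mean normalization (CMN): per-utterance mean subtraction.
import numpy as np

def cepstral_mean_normalize(cepstra: np.ndarray) -> np.ndarray:
    """cepstra: (num_frames, num_coeffs) array of cepstral features."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))
    channel = rng.normal(size=(1, 13))  # stationary linear filter -> constant cepstral offset
    # CMN removes the constant offset, so both versions normalize to the same features.
    assert np.allclose(cepstral_mean_normalize(clean + channel),
                       cepstral_mean_normalize(clean))
```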

April 26, 2005

“Machine Translation Performance Evaluation Based on DOD Standards”   Video Available

Doug Jones, MIT

Speaker Biography

Doug Jones is a Linguist on the technical staff in the Information Systems Technology Group at MIT Lincoln Laboratory (MIT Ph.D. on Hindi syntax; Stanford AB/AM specializing in computational linguistics). He was employed for four years as a Senior Scientific Linguist at NSA, focusing on machine translation of low density languages. His current research focuses on evaluating human language technology applications, such as machine translation and speech recognition, in terms of how they enhance human language processing skills. The Information Systems Technology home page at MIT Lincoln Laboratory has a list of recent papers and additional information: http://www.ll.mit.edu/IST/pubs.html

May 3, 2005

“Conversational Speech and Language Technologies for Structured Document Generation”

Juergen Fritsch, MModal

Abstract

I will present Multimodal Technologies' AnyModal CDS, a clinical documentation system that is capable of creating structured and encoded medical reports from conversational speech. Set up in the form of a back-end service-oriented architecture, the system is completely transparent to the dictating physician and does not require active enrollment or changes in dictation behavior, while producing complete and accurate documents. In contrast to desktop dictation systems, which essentially produce a literal transcript of spoken audio, AnyModal CDS attempts to recognize the meaning and intent behind dictated phrases, producing a richly structured and easily accessible document. In the talk, I will discuss some of the enabling speech and language technologies, focusing on continuous semi-supervised adaptation of speech recognition models based on non-literal transcripts and on combinations of statistical language models and semantically annotated probabilistic grammars for the modeling and identification of structure in spoken audio.

Speaker Biography

Dr. Jürgen Fritsch is co-founder and chief scientist of Multimodal Technologies (M*Modal), where he leads research efforts in the fields of speech recognition and natural language processing. He has an extensive background in speech and language technologies and in advancing the state of the art in these areas. He held research positions at the University of Karlsruhe, Germany, and at Carnegie Mellon University, Pittsburgh, where he participated in the LVCSR/Switchboard speech recognition evaluations. Before co-founding M*Modal, Jürgen was co-founder of Interactive Systems Inc., where he was instrumental in the design and development of an advanced conversational speech recognition platform that evolved into one of the foundations of M*Modal's current line of products. Jürgen received his Ph.D. and M.Sc. degrees in computer science from the University of Karlsruhe, Germany.

July 14, 2005

“Introduction to Arabic Natural Language Processing - Part 1”   Video Available

Nizar Habash, Columbia University

July 20, 2005

“An Information State Approach to Collaborative Reference”   Video Available

Matthew Stone, Rutgers University

July 21, 2005

“Introduction to Arabic Natural Language Processing - Part 2”   Video Available

Nizar Habash, Columbia University

October 4, 2005

“Progress in speaker adaptation and acoustic modeling for LVCSR”   Video Available

George Saon, IBM

Abstract

This talk is organized in two parts. In the first part, we discuss a non-linear feature space transformation for speaker/environment adaptation which forces the individual dimensions of the acoustic data to be Gaussian distributed. The transformation is given by the preimage under the Gaussian cumulative distribution function (CDF) of the empirical CDF for each dimension. In the second part, we review some existing techniques for precision matrix modeling, such as EMLLT and SPAM, and we describe our recent work on discriminative training of full covariance Gaussians on the 2300-hour EARS dataset.
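
A minimal sketch of the per-dimension transformation described above: each value is pushed through the empirical CDF of its dimension and then through the inverse standard-normal CDF, so the transformed dimension is approximately Gaussian. The rank-based empirical CDF and per-batch estimation here are simplifying assumptions, not the exact recipe from the talk.

```python
# Per-dimension gaussianization: inverse normal CDF applied to the empirical CDF.
from statistics import NormalDist
import numpy as np

def gaussianize(features: np.ndarray) -> np.ndarray:
    """features: (num_frames, num_dims). Returns a gaussianized copy."""
    n, d = features.shape
    out = np.empty_like(features, dtype=float)
    inv_cdf = NormalDist().inv_cdf
    for j in range(d):
        ranks = features[:, j].argsort().argsort()  # ranks 0 .. n-1
        empirical_cdf = (ranks + 0.5) / n           # keep values strictly inside (0, 1)
        out[:, j] = [inv_cdf(p) for p in empirical_cdf]
    return out

if __name__ == "__main__":
    x = np.random.default_rng(0).exponential(size=(1000, 2))   # skewed input
    g = gaussianize(x)
    print(g.mean(axis=0).round(2), g.std(axis=0).round(2))     # roughly 0 mean, unit std
```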

Speaker Biography

http://www.clsp.jhu.edu/seminars/slides/F2005/CLSP Seminar Slides 2005-10-24 - Saon, George - Progress in speaker adaptation and acoustic modeling for LVCSR.pdf

October 11, 2005

“Making Visualization Work”   Video Available

Ben Bederson, University of Maryland

Abstract

The human visual system is incredibly powerful. Many people have tried to create computer systems that present information visually to take advantage of that power. The potential is great - for tasks ranging from detecting patterns and outliers to quickly browsing and comparing large datasets. And yet, the number of successful visualization programs that we use today is limited. In this talk, I will discuss common problems with visualizations, and how several approaches that we have developed avoid those problems. By applying Zoomable User Interfaces, Fisheye distortion, carefully controlled animation, and working closely with users, we have created a range of applications which we have shown to have significant benefits. I will show demos from application domains including photos, trees, graphs, and even digital libraries. To build these visualizations, we have built Piccolo, a general open source toolkit available in Java and C#. It offers a hierarchical scene graph in the same style that many 3D toolkits offer - but for 2D visualization. By offering support for graphical objects, efficient rendering, animation, event handling, etc., Piccolo can reduce the effort of building complex visual applications with minimal run-time expense. In this talk, I will also discuss Piccolo, alternative approaches to building visualizations, and the computational expense of using Piccolo.
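
To make the scene-graph idea concrete, here is a minimal sketch of a hierarchical 2D node with a local offset and children, where rendering walks the tree composing parent transforms. This illustrates the concept only; it is not Piccolo's actual API.

```python
# Minimal hierarchical 2D scene graph: each node has a local offset and children.
class Node:
    def __init__(self, name, dx=0.0, dy=0.0):
        self.name, self.dx, self.dy = name, dx, dy
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def render(self, ox=0.0, oy=0.0, depth=0):
        # Compose this node's local offset with the accumulated parent offset.
        x, y = ox + self.dx, oy + self.dy
        print("  " * depth + f"{self.name} at ({x}, {y})")
        for c in self.children:
            c.render(x, y, depth + 1)

if __name__ == "__main__":
    root = Node("canvas")
    panel = root.add(Node("photo_panel", dx=10, dy=10))
    panel.add(Node("thumbnail", dx=5, dy=2))
    root.render()
```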

Speaker Biography

Benjamin B. Bederson is an Associate Professor of Computer Science and director of the Human-Computer Interaction Lab at the Institute for Advanced Computer Studies at the University of Maryland, College Park. His work is on information visualization, interaction strategies, digital libraries, and accessibility issues such as voting system usability.

October 20, 2005

“Human-Like Audio Signal Processing”

David V. Anderson, Georgia Institute of Technology

Abstract

The discipline of signal processing provides formal, mathematical techniques for processing information. The applications of signal processing are countless and make up an increasing share of the computing performed across all platforms. Problems and opportunities are arising, however, that will be met as we learn more about how neurological systems process information. The first problem stems from the evolution of computing devices. For a variety of reasons, high performance computing platforms must employ increasing parallelism to achieve performance improvements. The problem comes from the difficulty of effectively using highly parallel systems. The second problem comes from the need to decrease the power consumption of computing systems. This is true for everything from large systems, where power determines density and cooling costs, to small systems, where battery life is the limiting factor. A third problem is inherent in the difficult, ongoing task of making machines intelligent. All of these problems may be addressed by applying techniques learned through the study of neurological systems. These systems are highly parallel and operate very efficiently. Additionally, the intelligence of biological systems is largely due to their ability to recognize patterns--an ability that greatly exceeds that of synthetic systems in robustness and flexibility. This talk will discuss the problems mentioned above as well as summarize some recent applications of signal processing that have benefited from the inspiration or modeling of neurological systems.

Speaker Biography

David V. Anderson received the B.S. and M.S. degrees from Brigham Young University, Provo, UT, and the Ph.D. degree from the Georgia Institute of Technology (Georgia Tech), Atlanta, GA, in 1993, 1994, and 1999, respectively. He is an associate professor in the School of Electrical and Computer Engineering at Georgia Tech and an associate director of the Center for Research in Embedded Systems Technology. His research interests include audio and psycho-acoustics, signal processing in the context of human auditory characteristics, and the real-time application of such techniques using both analog and digital hardware. His research has included the development of a digital hearing aid algorithm that has now been made into a successful commercial product. Dr. Anderson was awarded the National Science Foundation CAREER Award for excellence as a young educator and researcher in 2004 and is a recipient of the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE). He has over 60 technical publications and 5 patents/patents pending. Dr. Anderson is a member of the IEEE, the Acoustical Society of America, ASEE, and Tau Beta Pi. He has been actively involved in the development and promotion of computer-enhanced education and other education programs.

October 25, 2005

“On the Parameter Space of Lexicalized Statistical Parsing Models”   Video Available

Dan Bikel, IBM

Abstract

Over the last several years, lexicalized statistical parsing models have been hitting a "rubber ceiling" when it comes to overall parse accuracy. These models have become increasingly complex, and therefore require thorough scrutiny, both to achieve the scientific aim of understanding what has been built thus far, and to achieve both the scientific and engineering goal of using that understanding for progress. In this talk, I will discuss how I have applied as well as developed techniques and methodologies for the examination of the complex systems that are lexicalized statistical parsing models. The primary idea is that of treating the "model as data", which is not a particular method, but a paradigm and a research methodology. Accordingly, I take a particular, dominant type of parsing model and perform a macro analysis, to reveal its core (and design a software engine that modularizes the periphery), and I also crucially perform a detailed analysis, which provides for the first time a window onto the efficacy of specific parameters. These analyses have not only yielded insight into the core model, but they have also enabled the identification of "inefficiencies" in my baseline model, such that those inefficiencies can be reduced to form a more compact model, or exploited for finding a better-estimated model with higher accuracy, or both.

Speaker Biography

Daniel M. Bikel graduated from Harvard University in 1993 with an A.B. in Classics--Greek & Latin. After spending a post-graduate year at Harvard taking more courses in computer science, engineering, and music, Bikel joined Ralph Weischedel's research group at BBN in Cambridge, MA. During his three years there, Bikel developed several NLP technologies, including Nymble (now called IdentiFinder™), a learning named-entity detector. In 2004, Bikel received a Ph.D. from the Computer and Information Science Department at the University of Pennsylvania (advisor: Prof. Mitch Marcus). At Penn, he focused on statistical natural language parsing, culminating in a dissertation entitled identically to this talk. Bikel is currently a Research Staff Member at IBM's T. J. Watson Research Center in Yorktown Heights, NY.

November 1, 2005

“Quest for the Essence of Language”   Video Available

Steven Greenberg, Centre for Applied Hearing Research, Technical Univ of Denmark; Silicon Speech, Santa Venetia, CA, USA

Abstract

Spoken language is often conceptualized as mere sequences of words and phonemes. From this traditional perspective, the listener's task is to decode the speech signal into constituent elements derived from spectral decomposition of the acoustic signal. This presentation outlines a multi-tier theory of spoken language in which utterances are composed not only of words and phones, but also syllables, articulatory-acoustic features and (most importantly) prosemes, encapsulating the prosodic pattern in terms of prominence and accent. This multi-tier framework portrays pronunciation variation and the phonetic micro-structure of the utterance with far greater precision than the conventional lexico-phonetic approach, thereby providing the prospect of efficiently modeling the information-bearing elements of spoken language for automatic speech recognition and synthesis.

Speaker Biography

In the early part of his career, Steven Greenberg studied Linguistics, first at the University of Pennsylvania (A.B.) and then at the University of California, Los Angeles (Ph.D.). He also studied Neuroscience (UCLA), Psychoacoustics (Northwestern) and Auditory Physiology (Northwestern, University of Wisconsin). He was a principal researcher in the Neurophysiology Department at the University of Wisconsin-Madison for many years, before migrating back to the "Golden West" in 1991 to assume directorship of a speech laboratory at the University of California, Berkeley, where he also held a tenure-level position in the Department of Linguistics. In 1995, Dr. Greenberg migrated a few blocks further west to join the scientific research staff at the International Computer Science Institute (affiliated with, but independent from, UC-Berkeley). During his time at ICSI, he published many papers on the phonetic and prosodic properties of spontaneous spoken language, and conducted perceptual studies of the underlying acoustic (and visual) basis of speech intelligibility. He also developed (with Brian Kingsbury) the Modulation Spectrogram for robust representation of speech for automatic speech recognition, as well as syllable-centric classifiers of phonetic features for speech technology applications. Since 2002, he has been President of Silicon Speech, a company based in the San Francisco Bay Area dedicated to developing future-generation speech technology based on principles of human brain function and information theory. Since 2004, Dr. Greenberg has also been a Visiting Professor at the Centre for Applied Hearing Research at the Technical University of Denmark, where he performs speech-perception-related research.

November 8, 2005

“Syntactic Models of Alignment”   Video Available

Dan Gildea, University of Rochester

Abstract

I will describe work on tree-based models for aligning parallel text, presenting results for models that make use of syntactic information provided for one or both languages, as well as models that infer structure directly from parallel bilingual text. In the second part of the talk, I will discuss some theoretical aspects of Synchronous Context Free Grammars as a model of translation, describing a method to factor grammars to lower the complexity of synchronous parsing.
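
One standard way to factor a synchronous rule is to binarize the permutation that relates its source- and target-side nonterminals; the shift-reduce sketch below checks whether such a binarization exists and records the merges it performs. This is an illustrative, commonly used procedure offered under the assumption that it conveys the flavor of grammar factorization; it is not necessarily the factorization method presented in the talk.

```python
def binarize(perm):
    """Try to factor a synchronous rule, given as the 0-indexed permutation of its
    right-hand-side nonterminals on the target side, into binary pieces.
    Returns the list of (source_span, target_span) merges, or None if the
    permutation is not binarizable (e.g. the 2-4-1-3 pattern)."""
    stack = []   # each item is (src_span, tgt_span); source spans stay contiguous
    merges = []
    for i, p in enumerate(perm):
        stack.append(((i, i + 1), (p, p + 1)))
        # Reduce the top two items whenever their target spans are adjacent.
        while len(stack) >= 2:
            s2, t2 = stack[-1]
            s1, t1 = stack[-2]
            if t1[1] == t2[0] or t2[1] == t1[0]:
                src = (s1[0], s2[1])
                tgt = (min(t1[0], t2[0]), max(t1[1], t2[1]))
                stack[-2:] = [(src, tgt)]
                merges.append((src, tgt))
            else:
                break
    return merges if len(stack) == 1 else None

if __name__ == "__main__":
    print(binarize([2, 0, 1, 3]))  # binarizable: prints the merges
    print(binarize([1, 3, 0, 2]))  # the non-binarizable pattern: prints None
```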

Speaker Biography

Dan Gildea received a BA in linguistics and computer science, as well as an MS and PhD in computer science, from the University of California, Berkeley. After two years as a postdoctoral fellow at the University of Pennsylvania, he joined the University of Rochester as an assistant professor of computer science in 2003.

November 15, 2005

“Integrative Models of the Cardiac Ventricular Myocyte”   Video Available

Raimond Winslow, Johns Hopkins University

Abstract

Cardiac electrophysiology is a field with a rich history of integrative modeling. A particularly important milestone was the development of the first biophysically-based cell model describing interactions between voltage-gated membrane currents, pumps and exchangers, and intracellular calcium (Ca2+) cycling processes in the cardiac Purkinje fiber (DiFrancesco & Noble, Phil. Trans. Roy. Soc. Lond. B 307: 353) and the subsequent elaboration of this model to describe the cardiac ventricular myocyte action potential (Noble et al. Ann. N. Y. Acad. Sci. 639: 334; Luo, C-H and Rudy, Y. Circ. Res. 74: 1071). This talk will review the “state of the art” in integrative modeling of the cardiac myocyte, focusing on modeling of the ventricular myocyte because of its significance to arrhythmia and heart disease. Special emphasis will be placed on the importance of modeling mechanisms of Ca2+-Induced Ca2+-Release (CICR). CICR is the process by which influx of trigger calcium (Ca2+) through L-Type Ca2+ channels (LCCs) leads to opening of ryanodine-sensitive Ca2+ release channels (RyRs) in the junctional sarcoplasmic reticulum (JSR) membrane and release of Ca2+ from the JSR. It is of fundamental importance in cardiac muscle function, as it not only underlies the process of muscle contraction, but is also involved in regulation of the cardiac action potential. We will demonstrate that every model of CICR in use today has serious shortcomings, and we will offer insights as to how these shortcomings must be addressed in order to develop reconstructive and predictive models that can be used to investigate myocyte function in both health and disease. (Supported by NIH HL60133, the NIH Specialized Center of Research on Sudden Cardiac Death P50 HL52307, the Whitaker Foundation, the Falk Medical Trust, and IBM Corporation)

November 29, 2005

“Prosody in Spoken Language Processing”

Izhak Shafran, Johns Hopkins University

Abstract

Automatic speech recognition is now capable of transcribing speech from a variety of sources with high accuracy. This has opened up new opportunities and challenges in translation, summarization and distillation. Currently, most applications only extract the sequence of words from a speaker's voice and ignore other useful information that can be inferred from speech, such as prosody. Prosody has been studied extensively by linguists and is often characterized in terms of phrasing (break indices), tones, and emphasis (prominence). The availability of a prosodically labeled corpus of conversational speech has spurred renewed interest in exploiting prosody for downstream applications. As a first step, an automatic method is needed to detect prosodic events. For this task, we have investigated the performance of a series of classifiers of increasing complexity, namely, decision trees, bagging-based classifiers, random forests, and hidden Markov models of different orders. Our experiments show that break indices and prominence can be detected with accuracies above 80%, making them useful for practical applications. Two such examples were explored. In the context of disfluency detection, the interaction between the prosodic interruption point and the syntactic EDITED constituents was modeled with a simple and direct model -- a PCFG with additional tags. The preliminary results are promising and show that the F-score of the EDITED constituent improves significantly without degrading the overall F-measure. The task of building elaborate generative models is difficult, largely due to the lack of an authoritative theory of the syntax-phonology interface. An alternative approach is to incorporate the interaction as features in a discriminative framework for parsing, speech recognition, or metadata detection. As an example, we illustrate how this can be done for sentence boundary detection using a re-ranking framework and show improvements over a state-of-the-art system. The work reported in this talk was carried out in the 2005 JHU workshop and previously at the University of Washington in collaboration with several researchers.
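
As an illustration of the simplest classifier in the series mentioned above, the sketch below fits a shallow decision tree that labels words as prominent or not from a few prosodic features. The features, toy data, and labels are invented for illustration; they are not the corpus or feature set used in the reported work.

```python
# Toy prominence detector: a depth-2 decision tree over made-up prosodic features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# columns: [mean F0 (z-score), duration (z-score), energy (z-score)]
X = np.array([[ 1.2,  0.8,  1.0],
              [-0.5, -0.3, -0.7],
              [ 0.9,  1.1,  0.6],
              [-1.0, -0.8, -0.2],
              [ 0.2,  0.1,  0.3],
              [-0.9, -1.2, -1.0]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = prominent, 0 = not prominent

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[0.8, 0.5, 0.9], [-0.6, -0.4, -0.8]]))  # expect [1, 0]
```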

December 6, 2005

“Strategies for Coreference from the Perspective of Information Exploitation”   Video Available

Breck Baldwin, Alias-i, Inc.

Abstract

Coreference entices us with the promise of radically improved information exploitation via data mining, search, and information extraction. Coreference in its canonical form involves equating text mentions of Abu Musab al-Zarqawi with mentions in Arabic, phone calls which reference him, and images that contain him. Once such a foundation of coreference is established over a body of information, questions like "get me all individuals with some relation to al-Zarqawi" become feasible. It is also a dynamite research problem. Progress has been made in text media, with apparently excellent results in named entity recognition, pronoun resolution, cross-document individual resolution, and database linking. This suggests that some sort of Uber-search/indexing engine should fall out the bottom of a series of 90% f-measure results in these key areas. Unfortunately, this is not the case, and for good reasons. In this talk I will argue that there are fundamental flaws in how we think about coreference in the context of information access. The argument ranges from basic philosophical issues about what an entity or an ontology is, to an analysis of why first-best approaches to entity detection hobble performance in significant ways. As a proposed strategy for approaching the problem, I will discuss our own efforts in two directions: 1) targeting known entities using match filtering as well as n-best driven analysis with character language models, and 2) targeting unknown entities with n-best chunking approaches to named entity extraction, as opposed to the first-best approaches commonly used.
