Probabilistic Representations of Linguistic Meaning (PReLiM)
The workshop on Probabilistic Representations of Linguistic Meaning (PReLiM) was held in Prague from July 7-11, 2014, funded by the U.S. National Science Foundation's PIRE Program.
This one-week workshop aimed to gather leaders from the semantics, NLP, and cognitive science communities to consider how linguistic semantics and pragmatics might integrate with probabilistic knowledge and reasoning (about the world and about one's interlocutor).
"Deep" natural-language understanding will eventually need more sophisticated semantic representations. What representations should the NLP community be using in 10 years? How will they figure into inference? How can we start to recover them from text or other linguistic resources?
Conversely, semanticists and pragmaticists need to model mental states and reasoning and how these relate to the linguistic form of speech acts. Is there an important role for probability distributions over semantic representations and within semantic representations?
Probabilistic representations are now standard across AI and cognitive science. Over the past 30 years, probabilistic models of language have grown beyond n-grams and collocations to incorporate ever more linguistic structure: lexical categories, syntactic features, selectional preferences, semantic frames, and reference. We propose that it is now time to turn the same lens on semantic representations, and to integrate them with current thinking about probabilistic knowledge and reasoning.
Traditionally, predicate logic has been used to encode the meaning of a reading of a sentence and to reason about its entailments. Can probability distributions enrich our understanding of the underpinnings of linguistic meaning?
Knowledge of syntax includes knowledge of the frequencies of different constructions, and this knowledge is used in both generation and comprehension. Probabilistic reasoning is arguably even more important in reasoning about the meaning of a sentence and the inferences that are intended to be drawn from it.
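As a toy illustration of how probabilistic knowledge of syntax can be made precise, a probabilistic context-free grammar (PCFG) attaches a probability to each rule, so that the probability of a derivation is the product of its rule probabilities. The grammar and numbers below are invented purely for illustration:

```python
# Toy PCFG: for each nonterminal, its rule probabilities sum to 1.
# Grammar and probabilities are illustrative, not from any corpus.
RULES = {
    ("S",  ("NP", "VP")):   1.0,
    ("NP", ("she",)):       0.6,
    ("NP", ("the duck",)):  0.4,
    ("VP", ("quacks",)):    0.7,
    ("VP", ("sees", "NP")): 0.3,
}

def derivation_prob(tree):
    """P(tree) = product of the probabilities of the rules it uses.
    A tree is (label, child1, child2, ...); leaves are strings."""
    label, *children = tree
    # The right-hand side of the rule at this node: terminal strings
    # stay as-is; subtrees contribute their root label.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= derivation_prob(c)   # recurse into subtrees
    return p

t = ("S", ("NP", "she"), ("VP", "sees", ("NP", "the duck")))
# derivation_prob(t) == 1.0 * 0.6 * 0.3 * 0.4 == 0.072
```

A comprehender can then prefer the more probable of two competing derivations of the same string, just as a generator can sample frequent constructions more often.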
Probability has several roles to play:
The intended meaning of a sentence may convey information about probability distributions. These describe either patterns in the world or the speaker's uncertainty about the world.
- Possible worlds: What inferences about the world or the speaker's beliefs can one draw from modal or counterfactual statements? Can the traditional notions of "accessible" worlds and "minimal" changes be made more precise by using probability?
- Concepts: Does a word evoke a probability distribution over prototypical entities or situations? How does prototypicality affect the interpretation of generics and indefinites? How does it compose, and how does it interact with truth conditions?
- Vagueness: What contrast set is intended by "tall," "expensive," or "many"? Can probability help us to interpret the meaning and compositional behavior of graded predicates?
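One way to make the vagueness question concrete (a minimal sketch, with invented heights and thresholds) treats a gradable adjective like "tall" as true of heights above an uncertain threshold; marginalizing over that threshold yields a posterior distribution over which individuals the speaker plausibly meant:

```python
# Hypothetical contrast set: heights (cm) in the domain of discourse.
heights = [150, 160, 165, 170, 175, 180, 185, 190, 200]

def posterior_given_tall(heights, thetas):
    """P(height | "tall") under a uniform prior over heights and a
    uniform, uncertain threshold theta: "tall" holds iff height > theta."""
    weights = {}
    for h in heights:
        # Marginalize over thresholds: weight = P(h) * P(theta < h).
        weights[h] = sum(1 for t in thetas if h > t) / len(thetas)
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

# Candidate thresholds 150, 155, ..., 200 cm (purely illustrative).
post = posterior_given_tall(heights, thetas=range(150, 201, 5))
```

Taller individuals receive more posterior mass, so "tall" picks out a graded region of the contrast set rather than a sharp cutoff.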
Interpreting a sentence requires reasoning about what meaning was most plausibly intended (just as in statistical parsing). The space of possible meanings can be quite rich, and the reasoning can interact strongly with world knowledge.
- Sloppiness: When and how should one accommodate presuppositions, or coerce arguments to new types? How should one construe the domain of a quantifier? What alternative worlds or situations are evoked by a modal, counterfactual, or adverb of quantification? Where is the boundary between vagueness of the intended meaning and uncertainty about the intended meaning?
- Underspecification: For example, what is the precise relationship between the nouns in a noun-noun compound? What tripartite structure is intended by a generic? What temporal relations are implied by a sentence?
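Reasoning about the most plausibly intended meaning can be sketched as Bayesian inference over candidate interpretations: the hearer scores each candidate by its prior plausibility times the likelihood that the speaker would have chosen this form to express it. The candidate relations and all numbers below are purely illustrative:

```python
# Hypothetical priors over the implicit relation in a noun-noun
# compound, and likelihoods of the observed form given each relation.
# All names and numbers are invented for illustration.
priors = {"MADE_OF": 0.4, "USED_FOR": 0.4, "LOCATED_IN": 0.2}

def interpret(likelihoods, priors):
    """Rank candidate relations by
    P(relation | utterance) ∝ P(utterance | relation) * P(relation)."""
    scores = {r: likelihoods.get(r, 0.0) * p for r, p in priors.items()}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}

# "olive oil": world knowledge makes MADE_OF far more likely
# than USED_FOR (contrast "baby oil").
olive_oil = interpret(
    {"MADE_OF": 0.8, "USED_FOR": 0.1, "LOCATED_IN": 0.1}, priors)
```

The same scheme applies to the other underspecification puzzles above: the candidates change (tripartite structures, temporal relations), but the inference pattern does not.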
The form of expression may acknowledge uncertainty in the hearer's current belief state or in the common ground between speaker and hearer.
- Dynamic semantics: How does a speech act shift the hearer's distribution over states of the world?
- Inferrables: When is it appropriate to mark definiteness or givenness?
- Pragmatics: What inferences are being actively invited by the speaker, and how is this reflected in conventional form?
We will also discuss what probabilistic methods leave unexplained. Probability may be an imperfect tool for modeling cognition. And even if mental states are essentially probability distributions, does language discuss mental states in these terms? Or is it some other, non-probabilistic "folk theory" of mind that is referenced by linguistic constructions such as evidentials, epistemic modals, and verbs of attitude and belief? Similarly, are folk theories of reasoning assumed by pragmatic conventions, causal language, or generic statements?
Emerging work in vector space semantic models also aims to address some of our questions by departing from conventional logical-form representations. However, our main focus here will be on probabilistic approaches such as distributions over possible worlds or situations.
Participants And Format
Our aim in the PReLiM workshop is to bring together leading thinkers from three communities:
- Natural language processing. The NLP community of late has been working actively on recovering Montagovian, frame-based, or distributional representations of meaning, as well as using such representations in tasks like question answering and machine translation.
- Linguistics. Linguists can clarify the range of semantic and pragmatic puzzles to be solved, and can challenge unwise methods. A few semanticists are already engaging with probabilistic methods.
- Probabilistic methods in cognitive science and AI. The Bayesian modeling community has been increasingly concerned with probabilistic reasoning, probabilistic knowledge representation, and distributions over possible worlds.
This short PReLiM workshop will immediately precede the longer AMR workshop on the practical use of abstract meaning representations in machine translation. Thus, in addition to the above invitees, PReLiM will include the AMR participants, particularly students. PReLiM will give them a broader view of the long-term problems in meaning representation, and will set the stage for their shorter-term discussions on how to design the next version of AMR.
- Background readings will be circulated in advance of the workshop.
- The workshop itself will consist of a mix of talks, panel presentations on prearranged topics, and breakout discussions.
- One goal of the discussions will be to try to agree on a high-level formal framework for considering issues like those above, one that is conducive to future exploration of specific phenomena and examples. Participants will be invited to bring proposals.
- Another goal is to identify avenues for empirical progress -- e.g., collecting useful data, running experiments on humans, or constructing dialogue systems for restricted domains.
- Discussion after the workshop can continue on a mailing list.
| Participant | Affiliation |
|---|---|
| Jason Eisner (organizer) | Johns Hopkins University |
| Oren Etzioni | University of Washington, Allen Institute |
| Shalom Lappin | King's College London |
| Staffan Larsson | University of Gothenburg |
| Dan Lassiter | Stanford University |
| Percy Liang | Stanford University |
| David McAllester | Toyota Technological Institute |
| James Pustejovsky | Brandeis University |
| Kyle Rawlins | Johns Hopkins University |
| Benjamin Van Durme | Johns Hopkins University |
| Nicholas Andrews | Johns Hopkins University |
| Drew Reisinger | Johns Hopkins University |
| Darcey Riley | Johns Hopkins University |
| Rachel Rudinger | Johns Hopkins University |
Most of the senior participants gave talks; abstracts and videos are available online.
- Morning talks: Overview talks and orientation for the larger workshop
- Afternoon discussion: Kick-off meeting
- Our goals and interests
- Our desiderata and warnings of pitfalls
- Afternoon talk: James Pustejovsky, Why It Is Important to Distinguish "Possible" From "Probable" Meaning Shifts: How distributions impact linguistic theory
- Morning discussion: Taking stock
- Hard examples we’d like to explain
- What’s already understood
- Morning talk: Shalom Lappin, A Rich Probabilistic Type Theory for the Semantics of Natural Language
- Afternoon discussion: Towards a probabilistic language of thought
- Knowledge representation
- Belief, theory of mind
- Metaphor and meaning shift
- Formalizing the above
- Chalktalk by Darcey/Jason on locally renormalized PCFG
- Afternoon talk: Oren Etzioni, Semantics, Science, and 10-Year-Olds
- Morning discussion: Worlds and situations
- Generics, quantifiers
- Modals, conditionals and counterfactuals; “minimal change”
- Chalktalk by Drew Reisinger on dialogue scenario
- Morning talk: Dan Lassiter, Bayesian Pragmatics
- Afternoon discussion: Pragmatics
- Meta-reasoning (chalktalk by Dan Lassiter)
- Presuppositions and implicatures
- Game theory
- Afternoon talk: David McAllester, The Problem of Reference
- Morning discussion: Linguisticization
- lexical semantics (also event semantics)
- linguistic marking (definiteness, information structure, modality, evidentials, classifiers, conventional implicatures)
- Morning talk: Staffan Larsson, Perceptual Semantics and Coordination in Dialogue
- Afternoon discussion: grounding
- temporal & spatial reasoning
- Afternoon talk: Percy Liang, The State of the Art in Semantic Parsing
- Morning discussion: Remaining difficult issues, e.g.,
- Imprecise language
- Contradictory beliefs
- Linguistic ambiguity about contrast sets
- Morning talk: Martha Palmer, Designing Abstract Meaning Representations for Machine Translation
- Afternoon discussion: Practical next steps towards semantic AI, e.g.,
- Chalktalk by Rachel Rudinger on Stanford dependencies?
- Chalktalk by Nick Andrews on web scraping scenario?
- Afternoon talk: Benjamin Van Durme, Common Sense and Language