Generation in the Context of MT

Let’s imagine a system for translating a sentence from a foreign language (say Arabic) into your native language (say English). Such a system works as follows. It analyzes the foreign-language sentence to obtain a structural representation that captures its essence, i.e., “who did what to whom where.” It then translates (or transfers) the actors, actions, etc. into words in your language while “copying over” the deeper relationships between them. Finally, it synthesizes a syntactically well-formed sentence that conveys the essence of the original sentence.
Each step in this process is a hard technical problem. The best known solutions are either inadequate for applications or good enough only in narrow application domains, failing when applied elsewhere. This summer, we will concentrate on improving one of these three steps, namely the synthesis (or generation).
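The three steps described above (analysis, transfer, synthesis) can be illustrated with a deliberately tiny Python sketch. Everything in it is a hypothetical stand-in rather than part of the workshop system: the `Frame` structure, the `LEXICON` entries (whose source-side keys are placeholders, not real Arabic or Czech words), the assumption of a fixed source token order, and the single English sentence template. It only shows how the stages hand a structural representation to one another.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Toy 'who did what to whom where' structure (hypothetical)."""
    action: str
    actor: str
    patient: str
    location: str


# Hypothetical toy lexicon; the keys are placeholders, not real source-language words.
LEXICON = {
    "SRC_chase": "chased",
    "SRC_dog": "dog",
    "SRC_cat": "cat",
    "SRC_garden": "garden",
}


def analyze(tokens):
    # Stand-in for real analysis: assume the source sentence always
    # arrives in action, actor, patient, location order.
    action, actor, patient, location = tokens
    return Frame(action, actor, patient, location)


def transfer(frame):
    # Translate each slot's word while copying over the relationships
    # between the slots unchanged.
    words = (frame.action, frame.actor, frame.patient, frame.location)
    return Frame(*(LEXICON[w] for w in words))


def generate(frame):
    # Stand-in for synthesis: one fixed English word-order template.
    return f"The {frame.actor} {frame.action} the {frame.patient} in the {frame.location}."


def translate(tokens):
    return generate(transfer(analyze(tokens)))
```

Calling `translate(["SRC_chase", "SRC_dog", "SRC_cat", "SRC_garden"])` runs all three stages in order. A real generator would have to choose word order, function words, and inflection from the structure itself, which is exactly what the fixed template here sidesteps.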

We will assume that the target language for generation is English, and that the source language of the MT system is a language of a completely different type (Arabic and Czech). We will further assume that the transfer step produces a fairly deeply analyzed sentence structure. The incorporation of deep analysis makes the whole approach very novel: so far, no large-coverage translation system has tried to operate with such a structure, and the application to very diverse languages makes it an even more exciting enterprise!

Within the generation process, we will focus on the structural (syntactic) part, assuming that a morphological generation module exists to complete the generation process. That module will be added to the suite so that we can evaluate the final result, namely the quality of the plain English text coming out of the system. Statistical methods will be used throughout.

A significant part of the workshop preparation will be devoted to assembling and running a simplified MT system from Arabic/Czech to English (up to the syntactic structure level), in order to have realistic training data for the workshop project. As a consequence, we will not only understand and solve the generation problem, but also learn the mechanics of an end-to-end MT system, preparing team members to work on other parts of the MT system in the future.


Team Members
Senior Members
Jason Eisner, CLSP
Bonnie Dorr, University of Maryland
Jan Hajic, Charles University
Gerald Penn, University of Toronto
Dragomir Radev, University of Michigan
Owen Rambow, University of Pennsylvania
Graduate Students
Martin Cmejrek, Charles University
Yuan Ding, University of Pennsylvania
Undergraduate Students
Terry Koo, Stanford
Kristen Parton, Stanford

Center for Language and Speech Processing