Let's imagine a system for translating a sentence from a foreign language
(say Arabic) into your native language (say English). Such a system works
as follows. It analyzes the foreign-language sentence to obtain a
structural representation that captures its essence, i.e., "who did what to
whom where." It then translates (or transfers) the actors, actions, etc.
into words in your language while "copying over" the deeper relationships
between them. Finally, it synthesizes a syntactically well-formed sentence
that conveys the essence of the original sentence.
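As a rough illustration only, the three steps can be sketched as a toy pipeline in Python. Everything here is hypothetical: the `Clause` structure, the `analyze`/`transfer`/`generate` functions and the tiny lookup tables are invented for this example. A Czech sentence is used because Czech marks "who did what to whom" with case endings rather than word order. Real systems use parsers and statistical models, not hand-written tables.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical "deep" structure: who did what to whom where.
@dataclass
class Clause:
    action: str
    actor: str
    patient: str
    location: Optional[str] = None
    tense: str = "past"


# Hypothetical analysis lexicon: surface form -> (lemma, role).
# The role is read off the case ending, not the word position.
CZ_ANALYSIS = {
    "Petr": ("Petr", "actor"),        # nominative -> actor
    "Marii": ("Marie", "patient"),    # accusative -> patient
    "viděl": ("vidět", "action"),     # past tense of "to see"
    "Praze": ("Praha", "location"),   # locative (after "v")
}

# Hypothetical bilingual lemma dictionary used by the transfer step.
CZ_EN = {"Petr": "Peter", "Marie": "Mary", "vidět": "see", "Praha": "Prague"}

# Hypothetical stand-in for an English morphological generation module.
EN_PAST = {"see": "saw"}


def analyze(sentence: str) -> Clause:
    """Step 1: map the source sentence to a role-labeled structure."""
    roles = {}
    for token in sentence.rstrip(".").split():
        if token in CZ_ANALYSIS:  # function words such as "v" are skipped
            lemma, role = CZ_ANALYSIS[token]
            roles[role] = lemma
    return Clause(action=roles["action"], actor=roles["actor"],
                  patient=roles["patient"], location=roles.get("location"))


def _translate_lemma(word: Optional[str]) -> Optional[str]:
    return CZ_EN[word] if word is not None else None


def transfer(c: Clause) -> Clause:
    """Step 2: translate each lemma, copying the relationships unchanged."""
    return Clause(action=_translate_lemma(c.action),
                  actor=_translate_lemma(c.actor),
                  patient=_translate_lemma(c.patient),
                  location=_translate_lemma(c.location),
                  tense=c.tense)


def generate(c: Clause) -> str:
    """Step 3: linearize in English subject-verb-object order."""
    verb = EN_PAST[c.action] if c.tense == "past" else c.action
    words = [c.actor, verb, c.patient]
    if c.location:
        words += ["in", c.location]
    return " ".join(words) + "."


def translate(sentence: str) -> str:
    return generate(transfer(analyze(sentence)))


print(translate("Marii viděl Petr v Praze."))  # Peter saw Mary in Prague.
```

Because the roles come from case endings rather than position, "Petr viděl Marii v Praze." yields the same English sentence; only the generation step imposes English word order.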
Each step in this process is a hard technical problem. The best known
solutions are either inadequate for applications or good enough only in
narrow application domains, failing when applied to other domains.
This summer, we will concentrate on improving one of these three steps,
namely the synthesis (or generation).
The target language for generation will be English, and the source
languages of the MT system will be of completely different types (Arabic
and Czech). We will further assume that the transfer step produces a
fairly deeply analyzed sentence structure. Incorporating this deep analysis
makes the approach novel: so far no large-coverage translation system has
tried to operate with such a structure, and applying it to very diverse
languages makes it an even more exciting enterprise!
Within the generation process, we will focus on the structural (syntactic)
part. We assume that a morphological generation module exists to complete
the generation process; it will be added to the suite so that we can
evaluate the final result, namely the quality of the plain English text
produced by the system. Statistical methods will be used throughout.
A significant part of the workshop preparation will be devoted to
assembling and running a simplified MT system from Arabic/Czech to English
(up to the syntactic structure level), in order to obtain realistic
training data for the workshop project. As a result, we will not only
study and address the generation problem but also learn the mechanics of
an end-to-end MT system, preparing team members to work on other parts of
the MT system in the future.