by Jan Hajic and Martin Cmejrek
Welcome to the syntactic annotation lab. The goal of the lab is to
These pages will guide you through the process, starting with the English dependency grammar guidelines and continuing to the actual lab exercise.
By linguistic wisdom and common sense (but often also just by convention), the direction of a dependency is determined as follows: of the two related nodes, the dependent is the one whose omission damages the grammaticality of the sentence the least (or perhaps not at all). (Obviously, leaving a node out usually changes the meaning of the sentence, but that is not our concern here: we are not actually leaving the node out, just determining the direction of the dependency.) For example, take the sentence "I like brown dogs". To determine the direction of the dependency between brown and dogs, try saying "I like brown" and "I like dogs". The latter is clearly better, so brown is the dependent of dogs. For some tokens, primarily punctuation and particles, the above test is hard to apply, and the direction of the dependency is determined by convention.
The type of the dependency is annotated and recorded with the dependent (since it can have only one governor (parent), it is quite clear and well-defined which dependency it describes). We distinguish about 25 types of dependency, many of them being rather technical; there are only five main dependency types:
In the examples below, both the structure of the annotation and the appropriate functions are given. Remember, the function is in fact the name of the dependency type between a node and its governor; thus it cannot be determined for the roots of the example fragments (dashes are used instead) without knowing what they themselves depend on. Sometimes, however, the most typical function is used for the fragment's root.
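The idea that each node records its single governor together with the function of that dependency can be sketched in a few lines of Python. This is only an illustration, not the format used by the lab software: the `Node` class and the `dependents` helper are hypothetical, and the labels `Sb` and `Obj` are assumed names for the subject and object functions (only `Atr` is named explicitly in these guidelines). The root carries dashes, as in the example fragments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    form: str                # the word form
    governor: Optional[int]  # index of the governing node; None for the root
    function: str            # dependency type, stored on the dependent


# "I like brown dogs" (labels Sb and Obj are assumptions, see above)
sentence = [
    Node("I", 1, "Sb"),
    Node("like", None, "---"),  # root of the fragment: function left as dashes
    Node("brown", 3, "Atr"),    # brown depends on dogs
    Node("dogs", 1, "Obj"),
]


def dependents(nodes, head):
    """Return the word forms of all nodes governed by the node at index head."""
    return [n.form for n in nodes if n.governor == head]


print(dependents(sentence, 1))  # ['I', 'dogs']
print(dependents(sentence, 3))  # ['brown']
```

Because every node stores exactly one governor index, the structure is a tree by construction as long as there is a single root and no cycles.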
Subjects are usually expressed by a "syntactic" noun (noun, pronoun, adjective used as a noun, or a numeral). Sometimes, even a whole clause can be a subject (subjective subordinate clause).
For more examples, see the subject nodes in the coordination examples below; see also the passive verb form example.
Direct as well as indirect objects (i.e., noun phrases modifying verbs without the use of a preposition) are a clear example of an object depending on its verb; however, prepositional phrases can become objects as well in certain typical verb constructions (especially when the preposition loses its typical, or "unmarked", function when it goes with the particular verb). As with all functions, subordinate clauses can also be objects.
Examples of objects expressed: by a simple noun phrase; by an infinitive; by an infinitive (with "to be"); by a clause (and by a simple noun inside it).
Attributes (function: Atr) can be expressed in several ways: by an adjective (+ determiner); by a simple numeric expression; by a little more complex numeric expression; by a leading noun in a noun phrase; by a prepositional phrase; by a possessive; by a subordinate clause; by a numeric range; by an -ing verb construction.
Adverbials typically modify verbs and adjectives, but in certain cases they may modify "syntactic" nouns as well (e.g., "almost five").
They can be expressed in various ways: by an adverb; by a prepositional phrase (numeric, time); by a prepositional phrase (time, modifying a numeral); by a prepositional phrase (location).
Negation is also annotated as an adverbial (similarly, intensifiers such as "also", "thus", "so", "only" etc. are Advs as well):
Subordinate clauses can also be adverbial (introduced by subordinating conjunctions such as "as", "when", "if", "where", etc.).
As double dependency is not desirable (because the resulting structure would no longer be a tree), a convention is adopted that it is annotated as if it depends on the noun (that in turn depends on the verb in question) only. It usually introduces a "non-projectivity" in the structure (for those familiar with the parsing evaluation terminology, it is something like the "crossing brackets" phenomenon), but in the dependency framework, there is no trouble with that at all.
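For readers who would like to see what "non-projectivity" means mechanically, here is a minimal sketch of a projectivity check. It is not part of the lab software; the `heads` encoding (a list in which `heads[i]` is the index of the governor of word `i`, or `None` for the root) is an assumption made for illustration, and the second test structure is an abstract four-word example rather than an annotated sentence.

```python
def is_projective(heads):
    """Return True if no dependency "crosses" material outside its own subtree.

    heads[i] is the index of the governor of word i, or None for the root.
    The input is assumed to be a well-formed tree (no cycles).
    """
    def descends_from(node, ancestor):
        # Walk up the chain of governors, starting at node itself.
        while node is not None:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for dep, head in enumerate(heads):
        if head is None:
            continue
        lo, hi = sorted((dep, head))
        # Every word strictly between the two ends must belong to the
        # subtree of the governor; otherwise the dependency is non-projective.
        for between in range(lo + 1, hi):
            if not descends_from(between, head):
                return False
    return True


print(is_projective([1, None, 3, 1]))  # "I like brown dogs" -> True
print(is_projective([2, 3, None, 2]))  # abstract crossing example -> False
```

In the second structure the dependency between words 1 and 3 spans word 2, which is the root and therefore not below word 3; this is the dependency counterpart of crossing brackets.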
It is often expressed by a transgressive verb form ("Suzan paid bills keeping some reserve..."):
Using dependencies, coordination can be handled easily by marking the coordinating conjunction (or a comma if there is no conjunction) with the function Coord and using it in place of any of the true dependents; they are then "dependent" on the conjunction's node, and marked accordingly as members of the coordination (similarly, for apposition the "governor" function is Apos, everything else being the same). Special care must be taken when the coordination is modified by a common phrase: such a phrase also depends on the conjunction's node, but it is obviously not marked as a coordination member. To mark coordination/apposition "membership", an appropriate function "suffix" (_Co, _Ap) is used.
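The convention above can be sketched as follows. This is a hedged illustration only: the sentence "I like small dogs and cats" (read so that small modifies both nouns) is a made-up example, the tuple layout and the helpers `members` and `common_modifiers` are hypothetical, and `Sb` and `Obj` are assumed labels. Only `Coord`, `Atr`, and the `_Co`/`_Ap` suffixes are taken from the guidelines.

```python
# Each entry: (word form, index of governor or None for the root, function)
sentence = [
    ("I",     1,    "Sb"),      # assumed label
    ("like",  None, "---"),     # root of the fragment
    ("small", 4,    "Atr"),     # common modifier: hangs on "and", no _Co suffix
    ("dogs",  4,    "Obj_Co"),  # coordination member
    ("and",   1,    "Coord"),   # conjunction stands in for the true dependents
    ("cats",  4,    "Obj_Co"),  # coordination member
]

MEMBER_SUFFIXES = ("_Co", "_Ap")


def members(nodes, coord):
    """Word forms governed by the conjunction and marked as members."""
    return [form for form, gov, func in nodes
            if gov == coord and func.endswith(MEMBER_SUFFIXES)]


def common_modifiers(nodes, coord):
    """Word forms governed by the conjunction but not marked as members."""
    return [form for form, gov, func in nodes
            if gov == coord and not func.endswith(MEMBER_SUFFIXES)]


print(members(sentence, 4))           # ['dogs', 'cats']
print(common_modifiers(sentence, 4))  # ['small']
```

The suffix is what distinguishes the two kinds of nodes hanging on the conjunction: stripping `_Co` from a member's function gives the function the member would have had if it depended on the conjunction's governor directly.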
Examples of coordinations/appositions of various complexity: simple coordination; simple apposition; coordination with a common modification; another common modification; combination of coordination and apposition.
Further examples: simple passive; passive with negation; infinitive particle.
There are 20 sentences to annotate using the above guidelines. The sentences are real-world sentences from several WSJ articles from the late 1980s and early 1990s, manually selected to avoid those that are too difficult. They have been pre-processed in such a way that you will initially see a "string" of nodes once you start the annotation software: each node "depends" on the previous one. No functions are filled in. (This is exactly how the real annotators get the sentences for annotation.) Your task is to modify this initial structure so that it obeys the above guidelines.
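As a rough picture of that starting point, the following sketch builds the initial "string" of nodes for a list of tokens. It is an assumption-laden illustration, not the actual pre-processing: the function name `initial_chain` and the tuple layout are hypothetical, the first token is assumed to have no governor, and `---` stands for an unfilled function.

```python
def initial_chain(tokens):
    """Return (form, governor index, function) triples for the starting structure.

    Every token depends on the token immediately before it; the first token
    has no governor (None). No functions are filled in ("---").
    """
    return [
        (form, i - 1 if i > 0 else None, "---")
        for i, form in enumerate(tokens)
    ]


for node in initial_chain(["I", "like", "brown", "dogs"]):
    print(node)
```

Annotating a sentence then amounts to changing the governor index and the function of each node until the structure follows the guidelines.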
Use the mouse to move nodes around and put them in the right places (by drag-and-drop): drop the node near what you think should be its governor, and it will be moved to the right place once you release the mouse button. Clicking on a token's dependency function (light grey color, below the word form, initially ---) will bring up a window with a list of all possible dependency functions; select the chosen one by double-clicking it.
Based on your team number, use one of the following machines:
| Team Number | Machine |
| 10 | e1 |
| 11 | e1 |
| 12 | e2 |
| 13 | e2 |
| 14 | e7 |
| 15 | e7 |
| 16 | e8 |
| 17 | e8 |
| 18 | e10 |
| 19 | e10 |
| 20 | e11 |
| 21 | e11 |
| 22 | e12 |
| 23 | e12 |
| 24 | e13 |
| 25 | e13 |
| 26 | e14 |
| 27 | e14 |
| 28 | e15 |
| 29 | e15 |
| 30 | e16 |
| 31 | e16 |
Log in under the login name of one member of the team only; use one terminal and work together. We suggest that one team member search the guidelines and the extra examples distributed on paper, come up with solutions, and then instruct the other one (sitting in front of the terminal, controlling the mouse and doing the final visual check) what to do. (Of course, you might want to switch places in the middle of the task.)
Do:
...$ cd ~hajic/lab/NN
...$ pwd
Please check VERY CAREFULLY (by using pwd as suggested above) that you are in a directory named NN (your team number)! There is no way at the moment to ensure that two teams do not annotate the same file!!! (resulting in the obvious consequences...)
Then run:
...$ annotate
The main white window of the annotation software ("TrEd") should show up. Resize it to the maximum possible size that still leaves these guidelines (at least partially) visible. You will see the first sentence:
Start working. Ask questions whenever you are not sure what to do. Save your work frequently (File->Save; do not use Save As... to save it under a different name, since the result must be saved under your team's name, labNN.fs).
When you are done, run:
...$ evaluate
to see how well you have done; it will show you your total accuracy against the "gold standard", as well as the separate accuracies in structure building and function assignment. The total accuracy is the complement of the average of the two separate error rates, in percent:
Team NN accuracy: 18.78 (structure 21.89, functions 15.68)
The example above also shows the baseline numbers (i.e., this is what you get if you do not do anything and leave the initial structure and functions intact).
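The scoring formula can be made concrete with a small sketch. It is not the actual evaluate program: the function name `accuracies` and the input layout are hypothetical, and it assumes that each separate accuracy is simply the percentage of tokens whose governor (structure) or function matches the gold standard. Under that assumption the total, 100 minus the average of the two error rates, equals the average of the two accuracies, which is consistent with the baseline line above (21.89 and 15.68 average to roughly 18.78).

```python
def accuracies(gold, annotated):
    """Return (total, structure, functions) accuracies in percent.

    Both arguments are lists of (governor index, function) pairs, one per
    token, in the same order. Assumes per-token scoring (see lead-in).
    """
    n = len(gold)
    pairs = list(zip(gold, annotated))
    structure = 100.0 * sum(g[0] == a[0] for g, a in pairs) / n
    functions = 100.0 * sum(g[1] == a[1] for g, a in pairs) / n
    # Complement of the average of the two error rates.
    total = 100.0 - ((100.0 - structure) + (100.0 - functions)) / 2
    return total, structure, functions


# Hypothetical gold standard and annotation for a four-token sentence.
gold = [(1, "Sb"), (None, "---"), (3, "Atr"), (1, "Obj")]
annotated = [(1, "Sb"), (None, "---"), (1, "Atr"), (2, "Adv")]

print(accuracies(gold, annotated))  # (62.5, 50.0, 75.0)
```

Here two of the four governors and three of the four functions are correct, so the two error rates are 50 and 25 percent and the total accuracy is 62.5.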
The final results for all the teams, ranked, will be available about a minute later; announcement of the winning team and its accuracy figures will follow immediately!
http://www.clsp.jhu.edu/ws2002/preworkshop/labs/cmejrek/lab.html
This page was originally accessible at http://www.clsp.jhu.edu/~hajic/lab/index.html; the graphics were in the same directory as *.jpg.