You do not have to write any code for this exercise!
Your system, which is in competition with those of the other teams, will consist of two sub-grammars:
One way to see what your grammar does is to generate
random sentences from it. For our purposes, generation is just
repeated symbol expansion. To expand a symbol such as NP, our
sentence generator will randomly choose one of your grammar's
NP -> ...
rules, with probability proportional to the rule's
weight.
Your task is to write the grammar rules and also choose the weight for each rule. The weights allow you to express your knowledge of which English phenomena are common and which ones are rare. By giving a low weight to a rule (or to S2 itself), you express a belief that it doesn't happen very often in English.
Another way to see what your grammar does is to parse sentences with it. You can use our parser to look at a sentence and figure out whether your grammar could have generated it -- and how, and with what probability. We say that your grammar predicts a sentence well if (according to the parser) your grammar has a comparatively high probability of generating exactly that sentence.
You will have to decide how much effort to expend on designing S1 and S2, and how much you will trust S1 relative to S2. Your goal is to describe English accurately. This means that your grammar (especially the S1 subgrammar) should be able to generate lots of syntactic constructions that show up in English. Moreover, it should have a higher probability of generating common sentences, and a low or zero probability of generating ungrammatical sentences.
The performance of your system will be measured in two ways:
A weighted context free grammar consists of:
START), and the terminal symbols as the words.
A derivation rule tells one way to rewrite a non-terminal symbol
into a sequence of non-terminal symbols and terminal symbols.
For example, S -> NP VP says that an S
(perhaps indicating a declarative sentence) can be rewritten as
an NP (noun phrase) followed by a VP.
A word about weights: Today you heard a lot about probability models. In a probabilistic CFG, each rule would have some probability associated with it, and the probability of a derivation for a sentence would be the product of all the rules that went into the derivation. We don't want you to worry about making probabilities sum up to one, so you can use any positive number as a weight. The normalization script we provide will take care of normalizing so that all rules with a given left-hand side will have probabilities that sum to one.
An additional note for the truly curious: the
normalization script norm-log normalizes the
weights, then takes their negative logarithms. This is because
the parser we provide works by adding the rule scores and
seeking a minimum. This is identical to multiplying the
probabilities and seeking a maximum.
In the file Vocab.gr, we are giving you the set of terminal
symbols (words), embedded in rules
of the form Tag -> word.
Note that the vocabulary is closed. There will
be no unknown words in the test text, and you are not allowed
to add any
words (terminal symbols) to the grammar.
We have given equal weights to all the Tag ->
word rules, but you are free to change the weights if
you like. You are also free to add, remove, or change these
rules, as long as you don't add any new words.
Tag -> word rules, but
you might find that the tags aren't useful when trying to extend
S1 into a bigger English grammar. So you are free to create new
tags for word classes that you find convenient or use the words
directly in the rules, if you find it advantageous. We tried to
keep the vocabulary relatively small to make it easier for you
to do this.
You will almost certainly want to change the tags in rules
of the form
Our method of enforcing this requirement is to use a grammar that
is effectively a bigram (or finite-state) model. Suppose we only have
two tag types,
Choosing the relative weight of these two rules is a gamble. If you are
over-confident in your
``real'' English grammar (S1), and you weight it too highly, then you
risk assigning very low probability to sentences generated by a team
whose grammar is more extensive (since the parser will have to resort
to your S2 to get a parse).
On the other hand, if you weight S2 too highly, then you will probably
do a poor job of predicting the other teams' sentences, since
S2 will not make any sentences
very likely (it accepts everything, so probability mass is spread
very thin across the space of word strings). Of course, you can
invest some effort in trying to make S2 a better n-gram model, but
that's a tedious task and a risky investment.
The second score evaluates your full grammar (S1+S2) and is much
more important. Here we will take the other teams' test sets
(removing the sentences that Xtag/the human judge couldn't parse) and try to
parse them with your grammar (the combination of S1 and
S2). You will be able to parse all of the sentences (because S2
will accept anything, albeit with low probability). The score
will be the cross-entropy of your model with respect to the test
set. A low cross-entropy is good; it means that your model
is unsurprised by the test sentences generated by the
other teams. We will report your model's cross-entropy on each
of the other teams' test sets.
If the test set of sentences is {s1, s2, s3, ...}, then your
model's cross-entropy is defined as
Note: p(s) denotes the probability that a randomly
generated sentence is exactly s. Often your grammar will have
many different ways to generate s, corresponding to many
different trees. In that case, p(s) is the total probability of
all those trees. However, for convenience, we approximate this
with the probability of the single most likely tree, because that
is what our parser happens to find. So we are really only
measuring approximate cross-entropy.
There are two file formats to know about. The first is for sentences
generated by a grammar, or to be parsed by a grammar. We name files in this
format with the suffix
The file format for the grammar itself is quite simple. We
name these files with the suffix
Be careful not to name any of your non-terminals the same as
any terminals.
From here, on It is expected that your grammar will be in multiple files. If you follow
our convention and name
all of these ending in
The parser wrapper takes care of normalizing the scores into
probabilities; you can see the result in the file called
If a sentence fails to parse, its output will be
You will also see warnings, such as words in the sentences that are
missing in your grammar.
If you want to make the parses a bit more readable, pipe the
output of If you want
to see how you're doing in terms of cross-entropy, pipe the output of
Note that this script is called automatically whenever you call
In addition, we
have generated examples of sentences you might want to try to design for.
They
are listed in the file called Misc ->
word. But be careful: you don't want to
change Misc -> goes
to VerbT ->
goes, since goes doesn't behave like
other
If you use bash: VerbT's. In particular, you want your S1
to generate Guinevere has the chalice .
but not Guinevere goes the chalice .,
which is ungrammatical. This is why you may want to invent
some new tags.
About S2
The goal of S2 is to enforce the intuition
that every string of words should have some (possibly
miniscule) probability. You can view it as a type of
smoothing of the probability model. There are a number of
ways to enforce the requirement that no string have zero
probability under S2; we give you one to start with, and you are
free to change it. Just note that your score will become
infinitely bad if you ever give zero probability to a sentence
(even if it's a crummy one generated by another team)!
A and B. The set of rules
we would allow in S2 would be:
S2 -> _A
This grammar can generate any sequence of
S2 -> _B
S2 ->
_A -> A
_A -> A _A
_A -> A _B
_B -> B
_B -> B _A
_B -> B _B
As and Bs,
and there is no ambiguity in parsing with it: there is always a
single, right-branching
parse.
Placing your bets
Two rules your grammar must include are START -> S1 and
START -> S2. The relative weight of these determines how
likely it is that S1 (with start symbol S1) or S2 (with
start symbol S2) would be selected in generating a sentence,
and how costly it is to choose one or the other when parsing a sentence.
Evaluation
To evaluate the precision of your S1 grammar, we will first
generate a test set of sentences from it using
randsent.
We will then make a grammaticality judgement about each sentence,
and your score will be the fraction of sentences judged to be
grammatical. The judgement will come either from a human or
from
the Xtag parser, which has access to a large and detailed English
grammar. Your score will be the fraction of sentences that
received at least one parse. (Because Xtag takes a while to parse
some sentences, and we want you to see how you did before dinner,
it's likely we'll use a human/grad student judge.)
average(
-log(p(s1)), -log(p(s2)), -log(p(s3)), ... )
Note that
-log(1)=0, -log(1/2)=1, -log(1/4)=2, -log(1/8)=3, ... -log(0)=infinity
So a high cross-entropy corresponds to a low probability and
vice-versa. You will be severely penalized for probabilities
close to zero.
Tools at Your Disposal
We have given you tools to parse sentences, to generate sentences
randomly according to your grammar, and to compute the cross-enropy
of your grammar with respect to a set of sentences. We've also given
you a tool to check for terminal symbols you may have accidentally
added to the grammar.
.sen . Simply put one sentence per
line, with spaces between tokens. Make sure you put spaces before and after
punctuation symbols, since, for example Arthur, will look like
one token to the parser, while Arthur , is what you want
(two tokens).
.gr . On a
given line, anything following # is an ignored
comment. Each rule goes on one line, in the format:
weight X Y1
Y2 ... Yn
signifying
the rule X -> Y1
Y2 ... Yn (where
X is a non-terminal and the Ys are
any mix of terminals and non-terminals).
.gr files refer to grammar files,
and .sen files refer to sentence files.
Earley Parser (and pre/post-processing scripts):
The parser is written in Lisp. We have provided you with the code
(so you can take a peek if you're inclined) and with a wrapper script.
To use the parser:
parse [-s start-symbol] sentence-file
grammar-files
The default start symbol is START; you can force your grammar
to attempt to parse according to S1 (and fail on sentences that don't parse with
S1) by using -s S1. If you replace sentence-file with
just a hyphen ( - ), sentences will be taken from standard
input, so you can just type them by hand. grammar-files should
of course be in .gr format.
.gr you can parse with something like:
parse - *.gr
which will take sentences as you type them in and use all your grammar
files.
GRAMMAR.nl, in which each score is replaced by
the negative logarithm of itself, after normalization.
NONE on a single line. If the parse is successful, you
will see the single best-scoring (highest-weight) parse on one line,
followed by the negative log-probability of the sentence according to
your model.
parse to prettyprint.
parse sentence-file *.gr | prettyprint
parse to crossent:
parse sentence-file *.gr | crossent
Random sentence generator:
The random sentence
generator is written in Perl. Run it using:
randsent [ -t ] [ -n num-sentences ]
[ -s start-symbol ] grammar-files
where -t
indicates that you want tree output instead of flat sentences,
num-sentences is the number of sentences you
want generated (default one), and
start-symbol is the top-level symbol you
want to start from (default S1).
So to generate 17 trees from your entire merged grammar (S1 and S2 combined),
and to make the trees ``pretty,'' you might run:
randsent -t -n 17 -s START *.gr | prettyprint
Checking for new vocabulary:
New vocabulary is strictly forbidden! Sometimes, though, you might
add a non-terminal on the right-hand side of a rule and forget to give
its expansion rules. Or you might forget exactly which words are in the
vocabulary. To check for new nonterminals:
check-for-new-terms grammar-files
This will list any un-expandable symbols (terminals) in your grammar that
are not in the vocabulary we've provided.
parse or randsent, so you will get constant
warnings until you fix the illegal terminals.
Keeping Up: Training Data
Throughout the exercise, we
will make more and more ``training data'' available to all the
teams. This data will come from S1-sentences generated by each
team's grammar.
You will be able to find this data in
/home/ws99/nasmith/ws02lab/data/. It will be updated
frequently, so check back often to keep on top of what
kinds of sentences the others are generating!
examples.sen in the start
directory.
These are just
suggestions; they aren't part of the evaluation. If you want to get more ideas
about how to stump the other teams, we suggest looking on the Web. It's not
hard to find complicated sentences.
Now Do It!
c{01..14}, dc01, dc02, dc03, dc12, s01, s03, s04, s{06..15}, s17, u{01..03}, u{05..16}, u18. (Note: machines running SunOS 5.8 are probably okay to use. These are s{21-36}.)
If you use tcsh:
setenv PATH /home/ws99/nasmith/ws02lab/bin:$PATH
export PATH=/home/ws99/nasmith/ws02lab/bin:${PATH}
Please use tcsh or bash for this exercise, not csh.
umask 000 /home/ws99/nasmith/ws02lab/. Someone
should copy the files you need to get
started: cp ../start/* ./
The script test.csh will do
this for you.
members that contains the
(complete) email addresses of everyone on your team, one per line. This
is so that we can mail you your parser's results in detail.
.gr)
files. A good place to start is by modifying Top.gr,
where the weights for S1 and S2, and the initial S1, are. We recommend
that you not change the files S2.gr and Vocab.gr,
but rather create new Tag -> word
rules in addition to the ones we've given you, in a different file. This
way you avoid messing up the S2 grammar (your safety net!).
You will want to use parse to see how well you can parse
training data from the other teams, and you will want to use
randsent to see if the sentences you are generating
look like English.
cat
together all your grammar files into one file called GRAMMAR.gr
so we can evaluate your model. Don't forget to include files
like S2.gr and Vocab.gr when you do this!