Note: the instructions contained on this page were written for use specifically in the CLSP WS02 Lab. They may refer to applications, files or other materials that are not accessible from other locations.

JHU NLP Workshop Parsing Lab - 11 July, 2002

Experiment: Beam Search Thresholding in a Probabilistic CKY Parser

The first experiment will introduce you to the basics of statistical parsing. The task will be to modify an existing parser written in Perl in a sequence of steps listed below. Knowledge of basic data types, control structures, and regular expressions in Perl is all you should need to do this lab.

Background. A recurring theme in probabilistic parsing is the need to limit the search space of the parser. Many productions that a hand-constructed CFG would rule out as impossible, a learned PCFG might permit with low but non-zero probability. Techniques to limit the search space might still guarantee that the optimal parse is found (such as A* search). Or they might do more aggressive pruning for even more speed, but sometimes fail to find the most probable parse. The Beam Search Thresholding you will implement is an example of this second, more aggressive pruning.

You will be taking an existing probabilistic context-free CKY parser and adding beam search thresholding to improve its running time. Beam search thresholding is a particular form of pruning which ignores items stored in the parser chart that have a very low probability, relative to other items in the same cell. Pruning these items is like making a bet that these items are so unlikely, they will not be in the most probable parse, even if they combine with other high-probability items. The ratio in probabilities from the most probable item in a cell to the lowest probability permitted for a "competing" item is the Beam Width, and this parameter can be tuned to trade off speed (a narrower beam can prune more items) vs. accuracy (pruning more items risks pruning the most probable parse).

Notes on data. The grammar was trained from the Penn Treebank of WSJ articles, sections 01-09, with no smoothing of the count statistics. The test sentences were drawn from section 0. To reduce the size of the grammar and parses, the words (lexical symbols) were removed, so that the parse tree "leaves" are part-of-speech (POS) tags. The full set of Penn Treebank POS and syntactic tags is below. While the full POS tagset is large to capture fine distinctions in meaning, the initial letter of the symbols generally maps to the standard simple grammatical categories.

The hand-annotated "gold" reference parses, and the output of the provided parser, use the standard nested parentheses to represent the parse structure. The evaluation of the quality of parses uses the "EVALB" tool from NYU, which reports the precision and recall of the brackets (the parse tree nodes and their spans) relative to the reference parse.

Notes on CKY algorithm. The CKY algorithm without pruning is exhaustive, guaranteed to find every possible parse of a sentence. There are several ways to order the triple-nested loops that test whether a cell in the chart can be built from two smaller cells. All of these orders must preserve the property of never trying to build an item before building all of the smaller items that it could be built from. The variant of CKY in the provided parser uses as its outer-most loop j, the right edge of the new item being built, incrementing it left-to-right across the words in the sentence. The next loop is on i, the left edge of the new item, decrementing it from j-1 back to 0. (The other direction would lose the above property.) And the next loop is on k, the split between the two smaller cells being examined. The remaining loops look at every combination of items from these two cells, and check the grammar for every rule with these symbols as a Right-Hand-Side (RHS).

The implementation has some additional code to handle unary rules, since the grammar used to annotate the treebank include many nodes with a single non-terminal as a RHS. (It's not in Chomsky Normal Form.)

You can work on this lab solo or in a two-person team. If Perl is unfamiliar, you might try to partner with someone who knows it better. You can find an online Perl reference here . Follow these steps:

  1. Copy all the files from ~ruhlen/parselab/files into a new directory. Test that the provided program can parse the sentences in the development set by running:

    perl cky-parser.pl ~ruhlen/parselab/out.gram < input.small > parser.out.nopruning

    Evaluate the accuracy of the best parses by running:

    ./test_output parser.out.nopruning gold.small

    This reports accuracy of the parser output and the labeled development set. Parsing these sentences with no pruning may take around 15 minutes, depending on your system speed and load. Note both the accuracy (precision and recall) and the total CPU time of the exhaustive CKY parser. You can get a qualitative look at how the parser's output differs from the reference parse by piping your output and the gold parses into "prettyprint".

    cat parser.out.nopruning gold.small | prettyprint

  2. While waiting for the development set parsing to finish, analyze the nested loops of the CKY algorithm in the file cky-parser.pl . Find the point in the program at which, for each cell i,j in the chart, all possible items have been created and stored.

  3. While writing and debugging your code, you should use the file "input.tiny" instead of "input.small".

    At the point you identified above, add a call to the provided function "prune_items" that measures the number of items stored at each chart cell i,j. Extend that function so that after the parser outputs its best parse for a sentence, you can output the mean, max, and min number of items per chart cell for this sentence, using statistics collected in "prune_items". Display these values using the format:
    "# Items/cell mean: X.X max: N min: N Total items: N cells: N"

  4. Extend the function to also compute the maximum and minimum log-probabilities of the items in each cell, and the log-ratio of max:min probabilities in that cell. (Hint: what's the direct way to calculate the log of a ratio of two probabilities already stored as log-probabilities?) Output another line after the Items/cell output in the format:
    "# Log Max:min probability log-ratios mean: X.X max: X.X min: X.X"
    using statistics collected in your function. (Note: each non-empty cell has a single max:min ratio, and this output shows the statistics for this value across all non-empty cells.)

  5. Modify the function prune_items further to use the flag variables $pruning_enabled and $beam, removing all items in each cell whose probability is less than that of the most probable item in the same cell divided by beta. (Again, remember we're in log space.) Add another output line for each sentence showing how many items were pruned, in the format:
    "# Items pruned total: N mean per cell: X.X"
    For a given sentence, should the total number of items in the chart using beam search be the same as the total number without any pruning minus the total number pruned? Why or why not? Look at your output for several sentences from this step and the previous step to confirm.

  6. Measure running times and error rates on the entire development data set with beam thresholding using several different beam widths. Include the beam values 0.0001 and 0.01.

    perl cky-parser.pl -p -b=0.0001 ~ruhlen/parselab/out.gram < input.small > parser.out.pruning.0.0001
    perl cky-parser.pl -p -b=0.01 ~ruhlen/parselab/out.gram < input.small > parser.out.pruning.0.01
    perl cky-parser.pl -p -b= ...

  7. Plot the change in parsing time, comparing the parser without pruning and the parser with pruning at the various beam widths. Use the following script to generate the output graph:

    perl plot_times.pl -g=/apps/bin/gnuplot parser.out.nopruning parser.out.pruning.0.0001 parser.out.pruning.0.01 ...

    Check the name of the output file: timing-xx.eps and view it using gv:

    /apps/bin/gv timing-xx.eps

  8. Check how the precision/recall figures change using the "test_output" script (usage explained above). Is there a "sweet spot" beam width that significantly reduces time with little change in precision/recall?

    Show either Paul or Anoop the timing graph and the precision/recall figures, and you're ready to move on the optional steps.

  9. Optional: As time allows, consider, implement, and test other ways to speed up parsing. Possible things to try:

Table 2: 
The Penn Treebank POS tagset 

1. CC  Coordinating conjunction  25.TO  to 
2. CD  Cardinal number           26.UH  Interjection 
3. DT  Determiner                27.VB  Verb, base form 
4. EX  Existential there         28.VBD Verb, past tense 
5. FW  Foreign word              29.VBG Verb, gerund/present participle 
6. IN  Preposition/subord.       30.VBN Verb, past participle 
218z     conjunction 
7. JJ  Adjective                 31.VBP Verb, non-3rd ps. sing. present 
8. JJR Adjective, comparative    32.VBZ Verb, 3rd ps. sing. present 
9. JJS Adjective, superlative    33.WDT wh-determiner 
10.LS  List item marker          34.WP  wh-pronoun 
11.MD  Modal                     35.WP  Possessive wh-pronoun 
12.NN  Noun, singular or mass    36.WRB wh-adverb 
13.NNS Noun, plural              37. #  Pound sign 
14.NNP Proper noun, singular     38. $  Dollar sign 
15.NNPS Proper noun, plural      39. .  Sentence-final punctuation 
16.PDT Predeterminer             40. ,  Comma 
17.POS Possessive ending         41. :  Colon, semi-colon 
18.PRP Personal pronoun          42. (  Left bracket character 
19.PP  Possessive pronoun        43. )  Right bracket character 
20.RB  Adverb                    44. "  Straight double quote 
21.RBR Adverb, comparative       45. `  Left open single quote 
22.RBS Adverb, superlative       46. "  Left open double quote 
23.RP  Particle                  47. '  Right close single quote 
24.SYM Symbol                    48. "  Right close double quote
       (mathematical or scientific) 
__________________
 Some symbols cannot be displayed in HTML format 
 
Table 3: 
The Penn Treebank syntactic tagset 

Tags

1.  ADJP     Adjective phrase 
2.  ADVP     Adverb phrase 
3.  NP       Noun phrase 
4.  PP       Prepositional phrase 
5.  S        Simple declarative clause 
6.  SBAR     Clause introduced by subordinating conjunction or
             0 (see below) 
7.  SBARQ    Direct question introduced by wh-word or wh-phrase 
8.  SINV     Declarative sentence with subject-aux inversion 
9.  SQ       Subconstituent of SBARQ excluding wh-word or wh-phrase 
10. VP       Verb phrase 
11. WHADVP   Wh-adverb phrase 
12. WHNP     Wh-noun phrase 
13. WHPP     Wh-prepositional phrase 
14. X        Constituent of unknown or uncertain category 

Null elements 
 
1.  *        ``Understood'' subject of infinitive or imperative 
2.  0        Zero variant of that in subordinate clauses 
3.  T        Trace---marks position where moved wh-constituent
              is interpreted 
4.  NIL      Marks position where preposition is interpreted in
             pied-piping contexts