Workshop 2003
Research Group Tuesday, May 13, 2008


Semantic Analysis Over Sparse Data

Team Goals
Project Description
Final Report (PDF)
 
Talks
Final Presentation (.ppt)
Attack on Data Sparseness -- a Tutorial (.ppt)
HLT, Data Sparsity and Semantic Tagging -- first day report (.ppt)
Progress report -- third week (.ppt)
Student Presentation -- Jerry Liu (.ppt)
Student Presentation -- Kris Haralambiev (.html)
Student Presentation -- Cassia Martin (.pdf | .sxi)
 
Team Members
Name                    E-Mail                              Affiliation
Guthrie, Louise *       L.Guthrie@dcs.shef.ac.uk            University of Sheffield
Basili, Roberto **      basili@info.uniroma2.it             University of Rome
Jelinek, Fred           jelinek@jhu.edu                     JHU
Cunningham, Hamish      hamish@dcs.shef.ac.uk               University of Sheffield
Zanzotto, Fabio         zanzotto@info.uniroma2.it           University of Rome
Bontcheva, Kalina       K.Bontcheva@dcs.shef.ac.uk          University of Sheffield
Guthrie, David          D.Guthrie@dcs.shef.ac.uk            University of Sheffield
Macherey, Klaus         kmach@I6.informatik.rwth-aachen.de  University of Aachen
Cui, Jia                cuijia@cs.jhu.edu                   JHU
Haralambiev, Kristiyan                                      University of Sofia
Martin, Cassia          cimartin@fas.harvard.edu            Harvard
Holub, Martin           holub@ufal.mff.cuni.cz              Charles University
Liu, Jerry              jcl158@columbia.edu                 Columbia
Cammisa, Marco          cammisa.marco@virgilio.it           University of Rome

* Group Leader
** Co-leader
 
General Information

Background Reading

Ellen Riloff
  1. Riloff, E. and Jones, R. (1999) "Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping" Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 1999, pp. 474-479.
  2. Jones, R., McCallum, A., Nigam, K., and Riloff, E. (1999) "Bootstrapping for Text Learning Tasks" (postscript, pdf) IJCAI-99 Workshop on Text Mining: Foundations, Techniques, and Applications.
Introductory and general papers on Empirical NLP
  1. S. Abney, Statistical Methods and Linguistics, in The Balancing Act, Klavans J., Resnik P., MIT Press, 1996.
  2. E. Charniak, Statistical Techniques for Natural Language Parsing, AI Magazine, 1997.
  3. K. Church, R.L. Mercer, Introduction to the Special Issue on Computational Linguistics Using Large Corpora, Computational Linguistics, volume 19, n. 1, 1993.
  4. L. Bahl, F. Jelinek, and R. Mercer. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2):179-190. (The first language model in speech.)
About Semantics and Lexical Representation
  1. Y. Wilks, B. Slator, L. Guthrie, Electric Words: Dictionaries, Computers and Meanings, MIT Press, 1996. (Ch. 1-5)
  2. J. Pustejovsky, The Generative Lexicon, MIT Press, 1995.
  3. Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. "Introduction to WordNet: an on-line lexical database." In: International Journal of Lexicography 3 (4), 1990, pp. 235-244. ftp://ftp.cogsci.princeton.edu/pub/wordnet/5papers.ps
Word-Sense Disambiguation.
  1. D. Yarowsky, Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora, Proceedings of Coling'92, Nantes, 1992.
  2. I. Dagan, A. Itai, Word-Sense disambiguation Using a Second Language Monolingual Corpus, Computational Linguistics, volume 20, n. 4, 1994.
  3. Yarowsky, D. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, pp. 189-196.
  4. Yorick Wilks. 1998. Is Word Sense Disambiguation Just One More NLP Task? University of Sheffield, Computer Science Dept. Memoranda in Computer and Cognitive Science, CS-98-12. (On the difference between POS tagging and word sense disambiguation.)
  5. M. Stevenson, Y. Wilks, Large Vocabulary Word Sense Disambiguation. In Ravin, Y. and Leacock, C. (eds.) Polysemy: Theoretical and Computational Approaches. (2000)
Measures of semantic associations and other
  1. Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18:467-479.
  2. Resnik, P. 1997. Selectional Preference and Sense Disambiguation. In Proceedings of the ANLP Workshop "Tagging Text with Lexical Semantics: Why, What, and How?", Washington, DC.
  3. Philip Resnik, "Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language", Journal of Artificial Intelligence Research (JAIR), 11, pp. 95-130, 1999.
  4. S. Abney and M. Light. 1999. Hiding a semantic hierarchy in a Markov model. In Proceedings of the Workshop on Unsupervised Learning in Natural Language Processing, ACL, 1999.
  5. M. Light, W. Greiff, Statistical Models for the induction and use of selectional preferences, in Cognitive Science, 87 (2002), 1-13.
  6. Agirre, Eneko, and David Martinez. "Integrating selectional preferences in WordNet." In: Proceedings of the first International WordNet Conference, Mysore, India, 21-25 January 2002.
POS tagging and Parsing with statistical methods.
  1. K. Church, A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text, Proceedings of ANLP, 1988, Austin, Texas.
  2. B. Merialdo, Tagging English Text with a Probabilistic Model, Computational Linguistics, volume 20, n. 2, 1994.
  3. A. Joshi, B. Srinivas, Integration of Structural and Statistical Information: the Role of Complexity of Primitives, Proceedings of New Methods for Language Processing, Manchester, 1994.
  4. Chelba, C. & Jelinek, F. 1998. Exploiting syntactic structure for language modelling. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada, August, pp. 225-231.
  5. Michael Collins. 1997. Three Generative, Lexicalised Models for Statistical Parsing. ACL 1997: 16-23
  6. Steven Abney. 1997. Stochastic Attribute-Value Grammars. Computational Linguistics 23(4): 597-618 (1997)
  7. R. Bod, Using an Annotated Corpus as a Stochastic Grammar, Proceedings of the EACL'97, Utrecht.
  8. N-gram Language Models from Sparse Data, 8th ELSNET Summer School. (Implementation guide)
  9. Erik F. Tjong Kim Sang. 2002. Memory-Based Shallow Parsing. Journal of Machine Learning Research, volume 2 (March), 2002, pp. 559-594.
Syntactic Disambiguation.
  1. D. Hindle, M. Rooth, Structural Ambiguity and Lexical Relations, Computational Linguistics, volume 19, n. 1, 1993.
Lexical Acquisition - Hybrid Methods.
  1. Pereira et al., Special Issue on Computational Linguistics Using Large Corpora, Computational Linguistics, volume 19, n. 1, 1993.
  2. Basili R., M.T. Pazienza, P. Velardi, An Empirical Approach to Natural Language Processing, Artificial Intelligence Journal, 1996.
Statistical Machine Translation.
  1. Brown P.F., Cocke J., Della Pietra S.A., Della Pietra V.J., Jelinek F., Lafferty J.D., Mercer R.L., Roossin P.S., A Statistical Approach to Machine Translation. Computational Linguistics, volume 16, n. 2, 1990.
Information Retrieval, Information Extraction
  1. Andrew K. McCallum, Dayne Freitag, Fernando Pereira. Maximum Entropy Markov Models for Information Extraction and Segmentation. ICML-2000.
  2. Kristie Seymore, Andrew McCallum, Ronald Rosenfeld. 1999. Learning Hidden Markov Model Structure for Information Extraction. AAAI'99 Workshop on Machine Learning for Information Extraction.
Other Papers on Statistical Language Modeling
  1. Gildea, D. and Hofmann, T. 1999. Topic-Based Language Models Using EM. In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech'99).
  2. J. Bellegarda. 2001. Robustness in Statistical Language Modeling: Review and Perspectives. In J.C. Junqua and G. van Noord (eds.), Robustness in Language and Speech Technology, 101-121. Kluwer Academic Publishers, 2001.
  3. Jelinek, F. Aspects of the Statistical Approach to Speech Recognition, 2001 IEEE International Symposium on Information Theory, Washington, D.C., June 29, 2001
  4. Extracting the Lowest-Frequency Words: Pitfalls and Possibilities. 2001. Mark Weeber; Rein Vos; R. Harald Baayen. Computational Linguistics, Volume 26, Number 3, Pages 301-317, 2001.
  5. Probabilistic Top-Down Parsing and Language Modeling. 2001. Brian Roark. Computational Linguistics, Volume 27, Number 2, Pages 249-285, 2001.
  6. Peng, Fuchun and Schuurmans, Dale. 2001. Use of an Open/Closed Word Classes Factorization for N-gram Language Models. To appear in Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS 2001), Nov. 2001, Tokyo, Japan.
  7. Niesler, T.R.; Woodland, P.C. 1996. Combination of word-based and category-based language models, 16
 

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu