Learning, Reading, Teaching, Publishing FAQ
From CLSP Wiki
What are the weekly activities organized by CLSP?
What classes have CLSP students taught previously? Can I teach my own?
How, where, and when should I apply to get a summer internship?
What courses are required by the CLSP core curriculum?
CLSP aims to give its students a good interdisciplinary background. Our Ph.D. students are generally expected to take the following courses, as well as others that will foster their intellectual development in their chosen area.
Of course, if you have taken similar courses before coming to JHU, you don't necessarily have to take them again. Ultimately it's up to your advisor to approve a schedule consistent with the goals of this curriculum.
The core curriculum is currently as follows. The courses may be taken at any time, but it is healthy to take them as soon as possible.
- 2 semesters of computational methods for speech and NLP:
+ 600.465 Natural Language Processing (Eisner, fall)
+ 520.666 Extraction of Information from Speech and Text (Jelinek, spring)
- 2 semesters of linguistics -- ANY TWO of the following (Syntax I and III recommended):
+ 050.620 Syntax I
+ 050.621 Syntax II
+ 050.622? Syntax III
+ 050.625 Phonology
+ 050.607 Phonetics (permitted on a trial basis in Spring 2004)
- 2 semesters of prob/stats:
+ 550.420 Introduction to Probability (Fill, fall)
+ 550.430 Introduction to Statistics (Naiman, spring)
- 1 semester of computer programming:
+ 600.226 Data Structures (staff, fall/spring/summer)
(as always, students who have previously covered this material are exempt)
Here's a longer list of recommended courses. Many were nominated for the "required" list, but obviously not all of them are required, and there are also relevant courses that are not listed here.
NOTE: A similar but not identical list is at http://www.clsp.jhu.edu/admissions/courses.php; check it out.
- SPEECH
520.666 Extraction of Information from Speech and Text (Jelinek, spring) -- CORE COURSE
520.678 Automatic Speech Processing and Recognition (Byrne, fall)
520.435 Digital Signal Processing (Weinert, fall)
520.651 Random Signals (Prince, fall)
- NLP
600.465 Introduction to NLP (Eisner, fall) -- CORE COURSE
600.665 Statistical Language Learning (Eisner, alternate springs)
600.405 Finite State Methods (Eisner)
600.466 Information Retrieval & Web Agents (Yarowsky, spring)
600.765 Seminar in NLP (weekly reading group, 1 credit)
- LINGUISTICS
050.620 Syntax I: Intro to the Syntax of Natural Languages (staff) -- CORE COURSE
050.621 Syntax II: Syntactic Theory and Analysis (Frank, Legendre) -- CORE COURSE
050.622? Syntax III -- CORE COURSE
050.625 Phonology: Sound Structure in Natural Language (Burzio, Smolensky) -- CORE COURSE
Topical courses in semantics, morphology, phonetics, etc.
- MATHEMATICAL AND COMPUTATIONAL FOUNDATIONS - VARIOUS DEPARTMENTS
600.226 Data Structures (staff) -- CORE COURSE
550.420 Introduction to Probability (Fill, fall) -- CORE COURSE
550.430 Introduction to Statistics (Naiman, spring) -- CORE COURSE
520.447 Information Theory (Khudanpur, fall)
600.475 Machine Learning (Sheppard, fall)
520.774 Kernel Machine Learning (Cauwenberghs, alternate springs)
550.426 Stochastic Processes (Priebe, spring)
550.661 Foundations of Optimization (Han, fall)
520.419 Iterative Algorithms (Meyer, fall)
What does CLSP require me to do when I publish a paper?
Give CLSP a form signed by you and all your co-authors
CLSP Policy Regarding Authorship and Publication
(PDF version: authorship_policy.pdf)
The following guidelines regarding authorship and publications have been issued by the WSE Office of Research.
A gradual diffusion of responsibility for multi-authored or collaborative
studies has led in recent years to the publication of papers for which no
single author was prepared to take full responsibility. Therefore two
safeguards are critical in the publication of accurate scientific reports:
a) the active participation of each coauthor in verifying any part of a
manuscript that falls within his or her specialty area and b) the
designation of one author who is responsible for obtaining coauthor
verification.
Authorship should be given generously, but only to those who have
contributed significantly to the research, are prepared to stand behind
their findings, and have reviewed the entire manuscript.
Any faculty member, postdoctoral fellow, or student who submits a
manuscript should ensure that all named authors consent to authorship prior
to submission of the manuscript. Each named author should be given a copy
of the manuscript at the time it is submitted. The lead author is
responsible for obtaining coauthor verification. The lead author should
prepare a copy of the title page of the manuscript, with a statement added
to the effect that everyone listed as an author has contributed to the
paper significantly, has reviewed the manuscript, and stands behind the
parts within his or her own area of expertise. Each listed author should
sign or affirm this statement in writing (e.g., via email). These
statements should be kept in the files of the department or center for the
same period as the original data is retained.
All publications should credit research findings appropriately by citing
relevant observations of others, as well as by recognizing the work and
input of all contributors in their own environments.
In keeping with these guidelines, CLSP will adopt a policy that will govern all papers submitted for review and/or publication. Every paper co-authored by CLSP members will have a lead author and an executive author. If the lead author is from CLSP, he/she is also the executive author; if not, one of the CLSP co-authors serves as the executive author. The lead author is responsible for the entire process of submission for publication, and the executive author is responsible for adherence to these guidelines. The executive author is also responsible for determining which funding agencies should be acknowledged on the paper and for ensuring that proper acknowledgement is included; the Center Administrator can help determine this. An example of proper acknowledgement is "This material is based upon work supported by the National Science Foundation under Grant No. XXXXX". The lead author is chosen by all co-authors. If the lead author is not from CLSP, the executive author is chosen by the CLSP co-authors.
All CLSP co-authors will sign the attached consent form which will be created by the executive author.
Along with the guidelines listed above, any CLSP faculty member, postdoc, or student identified as the executive author of a paper being submitted for review and subsequent publication will take the following steps:
1. Prior to sending out the first version for review, the executive author
will deliver to the CLSP administration a consent signed* by all CLSP
co-authors listed on the paper.
2. When the final version is ready for publication, the executive author
will ensure that one hard copy and one electronic (CD) copy of the paper
are delivered to the administrative files of CLSP. If the set or the
listing of co-authors changes during review, a new signed consent form
will replace the old one.
3. The procedure will be repeated for each separate submission for
publication.
* At the time of submission for review, the signed consent can be an email from the executive author containing the statement, together with emails from the CLSP co-authors affirming it. Before the final version of the paper is submitted, the executive author will generate a written consent form, have it signed by all CLSP co-authors, and deliver it to the CLSP administration.
The signed consent form confirms: (a) The title of the paper. (b) Who the lead and executive authors are. (c) The set of all co-authors. (d) That the co-authors accept responsibility for the content of the paper. (e) The order of co-authors listed under the title of the paper. (f) The name of the publication or conference to which the paper is being submitted.
Format should be as follows:
Title page of the manuscript with the following statement added:
The undersigned confirm that the persons listed below significantly contributed to this paper, have reviewed the manuscript and stand behind the parts within his/her own area of expertise. They further agree with the designation of the lead author, and with the sequence with which the authors are listed on the paper. The CLSP co-authors agree with the selection of the executive author.
Name of Lead author/affiliation
Name of Executive author (if different from Lead author)
Sequence of all co-authors (as listed on submitted paper) and affiliation
Name of co-author/affiliation . . .
Name of co-author/affiliation
Journal or publication to which paper is being submitted
Date of publication (if known):
Keep full documentation of your work
You must be able to explain and replicate your results in case anyone ever challenges them. This is also important in case you or someone else wishes to build on your work later (which is how research progresses).
Thus for each project, keep everything that you would need to quickly reconstruct your published results:
- Keep all programs and scripts you write, with comments explaining what they are doing at each step and why.
- Keep an "experimental logbook" (either a physical notebook or some kind of electronic record) where you record the experiments you run and their results, including dates, the exact commands that you typed, and the versions of programs or corpora that you used.
- Keep all original data except where truly infeasible. If you feel you must delete data, consult with your advisor first. See the section on "Data Retention" in the Whiting School of Engineering's research rules: http://www.wse.jhu.edu/adr/pdf/WSE_Research_Rules.pdf.
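If you keep your logbook electronically, part of it can be automated. The following is a minimal Python sketch, not a CLSP tool: the function names `log_experiment` and `current_git_revision`, the JSON-lines file format, and the assumption that your code lives in a git repository are all hypothetical choices for illustration. It runs a command and appends the date, the exact command line, the code revision, a corpus version you supply, and the program's output to a logbook file.

```python
import datetime
import json
import shlex
import subprocess
from pathlib import Path


def current_git_revision(repo_dir="."):
    """Return the current git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a git repository.
        return None
    return result.stdout.strip()


def log_experiment(logbook, command, notes="", corpus_version=None):
    """Run `command` (a list of arguments) and append a record of it to `logbook`.

    Each record is one JSON object per line, so the logbook stays readable
    by eye and easy to parse later.
    """
    started = datetime.datetime.now().isoformat(timespec="seconds")
    result = subprocess.run(command, capture_output=True, text=True)
    entry = {
        "date": started,
        "command": shlex.join(command),     # exact command line, shell-quoted
        "code_version": current_git_revision(),
        "corpus_version": corpus_version,   # supplied by you; not detected
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "notes": notes,
    }
    with Path(logbook).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

For example, `log_experiment("logbook.jsonl", ["python", "train.py"], notes="baseline", corpus_version="v2")` would run a hypothetical `train.py` and record it. Automatic entries like these complement, rather than replace, written notes on why you ran each experiment.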
Adhere to a high standard of scientific integrity
It goes without saying that your publications, like all of your research, must adhere to the highest standards of scientific integrity.
For example, you must report anything you know of that might cast doubt on your conclusions, and you should go to reasonable lengths to find out whether there is any such doubt. If your results might have limited applicability and this might not be obvious to the reader, you should say so. You should consider and discuss possible alternative explanations of your results, and you should carry out appropriate tests for statistical significance.
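One common way to check whether one system really beats another on the same test set is a paired permutation (approximate randomization) test. The Python sketch below only illustrates that technique; it is not a test CLSP prescribes, and the function name and the assumption that you have one numeric score per test item for each system are hypothetical. Under the null hypothesis that the two systems are interchangeable, swapping their outputs on any item is equally likely, so the test randomly flips the sign of each per-item difference and counts how often the total is at least as extreme as the observed one.

```python
import random


def paired_permutation_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate-randomization test.

    scores_a[i] and scores_b[i] are the scores of systems A and B on test
    item i. Returns an estimated p-value for the null hypothesis that the
    two systems perform equally well.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be aligned item by item")
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    for _ in range(trials):
        # Swapping A's and B's outputs on an item flips the sign of its difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value suggests the difference is unlikely to be due to chance alone. Which test is appropriate depends on your evaluation metric and data; a corpus-level metric, for instance, would need to be recomputed on each permuted sample rather than summed per item.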
Give credit appropriately
Make sure that in the paper, you appropriately acknowledge the support of the grants that have funded the work.
Acknowledgments typically go in a footnote attached to the paper title, or else in a special "Acknowledgments" section at the end. It is common for a funding agency to specify the exact language for you to include here.
Make sure that in the paper, you give appropriate credit to anyone who has contributed to the work. Depending on the size and nature of the contribution, "appropriate credit" might come in the form of co-authorship, acknowledgment in a footnote, or citation of a paper or other resource.
Post the paper on your personal webpage
If you are required to sign a copyright agreement, keep a copy. Try to avoid signing copyright agreements that prevent you from putting your paper online. Many publishers will not object (or even notice) if you edit a copyright agreement slightly before signing it. You can edit it to say that you retain the right to post the paper on your personal webpage and to electronic archives.
When the paper is published, post it on your personal webpage, together with bibliographic information. This ensures that interested parties will be able to discover, read, and cite your work. Include PDF format and perhaps other formats as well.
Most people like to post their papers as soon as the camera-ready copy is submitted. That ensures that you don't forget, and it means that you can point people to the paper, or they can find it themselves. It's pretty cool if you give a talk at a conference, and some people in the audience ask questions that show they have already taken the trouble to read your paper.
Eventually Google and Citeseer will find and index your online paper. But if you like, you can speed this up by pointing them to your webpage: http://www.google.com/addurl.html, http://citeseer.nj.nec.com/cs?adddoc=Yes.
You may also want to archive your paper at a permanent online archive such as http://arxiv.org/help/submit; this ensures that it will stay around forever, plus subscribers to the archive mailing list will be notified of your paper, which is good publicity.
To find someone whose job it is to help you set up your webpage, see Student Home Pages.
What's that form I have to file when I publish a paper?
See Give CLSP a form signed by you and all your co-authors
What are the upcoming conferences and their deadlines?
See also What does CLSP require me to do when I publish a paper?
Do you have a list of papers written by CLSP members?
What reading and discussion groups are active at CLSP?
We have the following two Reading Groups:
What relevant talks are coming up at JHU?
Every Friday at noon, we have the CLSP Student Seminar.
Each department lists its own seminar series on a webpage. Write to the folks listed below if you want to get on the email announcement lists.
- CLSP Seminars (Laura Graham)
- Cognitive Science Colloquia (Barbara Fisher)
- Computer Science Seminars (Jamie Lurz)
- Electrical and Computer Engineering Seminars (Gail O'Connor)
- Mathematical Sciences Seminars (Prof. Jong-Shi Pang)
- Psychology Seminars and Colloquia (Anne Daly)
- Biomedical Engineering Seminars
Is there a list of former students' dissertations as well as soft copies of them?
- All CLSP dissertations can be found in printed form in the CLSP library. It would be nice to have a list, but there is none yet. For soft copies, see the particular authors' web sites.
Is there an official Wiki for the whole NLP field?
- Yes, there is the ACL wiki.
- You can find information on NLP acronyms; blogs; conferences and workshops; competitions and challenges; current events; employment opportunities, postdoctoral positions, and summer jobs; grants, fellowships, and scholarships; journals; newsgroups and mailing lists; organizations, departments, institutions, groups, companies, and associations; other comprehensive sites; people; research; resources; resources by language; special interest groups; state of the art; and teaching.
What are some good textbooks?
Here is an incomplete list:
- Jelinek (1998): Statistical Methods for Speech Recognition.
- Jurafsky & Martin (2000): Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. -- Partial draft available!
- Manning & Schütze (1999): Foundations of Statistical Natural Language Processing. -- This is also online (reachable from JHU campus)!
See also:
- Amazon Tags: Computational Linguistics
- Amazon Tags: Machine Learning
- Amazon Tags: Natural Language Processing
What are some influential papers in our field?
Please add some papers off the top of your head! This list is not to be taken too seriously, just a collection of papers people liked or think to be influential.
1970s
- Dempster et al (1977): Maximum Likelihood from Incomplete Data via the EM Algorithm.
1980s
1990s
- Berger et al (1996): A Maximum Entropy Approach to Natural Language Processing.
- Brill (1995): Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging.
- Brown et al (1990): A Statistical Approach to Machine Translation.
- Burges (1998): A Tutorial on Support Vector Machines for Pattern Recognition.
- Collins (1997): Three Generative, Lexicalised Models for Statistical Parsing.
- Ratnaparkhi (1996): A Maximum Entropy Model for Part-of-Speech Tagging.
2000s
- Chiang (2005): A Hierarchical Phrase-Based Model for Statistical Machine Translation.
- Lafferty et al (2001): Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
Go ahead and add some more papers!
Where can I find papers online?
- Try Citeseer or Google Scholar. The ACL Anthology is also a good source.
Are there Firefox search plugins for Citeseer or the ACL Anthology?
- Yes, here is one for Citeseer, here is one for the ACL Anthology, and here is one for Google Scholar.
How can I organize the papers that I read or want to read?
- CiteULike is a very useful tool to organize and keep track of papers.
