Carolina Parada
PhD Candidate, Dept. of ECE
carolinap at jhu edu 

  About Me:

I am a 5th-year PhD candidate at the Center for Language and Speech Processing at The Johns Hopkins University. I am also affiliated with the Human Language Technology Center of Excellence (HLTCOE). My advisor is Professor Frederick Jelinek.

My research interests include machine learning, speech recognition, natural language processing, and machine translation.

I received my B.S. and M.S. in Electrical Engineering from Washington State University in 2004 and 2006, respectively. I also had the opportunity to do summer internships with the research groups at Nuance Communications (Summer 2007), the Google Speech Group in NY (Summer 2008), and the IBM Speech Group in NY (Summer 2009). My Curriculum Vitae is HERE.


  • I successfully defended my thesis and accepted a Research Scientist position at Google!!! My dissertation on Open Vocabulary Speech Recognition is here.

  • I've been awarded the Google Fellowship in Speech!!!

  • Very sad news: my dear advisor and mentor Fred Jelinek passed away on September 14, 2010. I was incredibly fortunate to spend a little over four years as Fred's student. He was a brilliant researcher who pioneered many of the methods used today in Speech Recognition. Here is an article on his life [pdf] and work [pdf] in his own words. He was also an amazing advisor and human being. I am currently working with Prof. Hynek Hermansky and Prof. Mark Dredze.

  Research Interests:

Large Vocabulary Continuous Speech Recognition (LVCSR, ASR), Natural Language Processing (NLP), Machine Learning, Out-of-Vocabulary detection, Spoken Term Detection

My thesis work focused on improving the quality of ASR transcriptions in the presence of out-of-vocabulary (OOV) terms. OOVs are an important source of error in current Large Vocabulary Speech Recognition systems. These words cause recognition failures, which propagate through pipeline systems, degrading performance significantly in downstream applications like spoken term detection, translation, document retrieval, etc. In my research, we aim to enhance robustness to out-of-vocabulary words by advancing the state-of-the-art in:

  • Detection of OOV regions in the output of the LVCSR system
  • Spoken term detection for OOVs
  • Novel sub-word representations for open-vocabulary speech recognition


  • Carolina Parada, Mark Dredze, and Fred Jelinek, "OOV Sensitive Named Entity Recognition in Speech," [to appear] Interspeech, 2011. [pdf]

  • Carolina Parada, Mark Dredze, Abhinav Sethy, and Ariya Rastrow, "Learning Sub-Word Units for Open Vocabulary Speech Recognition," ACL, 2011. [pdf]

  • Ciprian Chelba, Johan Schalkwyk, Thorsten Brants, Vida Ha, Boulos Harb, Will Neveitt, Carolina Parada, and Peng Xu, "Query Language Modeling for Voice Search" in Proc. IEEE-SLT, 2010. [pdf]

  • Carolina Parada, Abhinav Sethy, Mark Dredze, and Frederick Jelinek, "A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web" in Proc. Interspeech, 2010. [pdf]

  • Carolina Parada, Mark Dredze, Denis Filimonov, and Frederick Jelinek, "Contextual Information Improves OOV detection in Speech," in Proc. NAACL, 2010. [pdf]

  • Carolina Parada, Abhinav Sethy, and Bhuvana Ramabhadran, "Balancing False Alarms and Hits in Spoken Term Detection," in Proc. ICASSP, 2010. [pdf]

  • Carolina Parada, Abhinav Sethy, and Bhuvana Ramabhadran, "Query-by-example spoken term detection for OOV terms," in Proc. ASRU, 2009. [pdf]

  • Carolina Parada, "A collection of observations on data-rate-limited control," Thesis (M.S.) Washington State University, 2006.
  Patents Pending:

  • Compounded Text Segmentation. GP-1910-00-US, 16113-1503001.
  Work Experience:  (Curriculum Vitae)

During my summers as a graduate student I've had the opportunity to intern at:

  • IBM Speech Research Group, NY (Summer 2009).
    Worked with Abhinav Sethy and Bhuvana Ramabhadran.
    Proposed a novel approach for Query-by-Example Spoken Term Detection and Spoken Term Detection of textual OOV queries. Implemented an FST-based Indexing System.
  • Google Speech Research Group, NY (Summer 2008)
    Worked with Boulos Harb and Johan Schalkwyk.
    Developed a technique for automatically segmenting compounded text using large statistical language models. Investigated a framework for building hierarchical language models for ASR.
  • Nuance Communications Research Group (Summer 2007).
    Worked with Puming Zhan.
    Worked on supporting Viterbi forced realignment in the speaker adaptation modules (C/C++/Perl/Python).


I have had the privilege to collaborate with amazing researchers; here are some of them:

Hub4 Named Entity Recognition (Email for data): A collection of 40 hours of Hub4 data; the reference was manually labeled by two people using MTurk. The audio has restricted access through LDC.

Learning sub-word units [coming soon]: Source code for the unsupervised segmentation approach described in:
Carolina Parada, Mark Dredze, Abhinav Sethy, and Ariya Rastrow, "Learning Sub-Word Units for Open Vocabulary Speech Recognition," ACL, 2011. [pdf]

  Other Interests:

I love traveling, hiking, and dancing. I had the opportunity to spend three months in Prague, Czech Republic, in Fall 2008, working on my research at the Institute of Formal and Applied Linguistics (UFAL) at Charles University.

Contact me at:
The Center for Language and Speech Processing
The Johns Hopkins University
CSEB 321
3400 North Charles Street
Baltimore, MD 21218
* Telephone: (410) 516-7231 * Fax: (410) 516-5050 * E-mail: carolinap at jhu dot edu