Scott E. Novotney
PhD Candidate, Dept. of Computer Science


  About Me:
My work on transcribing audio tweets from the Egyptian revolution made it onto Language Log.

I am interested in low-resource automatic speech recognition. Current state-of-the-art LVCSR systems require hundreds of hours of in-domain audio transcription for acceptable system performance. I want to lower this barrier to entry so that speech recognition systems can be deployed more cheaply and quickly.

I like to think I'm a good empiricist, designing the right question whose answer will uncover the underlying cause.

I'm originally from Tacoma, Washington and hope to move back one day, but Baltimore has been my home since 2003.

  • PhD Computer Science, Johns Hopkins University (2009 - Now)
  • MSE Computer Science, Johns Hopkins University (2006 - 2008)
  • BA Mathematics, Johns Hopkins University (2003 - 2006)
  Research Interests:
Speech Recognition, Semi-Supervised Machine Learning, Extracting useful information from errorful output

Semi-supervised methods like self-training require robust models. If we are to bootstrap speech systems from small amounts of training data, both the acoustic and language models need to reliably improve from errorful automatically labeled data. While acoustic models are incredibly robust and work well with self-training, current LVCSR language models are the exact opposite.

I am interested in creating robust language models which can improve with large amounts of errorful automatically labeled words. This will require rethinking the simple n-gram approach and treating words not as truth, but as evidence. In-domain speech transcriptions are most definitely not 'free' while huge quantities of audio are.

In addition to my main thesis topic, I am also curious about working with orders of magnitude more speech data, named entity extraction, using Mechanical Turk to cheaply create training data, and information extraction from errorful data. In a previous life I studied pure math, but gave up after being scooped by Euler over 300 years ago.

  Publications and Presentations:


  • Semi-Supervised Methods for Improving Keyword Search of Unseen Terms
    Scott Novotney, Ivan Bulyko, Rich Schwartz, Sanjeev Khudanpur, Owen Kimball (2012)
    Interspeech 2012, Portland, Oregon
    [ pdf ]
  • Unsupervised Arabic Dialect Adaptation with Self-Training
    Scott Novotney, Rich Schwartz, Sanjeev Khudanpur (2011)
    Interspeech 2011, Florence, Italy
    [ pdf ]
  • Crowdsourced Accessibility: Elicitation of Wikipedia Articles
    Scott Novotney, Chris Callison-Burch (2010)
    Mechanical Turk Workshop, Los Angeles, USA
    [ pdf | slides ]
  • Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription
    Scott Novotney, Chris Callison-Burch (2010)
    Proceedings of NAACL, Los Angeles, USA
    [ pdf | poster ]
  • Analysis of Low-Resource Acoustic Model Self-Training
    Scott Novotney, Richard Schwartz (2009)
    In Proceedings of Interspeech, Brighton, England
    [ pdf | poster ]
  • Unsupervised acoustic and language model training with small amounts of labelled data
    Scott Novotney, Richard Schwartz, Jeff Ma (2009)
    In Proceedings of ICASSP, Taipei, Taiwan
    [ pdf | poster ]

  Invited Talks:

  • Factors Affecting ASR Model Self-Training, University of Cambridge, MIL Speech Seminar, Sept. 1st, 2009. [ slides ]
  • What to do with 10,000 hours of speech?, PIRE Meeting, Uppsala, Sweden, July 15, 2010 [ slides ]
  Work Experience: 
  • Staff Scientist, BBN Technologies (2008 - 2009) I worked at the JHU Human Language Technology Center of Excellence on low-resource speech recognition.
  • Summer Intern, BBN Technologies (Summer 2007) I spent the summer in Cambridge, MA working on named entity extraction from conversational speech.
  • Summer Intern, BBN Technologies (Summer 2006) This was my first experience with NLP, creating named entity training data, designing annotation guidelines and running experiments.

  Other Interests:

I've stood in North Korea, sidestepped a British cow next to Hadrian's Wall and hidden from the sun under a Roman aqueduct in Segovia, Spain. In other words, I really like travelling. If I ever meet you at a conference, I'm always up for an excursion.

I'm also a big fan of board games, particularly Diplomacy, the only game to ruin friendships. I'm an Eagle Scout and have a WYPR coffee mug.

The Center for Language and Speech Processing
The Johns Hopkins University
CSEB 225
3400 North Charles Street
Baltimore, MD 21218