Deep, Long and Wide Artificial Neural Networks in Processing of Speech – Hynek Hermansky (Johns Hopkins University)

When:

July 29, 2014 @ 2:00 pm – 3:00 pm

Where:

Czech Republic

Abstract
Until recently, automatic speech recognition (ASR) proceeded in a single stream: from the speech signal, through a feature extraction module and a pattern classifier, into a search for the best word sequence. Features were mostly hand-crafted and represented relatively short (10-20 ms) instantaneous snapshots of the speech signal. The introduction of artificial neural nets (ANNs) into speech processing allowed for much more ambitious and more effective schemes. Today’s speech features for ASR are derived from large amounts of speech data, often using complex deep neural net architectures. The talk argues for ANNs that are not only deep but also wide (i.e., processing information in multiple parallel streams) and long (i.e., extracting information from speech segments much longer than 10-20 ms). Support comes from the psychophysics and physiology of speech perception, as well as from the speech data itself. The talk reviews the history of the gradual shift towards nonlinear multi-stream extraction of information from the spectral dynamics of speech, and shows some advantages of this approach in ASR.

All Participant Lectures will be held in Room S1, 4th Floor.

Center for Language and Speech Processing