Sharon Gannot (Bar-Ilan University, Israel) “Speech Enhancement Using a Deep Mixture of Experts”

When:
October 20, 2017 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218
USA
Cost:
Free
Contact:
Center for Language and Speech Processing

Abstract

In this study, we present a deep mixture of experts (DMoE) neural-network architecture for single-microphone speech enhancement. In contrast to many speech enhancement algorithms that overlook the spectral variability of the speech signal, our framework comprises a set of deep neural networks (DNNs), each of which is an ‘expert’ in enhancing a different spectral pattern of the speech signal. Under this framework, a gating DNN determines the weights assigned to each expert given a speech segment. A speech presence probability (SPP) is then obtained as a weighted average of the experts’ SPP decisions, with the weights determined by the gating DNN. A soft spectral attenuation, based on the SPP, is then applied to enhance the noisy speech signal.
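The weighted-average SPP computation described above can be sketched in a few lines. This is a toy NumPy illustration, not the authors' implementation: single-layer linear maps stand in for the deep gating and expert networks, and all sizes and names (`gate_w`, `expert_w`, `n_experts`) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_experts, n_freq = 4, 129  # hypothetical sizes

# Single-layer stand-ins for the deep gating network and the expert networks.
gate_w = 0.1 * rng.standard_normal((n_freq, n_experts))
expert_w = 0.1 * rng.standard_normal((n_experts, n_freq, n_freq))

def dmoe_spp(noisy_mag):
    """SPP as a weighted average of the experts' SPP decisions."""
    weights = softmax(noisy_mag @ gate_w)  # gating weights, sum to 1
    # One per-frequency SPP map from each expert.
    expert_spp = sigmoid(np.einsum('f,efk->ek', noisy_mag, expert_w))
    return weights @ expert_spp            # combined SPP, each value in [0, 1]

noisy_mag = np.abs(rng.standard_normal(n_freq))  # toy noisy magnitude spectrum
spp = dmoe_spp(noisy_mag)
enhanced_mag = noisy_mag * spp  # soft spectral attenuation driven by the SPP
```

Because the SPP lies in [0, 1], multiplying the noisy magnitude by it can only attenuate, never amplify, each frequency bin.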

We start our presentation by discussing a supervised scheme, in which the ‘expertise’ of each expert is a specific phoneme. This scheme necessitates a phoneme-labeled database (e.g. TIMIT). In this scheme, the experts are denoted phoneme-specific DNNs (pDNNs) and the gating DNN is denoted the phoneme-classification DNN (cDNN). We propose a compound training procedure, where each pDNN is first pre-trained using the phoneme labeling and the cDNN is trained to classify phonemes. Since these labels are unavailable in the test phase, the entire network is then trained using the noisy utterance, with the cDNN providing phoneme classification.
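The first pre-training stage might look like the following sketch, under stated toy assumptions: a linear-softmax classifier stands in for the deep cDNN, and synthetic labeled frames stand in for a phoneme-labeled corpus such as TIMIT.

```python
import numpy as np

rng = np.random.default_rng(1)
n_phonemes, n_feat = 3, 8  # hypothetical sizes

# Synthetic "phoneme-labeled" frames: a class-dependent mean plus noise.
class_means = 3.0 * rng.standard_normal((n_phonemes, n_feat))
y = rng.integers(0, n_phonemes, 300)
X = class_means[y] + rng.standard_normal((300, n_feat))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stage 1: pre-train the cDNN (here a toy linear-softmax classifier) on the
# phoneme labels; each pDNN would likewise be pre-trained on the frames of
# its own phoneme.
gate_w = np.zeros((n_feat, n_phonemes))
for _ in range(200):
    p = softmax(X @ gate_w)
    p[np.arange(len(y)), y] -= 1.0           # gradient of the cross-entropy loss
    gate_w -= 0.01 * (X.T @ p) / len(y)

acc = np.mean(softmax(X @ gate_w).argmax(axis=1) == y)

# Stage 2 (not shown): since labels are unavailable at test time, the whole
# DMoE is then trained end-to-end on noisy utterances, with the cDNN's soft
# outputs serving as the gating weights.
```

On this separable toy data the classifier reaches high training accuracy, which is the point of the pre-training stage: the gate starts joint training already able to route frames by phoneme.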

In the more general, unsupervised scheme, the experts and the gating components of the DMoE network are jointly trained. As part of the training, speech clustering to different subsets (spectral patterns) is carried out in an unsupervised manner. Therefore, unlike the supervised scheme, a phoneme-labeled dataset is not required for the training procedure.
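As a loose analogy for this unsupervised split into spectral subsets, a minimal k-means clustering of toy spectral frames can be sketched; the hard assignments here stand in for the soft gating weights that the DMoE learns jointly, and the two synthetic patterns are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "spectral frames" drawn from two synthetic spectral patterns; these
# stand in for the subsets the DMoE discovers during joint training.
pattern_a = np.array([1.0, 0.0, 1.0, 0.0])
pattern_b = np.array([0.0, 1.0, 0.0, 1.0])
frames = np.vstack([
    pattern_a + 0.1 * rng.standard_normal((50, 4)),
    pattern_b + 0.1 * rng.standard_normal((50, 4)),
])

# Minimal k-means: hard clustering as a stand-in for the soft split the
# gating network learns without any phoneme labels.
centers = np.vstack([frames[0], frames[-1]])  # deterministic init, one per group
for _ in range(10):
    dist = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    assign = dist.argmin(axis=1)
    centers = np.array([frames[assign == k].mean(axis=0) for k in range(2)])
```

With well-separated patterns, the clustering recovers the two spectral subsets without ever seeing a label, mirroring the claim that no phoneme-labeled dataset is needed.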

A series of experiments with various noise types and speech databases verifies the applicability of the new algorithm to the task of speech enhancement. We have found that the experts’ specialization allows better robustness to unfamiliar noise types. The proposed schemes outperform other schemes that either do not consider speech spectral patterns or use a simpler training methodology. They also significantly outperform classical model-based methods in both speech quality and speech intelligibility measures.

Biography

Sharon Gannot received his B.Sc. degree (summa cum laude) from the Technion-Israel Institute of Technology, Haifa, Israel, in 1986, and the M.Sc. (cum laude) and Ph.D. degrees from Tel-Aviv University, Israel, in 1995 and 2000, respectively, all in Electrical Engineering. In 2001, he held a post-doctoral position at the Department of Electrical Engineering (ESAT-SISTA) at K.U.Leuven, Belgium. From 2002 to 2003, he held a research and teaching position at the Faculty of Electrical Engineering, Technion-Israel Institute of Technology. He is currently a Full Professor at the Faculty of Engineering, Bar-Ilan University, Israel, where he heads the Speech and Signal Processing Laboratory and the Signal Processing Track.

Prof. Gannot is the recipient of the Bar-Ilan University Outstanding Lecturer Award for 2010 and 2014, and a co-recipient of seven best paper awards. He served as an Associate Editor of the EURASIP Journal on Advances in Signal Processing from 2003 to 2012, and as an Editor of several special issues on multi-microphone speech processing in the same journal. He has also served as a guest editor of Elsevier's Speech Communication and Signal Processing journals. Prof. Gannot served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing from 2009 to 2013, and as a Senior Area Chair of the same journal from 2013 to 2017. He also serves as a reviewer for many IEEE journals and conferences. He has been a member of the IEEE Audio and Acoustic Signal Processing (AASP) Technical Committee since January 2010, and has chaired the committee since January 2017. He has also been a member of the Technical and Steering Committee of the International Workshop on Acoustic Signal Enhancement (IWAENC) since 2005, was the general co-chair of IWAENC held in Tel Aviv, Israel, in August 2010, and served as the general co-chair of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) in October 2013. Prof. Gannot was selected (with colleagues) to present tutorial sessions at ICASSP 2012, EUSIPCO 2012, ICASSP 2013, and EUSIPCO 2013.

Prof. Gannot's research interests include multi-microphone speech processing, specifically distributed algorithms for ad hoc microphone arrays for noise reduction and speaker separation; dereverberation; single-microphone speech enhancement using learning methods; and speaker localization and tracking using manifold learning.
