Speech signal processing in multi-speaker environments: problems, modelings, and assessment

Abstract

Deep learning for speech enhancement has dramatically accelerated progress on the cocktail party problem: tracking, enhancing, and recognizing each individual speaker when multiple speakers talk simultaneously in a noisy and reverberant environment. The problem remains a major unsolved challenge. In this presentation, I will start with solutions to the keyword spotting problem in a multi-speaker environment. I will then introduce a hybrid of full-band and narrow-band modeling that addresses speech enhancement problems, including acoustic echo cancellation, noise suppression, dereverberation, and automatic gain control, in both single-channel and multi-channel setups. Finally, I will present MetricNet, a non-intrusive speech quality assessment model developed for evaluating speech enhancement in real scenarios.


About the Speaker

Meng Yu received his B.S. in computational mathematics from Peking University in 2007 and his Ph.D. in applied mathematics from the University of California, Irvine, in 2012. He is currently a principal research scientist at Tencent AI Lab, working on far-field front-end speech processing, deep-learning-based speech enhancement and separation, and their joint optimization with keyword spotting, speaker verification, and acoustic modeling for speech recognition.