This book presents and develops several important concepts of speech enhancement in a simple but rigorous way. Many of the ideas are new; not only do they shed light on this old problem but they also offer valuable tips on how to improve on some well-known conventional approaches. The book unifies all aspects of speech enhancement, from single channel, multichannel, beamforming, time domain, frequency domain and time-frequency domain, to binaural in a clear and flexible framework. It starts with an exhaustive discussion on the fundamental best (linear and nonlinear) estimators, showing how they are connected to various important measures such as the coefficient of determination, the correlation coefficient, the conditional correlation coefficient, and the signal-to-noise ratio (SNR). It then goes on to show how to exploit these measures in order to derive all kinds of noise reduction algorithms that can offer an accurate and versatile compromise between noise reduction and speech distortion.


This work addresses this problem in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different frames and frequency bands are assumed to be independent. In this case, the noise reduction filter in each frequency band is basically a real gain. Since a gain does not improve the signal-to-noise ratio (SNR) for any given subband and frame, the noise reduction is basically achieved by liftering the subbands and frames that are less noisy while weighing down on those that are more noisy. The second category also concerns the single-channel problem. The difference is that now the interframe correlation is taken into account and a filter is applied in each subband instead of just a gain.
The advantage of using the interframe correlation is that we can improve not only the long-time fullband SNR, but the frame-wise subband SNR as well. The third and fourth classes discuss the problem of multichannel noise reduction in the STFT domain with and without interframe correlation, respectively. In the last category, we consider the interband correlation in the design of the noise reduction filters. We illustrate the basic principle for the single-channel case as an example, while this concept can be generalized to other scenarios. In all categories, we propose different optimization cost functions from which we derive the optimal filters and we also define the performance measures that help analyzing them.

This book focuses on the application of canonical correlation analysis (CCA) to speech enhancement using the filtering approach. The authors explain how to derive different classes of time-domain and time-frequency-domain noise reduction filters, which are optimal from the CCA perspective for both single-channel and multichannel speech enhancement. Enhancement of noisy speech has been a challenging problem for many researchers over the past few decades and remains an active research area. Typically, speech enhancement algorithms operate in the short-time Fourier transform (STFT) domain, where the clean speech spectral coefficients are estimated using a multiplicative gain function. A filtering approach, which can be performed in the time domain or in the subband domain, obtains an estimate of the clean speech sample at every time instant or time-frequency bin by applying a filtering vector to the noisy speech vector.

Compared to the multiplicative gain approach, the filtering approach more naturally takes into account the correlation of the speech signal in adjacent time frames. In this study, the authors pursue the filtering approach and show how to apply CCA to the speech enhancement problem. They also address the problem of adaptive beamforming from the CCA perspective, and show that the well-known Wiener and minimum variance distortionless response (MVDR) beamformers are particular cases of a general class of CCA-based adaptive beamformers.


This book provides a systematic study of the fundamental theory and methods of beamforming with differential microphone arrays (DMAs), or differential beamforming in short. It begins with a brief overview of differential beamforming and some popularly used DMA beampatterns such as the dipole, cardioid, hypercardioid, and supercardioid, before providing essential background knowledge on orthogonal functions and orthogonal polynomials, which form the basis of differential beamforming.

From a physical perspective, a DMA of a given order is defined as an array that measures the differential acoustic pressure field of that order; such an array has a beampattern in the form of a polynomial whose degree is equal to the DMA order. Therefore, the fundamental and core problem of differential beamforming boils down to the design of beampatterns with orthogonal polynomials. But certain constraints also have to be considered so that the resulting beamformer does not seriously amplify the sensors' self noise and the mismatches among sensors.

Accordingly, the book subsequently revisits several performance criteria, which can be used to evaluate the performance of the derived differential beamformers. Next, differential beamforming is placed in a framework of optimization and linear system solving, and it is shown how different beampatterns can be designed with the help of this optimization framework. The book then presents several approaches to the design of differential beamformers with the maximum DMA order, with the control of the white noise gain, and with the control of both the frequency invariance of the beampattern and the white noise gain. Lastly, it elucidates a joint optimization method that can be used to derive differential beamformers that not only deliver nearly frequency-invariant beampatterns, but are also robust to sensors' self noise.


Though noise reduction and speech enhancement problems have been studied for at least five decades, advances in our understanding and the development of reliable algorithms are more important than ever, as they support the design of tailored solutions for clearly defined applications. In this work, the authors propose a conceptual framework that can be applied to the many different aspects of noise reduction, offering a uniform approach
to monaural and binaural noise reduction problems, in the time domain and in the frequency domain, and involving a single or multiple microphones. Moreover, the derivation of optimal filters is simplified, as are the performance measures used for their evaluation.