Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article

Degree

Doctor of Philosophy

Program

Neuroscience

Supervisor

Ingrid Johnsrude

Abstract

This dissertation explores mechanisms for understanding and improving speech processing. First, I used EEG to investigate the acoustic and semantic processing of continuous naturalistic speech masked by multi-talker babble. I found that different features of the same speech signal are reflected in different aspects of the neural tracking response, which are themselves differentially affected by noise. These findings point to a complex relationship between speech intelligibility and neural speech encoding.

Next, I systematically reviewed current advances in speech enhancement technologies. I found that speech enhancement algorithms are limited in their generalizability to speech noise (i.e., babble). I demonstrated that, in the few studies that do consider babble in their algorithm training, the babble sounds employed reflect extremely high talker densities, with little variability between babble exemplars. These babble samples do not fully capture the complexity of natural acoustic environments, hindering the generalizability of speech enhancement algorithms to real-world settings.

Finally, I explored novel approaches to speech denoising, including one that incorporates the principles of redundancy reduction. Additionally, I leveraged complex and varying noise types (i.e., babble with variable spectral density) to address the complexity of auditory processing as well as the limited generalizability of other speech enhancement algorithms. A comparison between the redundancy reduction model and a standard denoising model, trained with the same data, demonstrated that the redundancy reduction model performed worse. Nevertheless, both models were able to significantly enhance speech masked by high-variability, low-density babble, an important noise type that had not been explored previously. To my knowledge, this study presents the first application that considers low-talker-density babble (i.e., babble with a small number of talkers) for speech masking, which is a complex task for speech enhancement algorithms.

This dissertation advanced our understanding of acoustic and semantic speech processing, specifically their complex relationships to masker level. It also identified critical gaps in existing speech denoising methodologies that limit the generalizability of speech denoising algorithms. Finally, this dissertation introduced novel approaches to speech enhancement that are capable of attenuating challenging and realistic noise types that have previously been neglected in speech enhancement work.

Summary for Lay Audience

This dissertation seeks to improve our understanding of speech processing in noisy backgrounds, as well as to develop technologies to denoise speech signals. I examined brain responses to continuous speech that was masked by babble noise and found that different features of the same speech signal are affected differently by noise. Namely, tracking of semantic features (word meanings) is more resilient to increases in noise level than tracking of acoustic features (e.g., pitch).

Next, I reviewed advances in speech enhancement (denoising) technologies. I found that these speech enhancement algorithms often fail to use babble in their algorithm training. When they do, the babble sounds employed are made up of large numbers of talkers and are very similar to one another. These babble samples do not reflect real-world background noise, hindering the generalizability of speech enhancement algorithms to real-world settings.

Finally, I explored new methods for speech denoising, including one that attempts to minimize redundancies in the input data. Additionally, I trained these speech enhancement models with varying noise types (i.e., babble with a variable number of talkers) to improve the generalizability of these algorithms to real-world environments. I found that the specialized model performed worse than a standard denoising model trained with the same data. However, both model types were able to improve the quality of speech signals that were masked with realistic babble noise, a noise type that had not been explored previously.

This dissertation advanced our understanding of acoustic and semantic speech processing, and their complex relationships to noise level. It also identified the limited generalizability of existing speech denoising methods. Finally, this dissertation introduced new approaches to speech enhancement that are capable of denoising speech masked with challenging and realistic noise types that have previously been neglected in speech enhancement work.
