Electronic Thesis and Dissertation Repository

Towards Understanding and Improving Speech Processing

Sonia Yasmin

Abstract

This dissertation explores mechanisms for understanding and improving speech processing. First, I used EEG to investigate the acoustic and semantic processing of continuous naturalistic speech masked by multi-talker babble. I found that different features of the same speech signal are reflected in different aspects of the neural tracking response, which are themselves differentially affected by noise. These findings point to a complex relationship between speech intelligibility and neural speech encoding.

Next, I systematically reviewed current advancements in speech enhancement technologies. I found that speech enhancement algorithms are limited in their generalizability to speech-noise (i.e., babble). I demonstrated that, in the few studies that do consider babble in their algorithm training, the babble sounds employed reflect extremely high talker densities, with little variability between babble exemplars. These babble samples do not capture the full complexity of natural acoustic environments, hindering the generalizability of speech enhancement algorithms to real-world settings.

Finally, I explored novel approaches to speech denoising, including one that incorporates the principles of redundancy reduction. Additionally, I leveraged complex and varying noise types (i.e., babble with variable spectral density) to address the complexity of auditory processing as well as the limited generalizability of other speech enhancement algorithms. A comparison between the redundancy reduction model and a standard denoising model, trained with the same data, demonstrated that the redundancy reduction model performed worse. Nevertheless, both models were able to significantly enhance speech masked by high-variability, low-density babble, an important noise type that had not been explored previously. To my knowledge, this study presents the first application that considers low-talker-density babble (i.e., babble with a small number of talkers) as a speech masker, which poses a difficult task for speech enhancement algorithms.

This dissertation advanced our understanding of acoustic and semantic speech processing, specifically their complex relationships to masker level. It also identified critical gaps in existing speech denoising methodologies that limit the generalizability of speech denoising algorithms. Finally, this dissertation introduced novel approaches to speech enhancement that are capable of attenuating challenging and realistic noise types that have previously been neglected in speech enhancement work.