Author

Rob McDonald

Date of Award

2008

Degree Type

Thesis

Degree Name

Master of Engineering Science

Program

Electrical and Computer Engineering

Supervisor

Dr. Vijay Parsa

Abstract

It remains important to have accurate and reliable ways of measuring voice quality in speech communication systems and in abnormal voice assessment and rehabilitation. Objective measures of speech quality are also preferable to subjective ones because they save time, money, and other resources. Objective measures of speech quality are typically divided into two groups: “intrusive” and “non-intrusive” measures. Intrusive measures require both the input speech and the output of the system under test. Non-intrusive measures, on the other hand, require access only to the output speech signal of the system. This thesis examines methods of objective speech quality prediction using both intrusive and non-intrusive techniques for the analysis of tracheoesophageal speech. For the non-intrusive measurement, we first investigated the traditional acoustical measures for the analysis of tracheoesophageal speech. These included local and global assessments of voice perturbations, along with glottal noise measures and features derived from linear predictive coding. In addition, we applied time-frequency decomposition techniques and extracted a number of features for quantifying speech quality. Discrete wavelet, wavelet packet, and matching pursuit analyses were performed. Results from two experimental tracheoesophageal speech databases revealed a modest correlation of 0.69 between the parameters extracted from the time-frequency analysis and the subjective ratings. Although modest, these results are far superior to those achieved using typical acoustic measures. The intrusive measure was computed using the Moore-Glasberg auditory model, from which the objective measures were extracted based on the loudness pattern distortions. Several distance metrics were calculated in the perceptual space from the differences between a high-quality tracheoesophageal speaker and a corresponding test signal. Statistical combination of metrics based on the loudness pattern distortions provided a correlation of 0.79 with the subjective results. This compared favorably to the state-of-the-art intrusive ITU-T P.862.1 objective standard, which had a correlation of 0.56 on our database of tracheoesophageal speakers.
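The evaluation step described in the abstract, combining several distance metrics and correlating the result with subjective ratings, can be illustrated with a minimal sketch. The abstract does not say which statistical combination was used, so ordinary least-squares linear regression is an assumption here, as are the function names `pearson` and `combine_metrics`. The data are synthetic placeholders generated only to make the example run; they are not taken from the thesis.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))


def combine_metrics(metrics, ratings):
    """Fit ratings ~ intercept + metrics @ w by ordinary least squares.

    ASSUMPTION: linear regression stands in for the unspecified
    "statistical combination" mentioned in the abstract.
    Returns the coefficients (intercept first) and the fitted predictions.
    """
    metrics = np.asarray(metrics, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    design = np.column_stack([np.ones(len(ratings)), metrics])
    coeffs, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    return coeffs, design @ coeffs


# Synthetic placeholder data (NOT from the thesis): 30 hypothetical
# speakers, 3 hypothetical distance metrics, and noisy ratings.
rng = np.random.default_rng(0)
n_speakers = 30
metrics = rng.normal(size=(n_speakers, 3))
ratings = (3.0 - 0.8 * metrics[:, 0] + 0.4 * metrics[:, 1]
           + rng.normal(scale=0.5, size=n_speakers))

coeffs, predicted = combine_metrics(metrics, ratings)
r_combined = pearson(predicted, ratings)
r_single = [abs(pearson(metrics[:, j], ratings)) for j in range(metrics.shape[1])]

print("combined |r|:", round(r_combined, 3))
print("single-metric |r|:", [round(r, 3) for r in r_single])
```

A correlation computed on the same data used to fit the regression is optimistic, since the in-sample combined correlation can never fall below that of any single metric. An honest estimate would need held-out speakers or cross-validation.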
