Date of Award
2008
Degree Type
Thesis
Degree Name
Master of Engineering Science
Program
Electrical and Computer Engineering
Supervisor
Dr. Vijay Parsa
Abstract
Accurate and reliable measures of voice quality remain important in speech communication systems and in the assessment and rehabilitation of abnormal voices. It is also beneficial to have objective rather than subjective measures of speech quality, in order to save time, money and other resources. Objective measures of speech quality are typically divided into two groups: “intrusive” and “non-intrusive” measures. Intrusive measures require both the input speech and the output of the system under test. Non-intrusive measures, on the other hand, require access only to the output speech signal of the system. This thesis examines objective speech quality prediction for tracheoesophageal speech using both intrusive and non-intrusive techniques. For the non-intrusive measurement, we first investigated traditional acoustic measures. These included local and global assessments of voice perturbations, glottal noise measures, and features derived from linear predictive coding. In addition, we applied time-frequency decomposition techniques (discrete wavelet, wavelet packet and matching pursuit analysis) and extracted a number of features for quantifying speech quality. Results from two experimental tracheoesophageal speech databases revealed a modest correlation of 0.69 between the parameters extracted from the time-frequency analysis and the subjective ratings. Although modest, these results are far superior to those achieved using typical acoustic measures. The intrusive measure was computed using the Moore-Glasberg auditory model, from which objective measures were extracted based on loudness pattern distortions. Several distance metrics were calculated in the perceptual space from the differences between a high-quality tracheoesophageal speaker and a corresponding test signal.
A statistical combination of metrics based on the loudness pattern distortions provided a correlation of 0.79 with the subjective results. This compared favorably with the state-of-the-art intrusive ITU-T P.862.1 objective standard, which had a correlation of 0.56 on our database of tracheoesophageal speakers.
Recommended Citation
McDonald, Rob, "Objective Evaluation of Tracheoesophageal Speech Quality" (2008). Digitized Theses. 4816.
https://ir.lib.uwo.ca/digitizedtheses/4816