Communication Sciences and Disorders Publications


The accuracy of envelope following responses in predicting speech audibility

Document Type


Publication Date



Ear and Hearing

First Page


Last Page


URL with Digital Object Identifier



Objectives: The present study aimed to (1) evaluate the accuracy of envelope following responses (EFRs) in predicting speech audibility as a function of the statistical indicator used for objective response detection, stimulus phoneme, frequency, and level, and (2) quantify the minimum sensation level (SL; stimulus level above behavioral threshold) needed for detecting EFRs. Design: In 21 participants with normal hearing, EFRs were elicited by 8 band-limited phonemes in the male-spoken token /susa∫i/ (2.05 sec) presented between 20 and 65 dB SPL in 15 dB increments. Vowels in /susa∫i/ were modified to elicit two EFRs simultaneously by selectively lowering the fundamental frequency (f0) in the first formant (F1) region. The modified vowels elicited one EFR from the low-frequency F1 and another from the mid-frequency second and higher formants (F2+). Fricatives were amplitude-modulated at the average f0. EFRs were extracted from single-channel EEG recorded between the vertex (Cz) and the nape of the neck when /susa∫i/ was presented monaurally for 450 sweeps. The performance of the three statistical indicators, F-test, Hotelling's T2, and phase coherence, was compared against behaviorally determined audibility (estimated SL, SL ≥0 dB = audible) using area under the receiver operating characteristics (AUROC) curve, sensitivity (the proportion of audible speech with a detectable EFR [true positive rate]), and specificity (the proportion of inaudible speech with an undetectable EFR [true negative rate]). The influence of stimulus phoneme, frequency, and level on the accuracy of EFRs in predicting speech audibility was assessed by comparing sensitivity, specificity, positive predictive value (PPV; the proportion of detected EFRs elicited by audible stimuli) and negative predictive value (NPV; the proportion of undetected EFRs elicited by inaudible stimuli). The minimum SL needed for detection was evaluated using a linear mixed-effects model with the predictor variables stimulus and EFR detection p value. Results: of the 3 statistical indicators were similar; however, at the type I error rate of 5%, the sensitivities of Hotelling's T2(68.4%) and phase coherence (68.8%) were significantly higher than the F-test (59.5%). In contrast, the specificity of the F-test (97.3%) was significantly higher than the Hotelling's T2(88.4%). When analyzed using Hotelling's T2as a function of stimulus, fricatives offered higher sensitivity (88.6 to 90.6%) and NPV (57.9 to 76.0%) compared with most vowel stimuli (51.9 to 71.4% and 11.6 to 51.3%, respectively). When analyzed as a function of frequency band (F1, F2+, and fricatives aggregated as low-, mid- and high-frequencies, respectively), high-frequency stimuli offered the highest sensitivity (96.9%) and NPV (88.9%). When analyzed as a function of test level, sensitivity improved with increases in stimulus level (99.4% at 65 dB SPL). The minimum SL for EFR detection ranged between 13.4 and 21.7 dB for F1 stimuli, 7.8 to 12.2 dB for F2+ stimuli, and 2.3 to 3.9 dB for fricative stimuli. Conclusions: EFR-based inference of speech audibility requires consideration of the statistical indicator used, phoneme, stimulus frequency, and stimulus level.