Western Papers in Linguistics / Cahiers linguistiques de Western

Article Title

The Language Identification Problem: Formant Analysis and Cross-Linguistic Uniqueness


In computational linguistics, spoken language recognition through the use of wordlists and morphological markers is a resource-intensive process: the speech signal must be parsed, words must be hypothesized, and the wordlists of every likely language must then be searched. Note that spoken language recognition does not mean identifying the meaning of the input; rather, it means identifying the language the speaker is using, which does not necessarily require parsing the input. My research examines whether a language can be positively and uniquely identified through small nuances in the individual formants of its vowels.

Language samples from the Heritage Language Variation and Change (HLVC) corpus (courtesy of Dr. N. Nagy, University of Toronto) were analyzed to examine how formant frequencies are distributed across languages. The first three formant frequencies were tabulated, and the resulting formant distribution histograms show that all of the languages in question (Italian, Korean, and Ukrainian) exhibit enough variation to be positively identified.
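
As a rough illustration of the histogram comparison described above, the following Python sketch bins formant values and measures how much two distributions overlap. It is not the procedure used in the study: the function names, the 50 Hz bin width, the overlap measure, and the sample values are all assumptions made for illustration, and none of the numbers come from the HLVC corpus.

```python
from collections import Counter


def formant_histogram(values_hz, bin_width_hz=50):
    """Return a normalized histogram {bin_start_hz: proportion of tokens}.

    bin_width_hz is an assumed value, not one taken from the study.
    """
    if not values_hz:
        raise ValueError("no formant measurements supplied")
    # Map each measurement to the lower edge of its bin.
    counts = Counter(int(v // bin_width_hz) * bin_width_hz for v in values_hz)
    total = len(values_hz)
    return {b: c / total for b, c in sorted(counts.items())}


def histogram_overlap(hist_a, hist_b):
    """Sum of bin-wise minima: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(p, hist_b.get(b, 0.0)) for b, p in hist_a.items())


# Placeholder F1 values in Hz, invented for illustration only (not HLVC data).
lang_a_f1 = [310, 320, 480, 510, 690, 720]
lang_b_f1 = [330, 470, 490, 530, 540, 700]

hist_a = formant_histogram(lang_a_f1)
hist_b = formant_histogram(lang_b_f1)
overlap = histogram_overlap(hist_a, hist_b)
print(f"F1 histogram overlap: {overlap:.2f}")
```

A lower overlap value would suggest that the two formant distributions are easier to tell apart. In practice the same comparison would be repeated for each of the first three formants and for each pair of languages.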

This document is now available on OJS