Western Papers in Linguistics / Cahiers linguistiques de Western


In computational linguistics, spoken language recognition based on wordlists and morphological markers is a resource-intensive process: the speech signal must be parsed, words must be hypothesized, and the wordlists for every likely language must then be searched. Note that spoken language recognition does not refer to identifying the meaning of the input; rather, it means identifying the language the speaker is speaking, which does not necessarily require parsing the input. In my research, I examine whether a language can be positively and uniquely identified through small nuances found in the individual formants of vowels.

Using language samples from the Heritage Language Variation and Change (HLVC) corpus (courtesy of Dr. N. Nagy, University of Toronto), I examined the distribution of formant frequencies across languages. The first three formant frequencies were tabulated, and the formant distribution histograms show that all of the languages in question (Italian, Korean, and Ukrainian) exhibit enough variation to be positively identified.
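
As a rough illustration of the histogram comparison described above, the following Python sketch bins formant frequencies and scores how much two distributions overlap. It is not the procedure used in the study: the bin width, the overlap measure (histogram intersection), the function names, and the sample values are all assumptions made for illustration, and none of the numbers come from the HLVC corpus.

```python
from collections import Counter

BIN_WIDTH_HZ = 100  # assumed bin width; the abstract does not state one


def formant_histogram(values_hz, bin_width=BIN_WIDTH_HZ):
    """Return a normalized histogram mapping bin start (Hz) to proportion."""
    counts = Counter(int(v // bin_width) * bin_width for v in values_hz)
    total = sum(counts.values())
    return {bin_start: n / total for bin_start, n in counts.items()}


def histogram_overlap(h1, h2):
    """Histogram intersection: 1.0 for identical shapes, 0.0 for disjoint ones."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))


def closest_language(sample_hz, reference_hists):
    """Pick the reference language whose histogram overlaps the sample most."""
    sample_hist = formant_histogram(sample_hz)
    return max(
        reference_hists,
        key=lambda lang: histogram_overlap(sample_hist, reference_hists[lang]),
    )


# Hypothetical F1 measurements in Hz, invented purely to exercise the code.
reference = {
    "lang_a": formant_histogram([310, 350, 420, 480, 390, 330]),
    "lang_b": formant_histogram([520, 610, 680, 590, 640, 700]),
}

if __name__ == "__main__":
    print(closest_language([340, 360, 410], reference))
```

In practice the same comparison would be repeated for each of the first three formants, and a more careful treatment would account for speaker and vowel differences before comparing languages.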


