
Computational semantics, a branch of computational linguistics, involves automated analysis of meaning based on how words co-occur in natural language, and offers a promising tool for studying schizophrenia. At present, it is not known whether these word-level choices in speech are sensitive to illness stage (i.e., acute untreated vs. stable established state), track deficits in major cognitive domains (e.g., cognitive control, processing speed), or relate to established dimensions of formal thought disorder. In this study, we collected samples of descriptive discourse from patients experiencing an untreated first episode of schizophrenia (FES) and from healthy control subjects (HC) (246 one-minute speech samples; n = 82; FES = 46, HC = 36) and used a co-occurrence-based vector embedding of words to quantify semantic similarity in speech. We obtained six-month follow-up data in a subsample (99 speech samples; n = 33; FES = 20, HC = 13). At baseline, semantic similarity was higher in patients than in healthy individuals, especially when social functioning was impaired, but it was not related to the severity of clinically ascertained thought disorder in patients. Across the study sample, higher semantic similarity at baseline was related to poorer Stroop performance and slower processing speed. Over time, semantic similarity remained stable in healthy subjects but increased in patients, especially in those with an increasing burden of negative symptoms. Disruptions in the word-level choices that patients with schizophrenia make during short one-minute descriptions are sensitive to interindividual differences in cognitive and social functioning at first presentation and persist over the early course of the illness.
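To make the idea of "quantifying semantic similarity in speech" concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes that each word has already been mapped to a vector by some co-occurrence-based embedding model, and it summarises a speech sample as the mean cosine similarity between adjacent in-vocabulary words. The aggregation scheme, the function names `cosine_similarity` and `mean_adjacent_similarity`, and the three-dimensional `TOY_EMBEDDINGS` table are all hypothetical illustrations; the abstract does not specify how similarity was aggregated.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]

# Hypothetical toy embeddings for illustration only; a real analysis would use
# vectors learned from word co-occurrence statistics in a large corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
}


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_adjacent_similarity(
    tokens: List[str], embeddings: Dict[str, Vector]
) -> Optional[float]:
    """Mean cosine similarity between consecutive in-vocabulary words.

    Out-of-vocabulary tokens are dropped before pairing (an assumption).
    Returns None when fewer than two known words remain.
    """
    vectors = [embeddings[t.lower()] for t in tokens if t.lower() in embeddings]
    if len(vectors) < 2:
        return None
    sims = [cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    sample = "The dog chased the cat near the car".split()
    score = mean_adjacent_similarity(sample, TOY_EMBEDDINGS)
    # Known words are dog, cat, car: (0.8 + 0.0) / 2
    print(round(score, 3))  # prints 0.4
```

Under this sketch, a higher score means that successive words in a description sit closer together in the embedding space, which is the direction of difference the abstract reports for patients at baseline.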