JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY
This paper furthers the development of methods to distinguish truth from deception in textual data. We use rhetorical structure theory (RST) as the analytic framework to identify systematic differences between deceptive and truthful stories in terms of their coherence and structure. A sample of 36 elicited personal stories, self-ranked as truthful or deceptive, is manually analyzed by assigning RST discourse relations among each story’s constituent parts. A vector space model (VSM) assesses each story’s position in multidimensional RST space with respect to its distance from truthful and deceptive centers as measures of the story’s level of deception and truthfulness. Ten human judges independently evaluate whether each story is deceptive and assign confidence levels (360 evaluations in total), producing measures of the expected human ability to recognize deception. As a robustness check, a test sample of 18 truthful stories (with 180 additional evaluations) is used to determine the reliability of our RST-VSM method in determining deception. The contribution lies in demonstrating discourse structure analysis as a significant method for automated deception detection and an effective complement to lexicosemantic analysis. The potential lies in developing novel discourse-based tools that alert information users to potential deception in computer-mediated texts.
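The idea of placing a story in RST space and measuring its distance from truthful and deceptive centers can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each story is represented by the normalized frequencies of its RST relations, each center is the mean vector of its class, and distance is Euclidean. The relation inventory, the function names, and the toy data are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical relation inventory; an actual RST annotation scheme is larger.
RELATIONS = ["elaboration", "evidence", "background", "contrast"]

def to_vector(relations):
    """Map a story's annotated relation labels to normalized frequencies."""
    counts = Counter(relations)
    total = sum(counts[r] for r in RELATIONS) or 1  # avoid division by zero
    return [counts[r] / total for r in RELATIONS]

def centroid(vectors):
    """Assumed definition of a class center: the component-wise mean."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(RELATIONS))]

def distance(a, b):
    """Euclidean distance (an assumption; other metrics are possible)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closer_center(story, truthful_center, deceptive_center):
    """Label a story by whichever center it lies nearer to."""
    d_truth = distance(story, truthful_center)
    d_decep = distance(story, deceptive_center)
    return "truthful" if d_truth <= d_decep else "deceptive"

# Toy annotations, invented for illustration only.
truthful = [
    to_vector(["elaboration", "evidence", "evidence"]),
    to_vector(["evidence", "background"]),
]
deceptive = [
    to_vector(["contrast", "elaboration", "contrast"]),
    to_vector(["contrast", "background"]),
]

t_center = centroid(truthful)
d_center = centroid(deceptive)

new_story = to_vector(["evidence", "evidence", "elaboration", "background"])
label = closer_center(new_story, t_center, d_center)
print(label)
```

With this toy data the new story lies nearer the truthful center. The two distances themselves, rather than the binary label, correspond more closely to the abstract's "measures of the story’s level of deception and truthfulness".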