
mixSTM: Adapting the Structural Topic Model for a quantitative analysis of focus group data
Abstract
The Structural Topic Model (STM) incorporates external information about expected document-topic proportions to enhance the model. Motivated by focus groups, whose transcripts are text data inherently grouped by session, we propose three extensions to the STM: 1) estimation of mean document-topic proportions using a regression with random effects; 2) partitioned estimation of group-specific topic covariance matrices; and 3) a post hoc mixed-effects regression on topic prevalence that incorporates latent variable uncertainty into the coefficient estimates. We explore the utility of these modifications through simulated examples and apply them to focus group transcripts from a pan-Canadian study on homelessness. The new methods, collectively the “mixSTM”, improved topic model fit when there was complex group-related variation in topic prevalence and provided new avenues for interpretation. These methods may better represent analyst beliefs about the qualities of grouped text data, although there is a risk of over-complicating the estimation when the data sources are small and qualitative.