sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics

Research output: Journal article publication › Journal article › Academic research › peer-review

30 Citations (Scopus)

Abstract

Topic modeling methods such as latent Dirichlet allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been used extensively in information systems (IS) and business discipline research to identify latent topics for data exploration and as a feature engineering mechanism to derive new variables for analyses. However, existing topic modeling approaches are mostly unsupervised and only leverage textual data, while ignoring additional useful metadata often associated with text, such as star ratings in customer reviews or categories of posts in online forums. As a result, the identified topics and variables derived based on the learned topic model may not be accurate, which could lead to incorrect estimations that affect subsequent empirical analysis and to inferior performance on predictive tasks. In this study, we propose a novel supervised deep topic modeling approach called sDTM, which combines a neural variational autoencoder model and a recurrent neural network. sDTM leverages the auxiliary data associated with text to enhance the topic modeling capability. We conduct empirical case studies and predictive analytics on an online consumer review data set and an online knowledge community data set. Experimental results show that in comparison with benchmark methods, sDTM can enhance both the empirical estimation and predictive performance. sDTM makes methodological contributions to the IS literature and has direct relevance for research using text analytics.

Original language: English
Pages (from-to): 137-156
Number of pages: 20
Journal: Information Systems Research
Volume: 34
Issue number: 1
Early online date: 22 Mar 2022
Publication status: Published - Mar 2023

Keywords

  • Bayesian variational inference
  • deep learning
  • supervised topic modeling
  • text analysis

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Computer Networks and Communications
  • Information Systems and Management
  • Library and Information Sciences
