A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach

M. Ghiassi, S. Lee

Research output: Journal article publication › Journal article › Academic research › peer-review

129 Citations (Scopus)

Abstract

The Twitter messaging service has become a platform for customers and news consumers to express sentiments. Accurately capturing these sentiments has been challenging for researchers. Traditional approaches to Twitter Sentiment Analysis (TSA) are either dictionary-based or use supervised machine learning tools for sentiment classification. This research follows the supervised machine learning approach. A major challenge for the machine learning approach is feature selection, which is often domain dependent. We address this specific challenge and present a novel approach to identify a lexicon set unique to TSA. We show that this Twitter Specific Lexicon Set (TSLS) is small and, most importantly, domain transferable. The identification process generates a collection of vectorized tweets for input to machine learning tools. In traditional approaches, this vectorization often results in a highly sparse input matrix, which produces low accuracy. In this research, we hierarchically reduce the feature set to a small set of seven “meta features” to reduce sparsity. We show that TSA based on these features can produce highly accurate results with two machine learning tools, a dynamic architecture for neural networks (DAN2) and support vector machines (SVM), as measured by recall, precision, and F1 (the harmonic mean of precision and recall). Our results show that a Twitter Generic Feature Set (TGFS) derived from two datasets (@JustinBieber and @Starbucks) is domain transferable and, when combined with only a few Twitter Domain Specific Features (TDSF) (less than 3%), can produce excellent sentiment classification results. We evaluate the effectiveness and transferability of the TGFS across three new and distinct domains (@GovChristie, @SouthwestAir, and @VerizonWireless).
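To make the vectorization and sparsity argument concrete, here is a minimal Python sketch under stated assumptions. The lexicon, the five meta-feature group names, the tweets, and their binary labels are all hypothetical and invented for illustration; they are not the paper's TSLS, its seven meta features, or its data. The rule in `toy_classifier` is a placeholder swapped in for DAN2 and SVM, used only so that the precision, recall, and F1 computation has predictions to score.

```python
import re
from collections import Counter

# HYPOTHETICAL lexicon for illustration only; NOT the paper's TSLS or TGFS.
# Each term is assigned to one coarse "meta feature" group.
LEXICON = {
    "love": "strong_pos", "awesome": "strong_pos",
    "good": "weak_pos", "thanks": "weak_pos",
    "ok": "neutral",
    "bad": "weak_neg", "late": "weak_neg",
    "hate": "strong_neg", "worst": "strong_neg",
}
# Hypothetical group names (the paper uses seven meta features; these five are invented).
META_FEATURES = ["strong_pos", "weak_pos", "neutral", "weak_neg", "strong_neg"]
VOCAB = sorted(LEXICON)  # one column per lexicon term in the term-level matrix

# Hypothetical labelled tweets (binary labels assumed for simplicity).
TWEETS = [
    ("Love the new drink, awesome!", "pos"),
    ("Flight was late again, worst airline", "neg"),
    ("Thanks, service was good", "pos"),
    ("I hate waiting, bad coverage", "neg"),
    ("Not bad at all, good job", "pos"),
]


def tokenize(tweet):
    """Lowercase the tweet and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def term_vector(tweet):
    """Term-level vector: one count per lexicon term (wide, mostly zeros)."""
    counts = Counter(tokenize(tweet))
    return [counts[term] for term in VOCAB]


def meta_vector(tweet):
    """Meta-feature vector: lexicon hits pooled into a few coarse groups."""
    groups = Counter(LEXICON[t] for t in tokenize(tweet) if t in LEXICON)
    return [groups[m] for m in META_FEATURES]


def sparsity(matrix):
    """Fraction of zero entries in a list-of-lists matrix."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


def toy_classifier(meta):
    """Placeholder rule standing in for DAN2/SVM; ties default to 'neg'."""
    strong_pos, weak_pos, _neutral, weak_neg, strong_neg = meta
    score = 2 * strong_pos + weak_pos - weak_neg - 2 * strong_neg
    return "pos" if score > 0 else "neg"


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 (harmonic mean of the two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    term_matrix = [term_vector(text) for text, _ in TWEETS]
    meta_matrix = [meta_vector(text) for text, _ in TWEETS]
    print(f"term-level sparsity:   {sparsity(term_matrix):.2f}")  # 0.78
    print(f"meta-feature sparsity: {sparsity(meta_matrix):.2f}")  # 0.68

    y_true = [label for _, label in TWEETS]
    y_pred = [toy_classifier(vec) for vec in meta_matrix]
    p, r, f1 = precision_recall_f1(y_true, y_pred, positive="pos")
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
    # precision=1.00 recall=0.67 F1=0.80
```

With only nine lexicon terms the drop in sparsity is modest. Because each tweet contains a bounded number of tokens, the term-level matrix grows sparser as the vocabulary grows, while the meta-feature matrix keeps a fixed width. The misclassified last tweet ("Not bad at all") also shows how plain term counts can miss negation.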

Original language: English
Pages (from-to): 197-216
Number of pages: 20
Journal: Expert Systems with Applications
Volume: 106
Publication status: Published - 15 Sept 2018
Externally published: Yes

Keywords

  • Domain transferability
  • Dynamic artificial neural networks (DAN2)
  • Machine learning
  • n-gram analysis
  • Twitter sentiment analysis

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Artificial Intelligence
