Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models

Ranju Mandal, Jinyan Chen, Susanne Becken, Bela Stantic

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

Abstract

Social media opens up a great opportunity for policymakers to analyze and understand a large volume of online content for decision-making purposes. People’s opinions and experiences on social media platforms such as Twitter are extremely significant because of their volume, variety, and veracity. However, processing and retrieving useful information from natural language content is very challenging because of its ambiguity and complexity. Recent advances in Natural Language Understanding (NLU) techniques, more specifically Transformer-based architectures, solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently, and transformer-based models are setting new performance benchmarks across a wide variety of NLU tasks. In this paper, we apply transformer-based sequence modeling to topic classification of short texts, namely tourist/user-posted tweets. Multiple BERT-like state-of-the-art sequence modeling approaches to topic/target classification are investigated on the Great Barrier Reef tweet dataset, and the findings can be valuable for researchers working on classification with large data sets and a large number of target classes.

Original language: English
Title of host publication: Intelligent Information and Database Systems - 13th Asian Conference, ACIIDS 2021, Proceedings
Editors: Ngoc Thanh Nguyen, Suphamit Chittayasothorn, Dusit Niyato, Bogdan Trawiński
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 340-350
Number of pages: 11
ISBN (Print): 9783030732790
Publication status: Published - 2021
Externally published: Yes
Event: 13th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2021 - Phuket, Thailand
Duration: 7 Apr 2021 to 10 Apr 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12672 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2021
Country/Territory: Thailand
City: Phuket
Period: 7/04/21 to 10/04/21

Keywords

  • Deep learning
  • Natural language processing
  • Target classification
  • Topic classification
  • Transformer

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
