A Study of Discriminatory Speech Classification Based on Improved Smote and SVM-RF

  • Chao Wu
  • Huijuan Hu
  • Dingju Zhu (Corresponding Author)
  • Xilin Shan
  • Kai Leung Yung
  • Andrew W. H. Ip

Research output: Journal article publication, Journal article, Academic research, peer-review

Abstract

The rapid development of the Internet has facilitated expression, sharing, and interaction on social networks, but some speech may contain harmful discrimination, so classifying such speech is crucial. In this paper, we collect discriminatory speech data from Sina Weibo and propose an improved Synthetic Minority Over-sampling Technique (SMOTE) algorithm based on Latent Dirichlet Allocation (LDA) to improve data quality and balance. We also propose a new integration method that combines Support Vector Machine (SVM) and Random Forest (RF). The experimental results show that the integrated model improves precision, recall, and F1 score by 6.0%, 5.4%, and 5.7%, respectively, compared with SVM alone, and that it performs best among the machine learning methods compared. Ablation experiments further confirm the positive impact of the improved SMOTE and the integration method on classification.
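For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch and not the authors' method. It shows plain SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) and a simple weighted soft vote over two classifiers' probabilities. The paper's LDA-based SMOTE improvement and its actual SVM-RF integration rule are not described on this page, so both functions are stand-ins; the names `smote` and `soft_vote`, the parameters `k`, `seed`, and `weight_a`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch (NOT the paper's LDA-based variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Hypothetical integration rule: weighted average of two classifiers'
    positive-class probabilities (e.g. one from an SVM, one from an RF)."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(prob_a, prob_b)]


# Toy minority-class feature vectors (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)

# Toy probabilities standing in for SVM and RF outputs on two posts.
combined = soft_vote([0.9, 0.2], [0.7, 0.4])
```

Because every synthetic point is a convex combination of two existing minority samples, oversampling of this kind never leaves the region spanned by the original minority data. In practice, text would first have to be mapped to numeric feature vectors before either step applies.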
Original language: English
Article number: 6468
Number of pages: 14
Journal: Applied Sciences (Switzerland)
Volume: 14
Issue number: 15
Publication status: Published - 24 Jul 2024

Keywords

  • discrimination speech
  • latent Dirichlet allocation
  • support vector machine
  • random forest
  • integration method
