Towards scalable emotion classification in microblog based on noisy training data

Minglei Li, Qin Lu, Lin Gui, Yunfei Long

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

The availability of labeled corpora is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, the inconsistency and noise in this annotation can degrade data quality and, in turn, the performance of a classifier trained on it. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph based data cleaning method to remove noise once a certain amount of noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach achieves 15.8% better performance than directly using the noisy data without noise filtering. After the filtered hashtag-labeled data is added to an existing high-quality training set, performance increases by 3.7% compared with using the high-quality training data alone.
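The abstract does not specify how the k-NN graph is built or scored, so the following is only a minimal sketch of the general idea under stated assumptions: hashtag-labeled examples are buffered, and once a hypothetical `threshold` is reached, each buffered example is kept only if the majority label of its `k` most cosine-similar examples in the trusted (high-quality) set agrees with its hashtag label. The names `knn_filter` and `AccumulatingCleaner`, the majority-vote rule, and the use of dense feature vectors are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, str]  # (feature vector, emotion label)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_filter(noisy: List[Example], reference: List[Example], k: int = 3) -> List[Example]:
    """Assumed rule: keep a hashtag-labeled example only if the majority label
    among its k most similar reference examples matches its own label."""
    kept = []
    for vec, label in noisy:
        # Rank trusted examples by similarity to this noisy example.
        neighbours = sorted(reference, key=lambda ex: cosine(vec, ex[0]), reverse=True)[:k]
        # Counter.most_common breaks ties by first insertion, i.e. the closer neighbour's label.
        majority, _ = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        if majority == label:
            kept.append((vec, label))
    return kept


class AccumulatingCleaner:
    """Hypothetical helper: buffer hashtag-labeled data and clean it in batches
    once `threshold` examples have accumulated."""

    def __init__(self, trusted: List[Example], threshold: int = 1000, k: int = 3):
        self.trusted = list(trusted)    # high-quality labeled data used as the reference
        self.training = list(trusted)   # grows as filtered noisy data is added
        self.buffer: List[Example] = []
        self.threshold = threshold
        self.k = k

    def add(self, example: Example) -> None:
        self.buffer.append(example)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        # Filter the accumulated batch, append the survivors, then reset the buffer.
        self.training.extend(knn_filter(self.buffer, self.trusted, self.k))
        self.buffer.clear()
```

For example, with trusted points `([1.0, 0.0], "happiness")`, `([0.9, 0.1], "happiness")`, `([0.0, 1.0], "anger")`, `([0.1, 0.9], "anger")` and `k=3`, a noisy example `([0.05, 0.95], "happiness")` is dropped because two of its three nearest trusted neighbours are labeled "anger". Batching the cleaning step mirrors the abstract's idea of filtering only after noisy data has accumulated, which avoids re-running neighbour search for every incoming post.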
Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 15th China National Conference, CCL 2016 and 4th International Symposium, NLP-NABD 2016, Proceedings
Publisher: Springer Verlag
Pages: 399-410
Number of pages: 12
ISBN (Print): 9783319476735
Publication status: Published - 1 Jan 2016
Event: 15th China National Conference on Chinese Computational Linguistics, CCL 2016 and 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2016 - Yantai, China
Duration: 15 Oct 2016 to 16 Oct 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10035 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th China National Conference on Chinese Computational Linguistics, CCL 2016 and 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2016
Country/Territory: China
City: Yantai
Period: 15/10/16 to 16/10/16

Keywords

  • Data cleaning
  • Emotion classification
  • Hashtag
  • K-NN

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
