Active learning for cross-lingual sentiment classification

Shoushan Li, Rong Wang, Huanhuan Liu, Chu-ren Huang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

7 Citations (Scopus)

Abstract

Cross-lingual sentiment classification aims to predict the sentiment orientation of a text in one language (the target language) with the help of resources from another language (the source language). However, current cross-lingual performance is usually far from satisfactory because of large differences in linguistic expression and social culture. In this paper, we propose performing active learning for cross-lingual sentiment classification, where only a small number of samples are actively selected and manually annotated to achieve reasonable performance in a short time for the target language. The challenge is that there are normally many more labeled samples in the source language than in the target language. As a result, the small number of labeled samples from the target language is flooded by the abundance of labeled samples from the source language, which largely reduces their impact on cross-lingual sentiment classification. To address this issue, we propose a data quality control approach that selects high-quality samples from the source language. Specifically, we propose two kinds of data quality measurements, intra- and extra-quality measurements, from the certainty and similarity perspectives. Empirical studies verify the appropriateness of our active learning approach to cross-lingual sentiment classification.
Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - Second CCF Conference, NLPCC 2013, Proceedings
Publisher: Springer Verlag
Pages: 236-246
Number of pages: 11
ISBN (Print): 9783642416439
Publication status: Published - 1 Jan 2013
Event: 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013 - Chongqing, China
Duration: 15 Nov 2013 to 19 Nov 2013

Publication series

Name: Communications in Computer and Information Science
Volume: 400
ISSN (Print): 1865-0929

Conference

Conference: 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013
Country: China
City: Chongqing
Period: 15/11/13 to 19/11/13

ASJC Scopus subject areas

  • Computer Science (all)
