Data quality controlling for cross-lingual sentiment classification

Shoushan Li, Yunxia Xue, Zhongqing Wang, Yat Mei Lee, Chu-ren Huang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

1 Citation (Scopus)

Abstract

Cross-lingual sentiment classification aims to perform sentiment classification in one language (the target language) with the help of resources from another language (the source language). Previous studies tend to use all available data in the source language, yet using all the data is observed to perform no better, or even worse, than using a portion of good data. In this paper, we propose a novel task, data quality controlling in the source language, which selects high-quality samples from the source language. To tackle this task, we propose two kinds of data quality measurements, intra-quality and extra-quality, which are implemented with certainty and similarity measurements respectively. Empirical studies demonstrate the effectiveness of the proposed approach to data quality controlling in the source language.
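The selection idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: certainty is approximated as the distance of a classifier's positive-class probability from 0.5, similarity as the cosine between a source sample's bag of words and the centroid of the target-language data (assuming both sides share one token space, for example after machine translation), and the two scores are combined with a hypothetical `weight` parameter. The function names `certainty`, `cosine`, and `select_samples` are likewise hypothetical.

```python
import math
from collections import Counter


def certainty(p_positive):
    """Intra-quality proxy (assumed): distance of the posterior from 0.5, scaled to [0, 1]."""
    return abs(p_positive - 0.5) * 2.0


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_samples(source, target_docs, keep, weight=0.5):
    """Rank source samples by a weighted mix of certainty and similarity.

    source:      list of (tokens, p_positive) pairs from the source language
    target_docs: list of token lists from the target language
    keep:        number of top-scoring source samples to return
    weight:      hypothetical mixing weight between the two measurements
    """
    # Extra-quality proxy (assumed): compare each sample with the target centroid.
    centroid = Counter()
    for doc in target_docs:
        centroid.update(doc)

    scored = []
    for tokens, p_positive in source:
        score = (weight * certainty(p_positive)
                 + (1.0 - weight) * cosine(Counter(tokens), centroid))
        scored.append((score, tokens))

    # Highest combined score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tokens for _, tokens in scored[:keep]]
```

With toy data, a confidently classified sample that overlaps with the target vocabulary ranks above an uncertain, off-topic one. The abstract does not say how the two measurements are combined or thresholded, so the linear mix above is only one plausible choice.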
Original language: English
Title of host publication: Proceedings - 2013 International Conference on Asian Language Processing, IALP 2013
Pages: 125-128
Number of pages: 4
Publication status: Published - 1 Dec 2013
Event: 2013 International Conference on Asian Language Processing, IALP 2013 - Urumqi, Xinjiang, China
Duration: 17 Aug 2013 to 19 Aug 2013

Conference

Conference: 2013 International Conference on Asian Language Processing, IALP 2013
Country: China
City: Urumqi, Xinjiang
Period: 17/08/13 to 19/08/13

ASJC Scopus subject areas

  • Software
