Component-enhanced Chinese character embeddings

Yanran Li, Wenjie Li, Fei Sun, Sujian Li

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

33 Citations (Scopus)

Abstract

Distributed word representations are very useful for capturing semantic information and have been successfully applied to a variety of NLP tasks, especially for English. In this work, we develop two novel component-enhanced Chinese character embedding models and their bigram extensions. Unlike English word embeddings, our models exploit the internal composition of Chinese characters, whose components often serve as inherent semantic indicators. Evaluations on both word similarity and text classification demonstrate the effectiveness of our models.
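The abstract does not give the model equations. As a minimal sketch, assuming a CBOW-style model in which each context character is represented by the average of its own vector and the vectors of its components, the composition step might look like the code below. The decomposition table, vector size, random initialisation, and all function names are hypothetical; this is not the authors' exact formulation, training is omitted, and the bigram extensions are not covered.

```python
import math
import random

DIM = 8  # hypothetical embedding size

# Hypothetical toy decomposition table (character -> components). Here all three
# characters share the water component 氵, a typical semantic indicator.
COMPONENTS = {
    "河": ["氵", "可"],
    "湖": ["氵", "胡"],
    "海": ["氵", "每"],
}


def make_table(symbols, dim=DIM, seed=0):
    """Randomly initialised embedding table (stands in for trained vectors)."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for s in symbols}


chars = sorted(set(COMPONENTS) | {"大", "小"})
comps = sorted({c for parts in COMPONENTS.values() for c in parts})
char_emb = make_table(chars, seed=1)  # character vectors
comp_emb = make_table(comps, seed=2)  # component vectors


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]


def enhanced(char):
    """Average a character vector with its component vectors (if it has any)."""
    vecs = [char_emb[char]] + [comp_emb[c] for c in COMPONENTS.get(char, [])]
    return mean(vecs)


def cbow_context(context_chars):
    """CBOW-style hidden vector: mean of the component-enhanced context vectors."""
    return mean([enhanced(c) for c in context_chars])


def cosine(u, v):
    """Cosine similarity, as commonly used for word-similarity evaluation."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


h = cbow_context(["大", "河"])
print(len(h))  # 8
print(cosine(enhanced("河"), enhanced("湖")))
```

In a trained model the context vector `h` would be used to predict the target character, and the learned vectors would then be compared with `cosine` for word similarity. Averaging is only one plausible way to combine a character with its components; concatenation or weighted sums are equally possible choices.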

Original language: English
Title of host publication: Conference Proceedings - EMNLP 2015
Subtitle of host publication: Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics (ACL)
Pages: 829-834
Number of pages: 6
ISBN (Electronic): 9781941643327
Publication status: Published - 1 Jan 2015
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: 17 Sep 2015 to 21 Sep 2015

Conference

Conference: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country/Territory: Portugal
City: Lisbon
Period: 17/09/15 to 21/09/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
