Empirical Exploring Word-Character Relationship for Chinese Sentence Representation

Research output: Journal article publication › Journal article › Academic research › peer-review

10 Citations (Scopus)

Abstract

This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters, which contain rich semantic information. However, this information has not been fully exploited by existing methods. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by utilizing the rich semantic information of inner-word characters. We propose two novel strategies to this end. The first is to use a mask gate on characters, learning the relation among characters in a word. The second is to use a max-pooling operation on words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models, achieving substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we employ the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. Results show that the proposed mixed character-word sentence representation models outperform both the character-based and word-based models.
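The two strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no equations, so the gate form (a per-dimension sigmoid over each character vector), the summation used to build the compositional word vector, and all names (`mask_gate`, `compose_from_chars`, `mixed_word_vector`, `weight`, `bias`) are assumptions made for illustration. In a real model the gate parameters would be learned, and the gate might also depend on neighbouring characters.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mask_gate(char_vec: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Assumed gate form: g = sigmoid(W c + b), one value in (0, 1) per dimension."""
    return [
        sigmoid(sum(w * c for w, c in zip(row, char_vec)) + b)
        for row, b in zip(weight, bias)
    ]


def compose_from_chars(
    char_vecs: List[Vector], weight: List[Vector], bias: Vector
) -> Vector:
    """Compositional word vector: sum of gated character vectors (an assumption)."""
    total = [0.0] * len(bias)
    for char_vec in char_vecs:
        gate = mask_gate(char_vec, weight, bias)
        for i, (g, c) in enumerate(zip(gate, char_vec)):
            total[i] += g * c
    return total


def mixed_word_vector(
    word_vec: Vector, char_vecs: List[Vector], weight: List[Vector], bias: Vector
) -> Vector:
    """Element-wise max over the atomic and compositional word vectors."""
    compositional = compose_from_chars(char_vecs, weight, bias)
    return [max(a, c) for a, c in zip(word_vec, compositional)]


# Toy two-dimensional example with hand-picked, untrained parameters.
word = [1.0, -1.0]                 # atomic (word-level) embedding
chars = [[0.0, 2.0], [0.0, 2.0]]   # embeddings of the word's two characters
W = [[0.0, 0.0], [0.0, 0.0]]       # zero weights give every gate the value 0.5
b = [0.0, 0.0]

mixed = mixed_word_vector(word, chars, W, b)
```

With these toy values every gate equals 0.5, so the compositional vector is `[0.0, 2.0]` and the max-pooled result takes the first dimension from the word embedding and the second from the characters. This is the sense in which max pooling can pick, per dimension, between the atomic and compositional representations.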

Original language: English
Article number: 14
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 17
Issue number: 3
DOIs
Publication status: Published - 31 Jan 2018
Externally published: Yes

Keywords

  • Composition model
  • Inner-word character
  • Mask gate
  • Max pooling
  • Mixed character-word representation
  • Sentence representation

ASJC Scopus subject areas

  • General Computer Science
