Extracting Chinese product features: Representing a sequence by a set of skip-bigrams

Ge Xu, Chu-ren Huang, Houfeng Wang

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

5 Citations (Scopus)

Abstract

A skip-bigram is a bigram that allows skips between words, i.e., an ordered pair of words that may be separated by other words. In this paper, we use a set of skip-bigrams (an SBGSet) to represent a short word sequence, which is the typical form of a product feature. The advantage of the SBGSet representation is that it allows conversion between a sequence and a set. Under the SBGSet representation, we can employ association rule mining to find frequent itemsets, from which frequent product features can be extracted.

For infrequent product features, we use a pattern-based method. A pattern is also represented by an SBGSet and contains a variable that can be instantiated to a product feature.

We evaluate our method on two data sets. The experimental results show that our method is suitable for extracting Chinese product features and that the pattern-based method for extracting infrequent product features is effective.
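The abstract does not spell out how an SBGSet is built, how a sequence is recovered from it, or how support is counted. The Python sketch below illustrates one plausible reading under stated assumptions, and is not the authors' code. It assumes that input text is already word-segmented, that a skip-bigram is any ordered pair (w_i, w_j) with i < j (no limit on skipped words), that feature words are distinct, and that support is counted as in frequent-itemset mining, treating each sentence's SBGSet as a transaction. The names `to_sbgset`, `from_sbgset` and `support` are hypothetical. The sketch omits both the full association rule miner and the pattern-based method for infrequent features.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Tuple

# A skip-bigram is an ordered word pair (first, second); first precedes second.
SkipBigram = Tuple[str, str]


def to_sbgset(words: Sequence[str]) -> FrozenSet[SkipBigram]:
    """Return the SBGSet of a word sequence.

    Assumption: every pair (w_i, w_j) with i < j is a skip-bigram, i.e. any
    number of words may be skipped (reasonable for short feature phrases).
    """
    # combinations() yields pairs in positional order, so i < j holds.
    return frozenset(combinations(words, 2))


def from_sbgset(sbgset: FrozenSet[SkipBigram]) -> Optional[List[str]]:
    """Recover the word sequence encoded by an SBGSet.

    Assumption: the sequence has no repeated words. In a sequence of n
    distinct words, the word at position k (0-based) is the first element
    of exactly n - 1 - k skip-bigrams, so sorting by that count restores
    the order. Returns None if sbgset is not the SBGSet of such a sequence.
    """
    words = {w for pair in sbgset for w in pair}
    out_degree = {w: 0 for w in words}
    for first, _ in sbgset:
        out_degree[first] += 1
    ordered = sorted(words, key=lambda w: out_degree[w], reverse=True)
    # Round-trip check: reject sets that do not encode one distinct-word sequence.
    return ordered if to_sbgset(ordered) == sbgset else None


def support(feature: Sequence[str], sentences: Sequence[Sequence[str]]) -> int:
    """Count sentences whose SBGSet contains the feature's SBGSet.

    This is itemset support with skip-bigrams as items and each sentence as a
    transaction. For sentences without repeated words, containment means the
    feature words occur in order, possibly with other words in between.
    Assumption: the feature has at least two words (a one-word feature has an
    empty SBGSet and would match every sentence).
    """
    target = to_sbgset(feature)
    return sum(1 for sentence in sentences if target <= to_sbgset(sentence))


if __name__ == "__main__":
    reviews = [
        ["这", "款", "手机", "的", "屏幕", "分辨率", "很", "高"],
        ["屏幕", "的", "分辨率", "不错"],
        ["分辨率", "和", "屏幕", "都", "一般"],  # wrong order: no match
    ]
    print(support(["屏幕", "分辨率"], reviews))  # 2
    print(from_sbgset(to_sbgset(["手机", "电池", "续航"])))  # ['手机', '电池', '续航']
```

Recovering the order from first-position counts is what makes the set representation lossless for distinct-word sequences. With repeated words, different sequences can share an SBGSet, so the sketch returns None whenever the round trip fails.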
Original language: English
Title of host publication: Chinese Lexical Semantics - 13th Workshop, CLSW 2012, Revised Selected Papers
Pages: 72-83
Number of pages: 12
Publication status: Published, 26 Feb 2013
Event: 13th Chinese Lexical Semantics Workshop, CLSW 2012, Wuhan, China
Duration: 6 Jul 2012 to 8 Jul 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7717 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th Chinese Lexical Semantics Workshop, CLSW 2012
Country: China
City: Wuhan
Period: 6/07/12 to 8/07/12

Keywords

  • product feature
  • sentiment analysis
  • skip-bigram
  • word sequence

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
