Efficient feedback-based feature learning for blog distillation as a terabyte challenge

Dehong Gao, Wenjie Li, Renxian Zhang

Research output: Chapter in book / Conference proceeding; Chapter in an edited book (as author); Academic research; peer-review

Abstract

This chapter focuses on blogosphere research based on the TREC blog distillation task and aims to discover unbiased and significant features automatically and efficiently. Feedback from faceted feeds is introduced to harvest relevant features, and information gain is used to select discriminative features, including both unigrams and patterns of unigram associations. To cope with the terabyte-scale blog dataset, the approach also adopts some flexible processing. The evaluation results show that the selected feedback features can greatly improve performance and adapt well to terabyte-scale data.
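
Information gain, used in the abstract to select discriminative features, is commonly defined for a binary feature f and class C as IG(C; f) = H(C) - P(f) H(C | f) - P(not f) H(C | not f). The sketch below is a minimal illustration of that standard definition, not the authors' implementation. It assumes binary relevant/non-relevant labels, binary presence features, and (hypothetically) treats unordered co-occurring unigram pairs as a stand-in for the chapter's "patterns of unigram associations"; all function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(pos, neg):
    """Binary entropy (in bits) of a class distribution given positive/negative counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def extract_features(tokens, use_pairs=True):
    """Hypothetical feature set: unigrams present in a document, plus unordered
    unigram pairs co-occurring in it (an assumed form of 'association patterns')."""
    unigrams = set(tokens)
    features = set(unigrams)
    if use_pairs:
        features.update(combinations(sorted(unigrams), 2))
    return features


def information_gain(docs, labels, use_pairs=True):
    """Return {feature: IG(C; f)} for binary presence features over labelled documents."""
    n = len(docs)
    n_pos = sum(labels)
    h_c = entropy(n_pos, n - n_pos)
    df_pos, df_neg = Counter(), Counter()  # document frequency per class
    for tokens, label in zip(docs, labels):
        (df_pos if label else df_neg).update(extract_features(tokens, use_pairs))
    scores = {}
    for f in set(df_pos) | set(df_neg):
        a, b = df_pos[f], df_neg[f]              # documents containing f
        c, d = n_pos - a, (n - n_pos) - b        # documents lacking f
        p_f = (a + b) / n
        h_cond = p_f * entropy(a, b) + (1 - p_f) * entropy(c, d)
        scores[f] = h_c - h_cond
    return scores


def select_top_k(scores, k):
    """Highest-IG features first; ties broken by the feature's string form."""
    return sorted(scores, key=lambda f: (-scores[f], str(f)))[:k]


if __name__ == "__main__":
    docs = [["ipod", "apple", "music"], ["apple", "mac", "review"],
            ["football", "goal"], ["goal", "match", "football"]]
    labels = [1, 1, 0, 0]
    scores = information_gain(docs, labels)
    print(select_top_k(scores, 3))
```

In this toy run, "apple" and "football" each split the two classes perfectly and score 1.0 bit, while "ipod" (one relevant document) scores about 0.311. Counting is a single pass over documents, which is one reason such statistics are easy to shard over a large collection; note, however, that enumerating all pairs is quadratic in a document's vocabulary, so a real system at terabyte scale would likely need pruning that this sketch does not attempt.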

Original language: English
Title of host publication: Social Media Content Analysis
Subtitle of host publication: Natural Language Processing and Beyond
Publisher: World Scientific Publishing Co. Pte. Ltd.
Pages: 215-224
Number of pages: 10
ISBN (Electronic): 9789813223615
ISBN (Print): 9789813223608
Publication status: Published, 1 Jan 2017

ASJC Scopus subject areas

  • Computer Science (all)
