This chapter addresses blogosphere research based on the TREC blog distillation task and aims to identify unbiased, significant features automatically and efficiently. Feedback from faceted feeds is used to harvest relevant features, and information gain is used to select the discriminative ones, including both unigrams and patterns of unigram associations. Because the blog dataset is terabyte-scale, the approach also adopts flexible processing. The evaluation results show that the selected feedback features greatly improve performance and adapt well to terabyte-scale data.
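As an illustration of the feature-selection step mentioned in the abstract, the following is a minimal sketch of information gain over binary feature presence. It is not the chapter's implementation: the function names, the set-of-strings document representation, the `a+b` encoding of a unigram association, and the binary relevant/non-relevant labels are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, feature):
    """Entropy reduction from splitting documents on presence of `feature`.

    docs   -- list of sets of feature strings (hypothetical representation)
    labels -- list of class labels, one per document
    """
    n = len(labels)
    if n == 0:
        return 0.0
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k features with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda f: (-information_gain(docs, labels, f), f),
    )
    return ranked[:k]


# Toy feedback documents: unigrams plus one association pattern ("a+b").
docs = [
    {"opinion", "review", "opinion+review"},
    {"opinion", "news"},
    {"news", "report"},
    {"report", "data"},
]
labels = [1, 1, 0, 0]  # 1 = relevant feed, 0 = non-relevant (assumed labels)
top_features = select_features(docs, labels, 2)
```

In this toy data, `opinion` and `report` each separate the two classes perfectly, so they rank first. Scoring every candidate independently also means the vocabulary can be processed in chunks, which is one plausible way to cope with very large collections.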
|Title of host publication||Social Media Content Analysis|
|Subtitle of host publication||Natural Language Processing and Beyond|
|Publisher||World Scientific Publishing Co. Pte. Ltd.|
|Number of pages||10|
|Publication status||Published - 1 Jan 2017|
ASJC Scopus subject areas
- Computer Science(all)