Distributional similarity model for multi-modality clustering in social media

Donahue C.M. Sze, Tak Chung Fu, Fu Lai Korris Chung, Wing Pong Robert Luk

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

5 Citations (Scopus)

Abstract

User Generated Content (UGC) has become the fastest-growing sector of the WWW. Mining UGC presents challenges not typically found in text mining from documents. UGC can be semi-structured, and individual items can be very short and informal, carrying relatively little content, much like a chat or an email conversation. In addition, UGC can be viewed as multi-modality data. These characteristics pose significant challenges and open research questions. To cluster UGC data, we can construct multiple contingency tables of modalities and employ the multi-way distributional clustering (MDC) algorithm. However, a contingency table, which summarizes the co-occurrence statistics of two modalities, does not robustly represent the information entropy between two modalities in UGC data. In this paper, we propose a novel similarity measure, called the Distributional Similarity Model (DSM), to strengthen the graph model in the MDC algorithm so that it can handle the unique characteristics of UGC data.
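The abstract does not define DSM itself, so the sketch below illustrates only the baseline it builds on: a contingency table of co-occurrence counts between two modalities, the mutual information computed from such a table, and an objective that sums that mutual information over the edges of a graph of modalities. It assumes, as in common descriptions of MDC, that the algorithm scores a set of clusterings by summing pairwise mutual information between clustered modalities along the graph's edges. All function names and the toy user/word data are hypothetical, and this is not the authors' DSM.

```python
from collections import Counter
from math import log2


def contingency_table(pairs):
    """Count how often each (x, y) pair co-occurs across two modalities."""
    return Counter(pairs)


def mutual_information(table):
    """Mutual information I(X;Y) in bits, estimated from co-occurrence counts."""
    total = sum(table.values())
    px, py = Counter(), Counter()
    for (x, y), n in table.items():
        px[x] += n  # marginal count of x
        py[y] += n  # marginal count of y
    mi = 0.0
    for (x, y), n in table.items():
        if n == 0:
            continue
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (n / total) * log2(n * total / (px[x] * py[y]))
    return mi


def clustered_table(table, row_cluster, col_cluster):
    """Aggregate raw co-occurrence counts into cluster-level counts."""
    agg = Counter()
    for (x, y), n in table.items():
        agg[(row_cluster[x], col_cluster[y])] += n
    return agg


def pairwise_objective(tables, clusterings, edges):
    """Sum I(clustered X_i; clustered X_j) over the edges (i, j) of a modality graph.

    Assumption: this mirrors the MDC-style pairwise objective; DSM would
    replace or refine the per-edge similarity, which is not shown here.
    """
    return sum(
        mutual_information(
            clustered_table(tables[(i, j)], clusterings[i], clusterings[j])
        )
        for i, j in edges
    )


if __name__ == "__main__":
    # Hypothetical UGC toy data: which user used which word.
    t = contingency_table([("alice", "game"), ("alice", "game"),
                           ("bob", "music"), ("bob", "music")])
    clusterings = {"user": {"alice": 0, "bob": 1},
                   "word": {"game": 0, "music": 1}}
    print(pairwise_objective({("user", "word"): t}, clusterings,
                             [("user", "word")]))
```

Merging every user into a single cluster drives the per-edge mutual information to zero, which is why a clustering that preserves the co-occurrence structure between modalities scores higher under this kind of objective.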
Original language: English
Title of host publication: Proceedings - 2007 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2007
Pages: 268-271
Number of pages: 4
Publication status: Published - 1 Dec 2007
Event: 2007 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2007 - Silicon Valley, CA, United States
Duration: 2 Nov 2007 to 5 Nov 2007

Conference

Conference: 2007 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2007
Country/Territory: United States
City: Silicon Valley, CA
Period: 2/11/07 to 5/11/07

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications

