A fuzzy-rough method for concept-based document expansion

Yan Li, Chi Keung Simon Shiu, Sankar Kumar Pal, James Nga Kwok Liu

Research output: Journal article publication, Conference article, Academic research, peer-review

3 Citations (Scopus)

Abstract

In this paper, a novel fuzzy-rough hybridization approach is developed for concept-based document expansion to enhance the quality of text information retrieval. First, in contrast to the traditional way of representing documents, a given set of text documents is represented by an incomplete information system. To discover the relevant keywords to be complemented, the weights of terms that do not occur in a document are treated as missing rather than zero. Fuzzy sets are used to handle the real-valued weights in the term vectors. Rough sets are then used to extract, within this incomplete information system, the potentially associated keywords that convey a concept for text retrieval. Finally, by incorporating a nearest-neighbor mechanism, the missing weights of the extracted keywords of a document are filled in with the corresponding weights of the most similar document. Thus, the documents in the original text dataset are expanded, while the total number of keywords is reduced. Experiments are conducted using part of the Reuters-21578 data. Because the concept-based method is able to identify and supplement potentially useful information for each document, information retrieval performance in terms of recall is greatly improved.
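
The final nearest-neighbor filling step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine` and `expand`, the toy documents, and the use of cosine similarity are all assumptions made for illustration, since the abstract does not specify the similarity measure. The fuzzy-rough keyword extraction is not shown; its output is assumed to be given as `candidate_terms`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand(docs, doc_id, candidate_terms):
    """Fill missing weights of candidate_terms in docs[doc_id].

    A term absent from a document is treated as missing, not zero. Each
    missing weight is copied from the most similar other document that
    has a weight for that term; if no such document exists, it stays missing.
    """
    target = docs[doc_id]
    expanded = dict(target)
    for term in candidate_terms:
        if term in target:
            continue  # weight already present, nothing to fill
        donors = [
            (cosine(target, other), name)
            for name, other in docs.items()
            if name != doc_id and term in other
        ]
        if donors:
            _, best = max(donors)
            expanded[term] = docs[best][term]
    return expanded


# Hypothetical toy data: term weights per document.
docs = {
    "d1": {"oil": 0.8, "price": 0.5},
    "d2": {"oil": 0.7, "price": 0.4, "barrel": 0.6},
    "d3": {"wheat": 0.9, "barrel": 0.1},
}
expanded_d1 = expand(docs, "d1", ["barrel", "crude"])
```

Here `d1` receives the weight of "barrel" from `d2`, its most similar document, while "crude" stays missing because no document supplies a weight for it.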
Original language: English
Pages (from-to): 699-707
Number of pages: 9
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume: 3066
Publication status: Published - 9 Dec 2004
Event: 4th International Conference, RSCTC 2004 - Uppsala, Sweden
Duration: 1 Jun 2004 to 5 Jun 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
