Scalable, statistical storage allocation for extensible inverted file construction

Research output: Journal article publicationJournal articleAcademic researchpeer-review

5 Citations (Scopus)

Abstract

An Inverted file is a commonly used index for both archival databases and free text where no updates are expected. Applications like information filtering and dynamic environments like the Internet require inverted files to be updated efficiently. Recently, extensible inverted files are proposed which can be used for fast online indexing. The effective storage allocation scheme for such inverted files uses the arrival rate to preallocate storage. In this article, this storage allocation scheme is improved by using information about both the arrival rates and their variability to predict the storage needed, as well as scaling the storage allocation by a logarithmic factor. The resultant, final storage utilization rate can be as high as 97-98% after indexing about 1.6 million documents. This compares favorably with the storage utilization rate of the original arrival rate storage allocation scheme. Our evaluation shows that the retrieval time for extensible inverted file on solid state disk is on average similar to the retrieval time for in-memory extensible inverted file. When file seek time is not an issue, our scalable storage allocation enables extensible inverted files to be used as the main index on disk. Our statistical storage allocation may be applicable to novel situations where the arrival of items follows a binomial, Poisson or normal distribution.
Original languageEnglish
Pages (from-to)1082-1088
Number of pages7
JournalJournal of Systems and Software
Volume84
Issue number7
DOIs
Publication statusPublished - 1 Jul 2011

Keywords

  • Inverted file
  • Optimization
  • Storage allocation

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this