Fast relative Lempel-Ziv self-index for similar sequences

Huy Hoang Do, Jesper Andreas Jansson, Kunihiko Sadakane, Wing Kin Sung

Research output: Journal article publicationJournal articleAcademic researchpeer-review

31 Citations (Scopus)

Abstract

Recent advances in biotechnology and web technology are continuously generating huge collections of similar strings. People now face the problem of storing them compactly while supporting fast pattern searching. One compression scheme called relative Lempel-Ziv compression uses textual substitutions from a reference text to represent each string in S as a concatenation of substrings from a reference string R. This basic scheme gives a good compression ratio when every string in S is similar to R, but does not provide any pattern searching functionality. Here, we describe a new data structure based on relative Lempel-Ziv compression that is space-efficient and also supports fast pattern searching.
Original languageEnglish
Pages (from-to)14-30
Number of pages17
JournalTheoretical Computer Science
Volume532
DOIs
Publication statusPublished - 1 Jan 2014
Externally publishedYes

Keywords

  • Exact pattern searching
  • FM-index
  • Rank and select
  • String decomposition
  • Suffix range
  • Textual substitution

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this