Compressing population DNA sequences using multiple reference sequences

Kin On Cheng, Ngai Fong Law, Wan Chi Siu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

1 Citation (Scopus)

Abstract

Compressing population DNA sequences often relies on the use of a reference sequence so that only the differences between the target DNA sequences to be compressed and the reference sequence are encoded. Despite the importance of the choice of the reference sequence, state-of-the-art algorithms in population sequence compression often selected one of the population sequences as a reference sequence in an ad hoc manner. In this paper, we investigated issues about the choice of the reference sequence. In particular, population sequences are first clustered into a number of groups. A reference sequence is then obtained for each group so that substructures within each group can be characterized by this reference sequence. Afterwards, the reference sequence is used to compress sequences within that group. In this way, the multiple reference sequences framework can optimize the overall compression performance on the set of population sequences. Results show that our proposed method reduces the compressed size by up to 91% as compared to state-of-the-art reference-based approaches.
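
The three steps named in the abstract (cluster, derive one reference per group, encode differences against that reference) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the clustering method, how each reference is obtained, or the difference coder. The sketch assumes equal-length, pre-aligned sequences, Hamming distance, a k-means-style loop with farthest-first seeding, a column-wise majority consensus as the group reference, and a plain list of (position, base) substitutions as the encoding. All function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Column-wise majority base; an assumed stand-in for a group reference."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def assign(seqs, refs):
    """Index of the nearest reference (by Hamming distance) for each sequence."""
    return [min(range(len(refs)), key=lambda i: hamming(s, refs[i])) for s in seqs]


def build_references(seqs, k, rounds=10):
    """Toy k-means-style clustering that returns one reference per group."""
    # Farthest-first seeding: start from the first sequence, then repeatedly
    # add the sequence that is farthest from every reference chosen so far.
    refs = [seqs[0]]
    while len(refs) < k:
        refs.append(max(seqs, key=lambda s: min(hamming(s, r) for r in refs)))
    for _ in range(rounds):
        labels = assign(seqs, refs)
        new_refs = []
        for i, old in enumerate(refs):
            members = [s for s, lab in zip(seqs, labels) if lab == i]
            new_refs.append(consensus(members) if members else old)
        if new_refs == refs:  # references stopped changing
            break
        refs = new_refs
    return refs


def encode(seq, ref):
    """Substitutions needed to turn ref into seq, as (position, base) pairs."""
    return [(i, b) for i, (b, r) in enumerate(zip(seq, ref)) if b != r]


def decode(diffs, ref):
    """Inverse of encode: apply the substitutions to a copy of ref."""
    out = list(ref)
    for i, b in diffs:
        out[i] = b
    return "".join(out)


# Made-up data: two visibly different sub-populations of three sequences each.
seqs = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTGTCCGT", "TTGTCCGA", "TTGTCCGT"]
refs = build_references(seqs, k=2)
labels = assign(seqs, refs)
encoded = [(lab, encode(s, refs[lab])) for s, lab in zip(seqs, labels)]

multi_ref_cost = sum(len(d) for _, d in encoded)
single_ref_cost = sum(len(encode(s, seqs[0])) for s in seqs)
```

On this made-up input, comparing `multi_ref_cost` with `single_ref_cost` shows why per-group references can help: sequences in the second group no longer pay for their shared differences from the first sequence. A real coder would also have to store the references themselves and handle insertions and deletions, which this sketch ignores.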

Original language: English
Title of host publication: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 760-764
Number of pages: 5
ISBN (Electronic): 9781538615423
Publication status: Published - 5 Feb 2018
Event: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 - Kuala Lumpur, Malaysia
Duration: 12 Dec 2017 to 15 Dec 2017

Publication series

Name: Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Volume: 2018-February

Conference

Conference: 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 12/12/17 to 15/12/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Information Systems
  • Signal Processing
