Accurate taxonomic assignment of short pyrosequencing reads

José C. Clemente, Jesper Andreas Jansson, Gabriel Valiente

Research output: Chapter in book / Conference proceeding, Conference article published in proceeding or book, Academic research, peer-review

15 Citations (Scopus)

Abstract

Ambiguities in the taxonomy-dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.
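The node selection described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the "combined value of precision and recall" is taken to be the F-measure (their harmonic mean), the taxonomy is given as a plain child-list dictionary whose leaves are the reference sequences, and the set of sequences matching the read has already been found (the suffix-array search is not shown). The function name `best_node` and the toy taxonomy are hypothetical.

```python
def best_node(children, lca, matching):
    """Pick the node under `lca` with the highest F-measure.

    children: dict mapping each internal node to a list of its children
              (leaves are reference sequences and have no entry).
    lca:      lowest common ancestor of all matching sequences.
    matching: set of leaf names that match the read.

    Assumption: precision and recall are combined as the F-measure.
    One post-order pass, so time is linear in the subtree size.
    """
    total_matching = len(matching)
    leaves, hits = {}, {}          # per-node leaf count and matching-leaf count
    best = (None, -1.0)
    stack = [(lca, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not expanded:
            # Revisit this node after all of its children are processed.
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
            continue
        if kids:
            leaves[node] = sum(leaves[k] for k in kids)
            hits[node] = sum(hits[k] for k in kids)
        else:
            leaves[node] = 1
            hits[node] = 1 if node in matching else 0
        if hits[node] > 0:
            precision = hits[node] / leaves[node]   # matching share of the clade
            recall = hits[node] / total_matching    # share of matches covered
            f_measure = 2 * precision * recall / (precision + recall)
            if f_measure > best[1]:
                best = (node, f_measure)
    return best


# Toy taxonomy: three of the four matching sequences fall in clade A.
taxonomy = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3", "b4", "b5", "b6"],
}
node, score = best_node(taxonomy, "root", {"a1", "a2", "a3", "b1"})
print(node, round(score, 3))
```

In this toy case the lowest common ancestor is `root`, which also contains five non-matching sequences, whereas clade `A` covers three of the four matches with no non-matching sequences and therefore scores higher under the assumed F-measure.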
Original language: English
Title of host publication: Pacific Symposium on Biocomputing 2010, PSB 2010
Pages: 3-9
Number of pages: 7
Publication status: Published - 1 Dec 2010
Externally published: Yes
Event: 15th Pacific Symposium on Biocomputing, PSB 2010 - Kamuela, HI, United States
Duration: 4 Jan 2010 to 8 Jan 2010

Conference

Conference: 15th Pacific Symposium on Biocomputing, PSB 2010
Country/Territory: United States
City: Kamuela, HI
Period: 4/01/10 to 8/01/10

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
  • Medicine(all)
