MetaFast: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation
Abstract
Metagenomics, the study of the genome sequences of diverse organisms cohabiting in a shared environment, has advanced significantly across many medical and biological fields. Metagenomic analysis is crucial, for instance, in clinical applications such as infectious disease screening and the diagnosis and early detection of diseases such as cancer. A key task in metagenomics is to determine which species are present in a sample and their relative abundances. Currently, the field is dominated by two classes of tools: alignment-based tools, which offer high accuracy but are computationally expensive, and alignment-free tools, which are fast but lack the accuracy needed for many applications. In response to this dichotomy, we introduce MetaFast, a heuristics-based tool that achieves a fundamental improvement in the accuracy-runtime tradeoff over existing methods. MetaFast delivers accuracy comparable to that of Metalign, a highly accurate alignment-based tool, but with significantly better efficiency. In MetaFast, we accelerate memory-frugal reference database indexing and filtering. We further employ heuristics to accelerate read mapping. Our evaluation demonstrates that MetaFast achieves a 4x speedup over Metalign without compromising accuracy. MetaFast is publicly available at https://github.com/CMU-SAFARI/MetaFast.
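To make the classification task concrete, the following is a minimal sketch of seed counting used to assign reads to reference genomes and derive relative abundances. It is an illustration only and is not MetaFast's actual implementation: the function names (`kmers`, `build_index`, `count_seed_hits`, `relative_abundance`), the seed length `k`, the `min_hits` threshold, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k (the 'seeds')."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each seed to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for seed in kmers(seq, k):
            index.setdefault(seed, set()).add(name)
    return index


def count_seed_hits(read, index, k):
    """Count, per reference, how many of the read's seeds occur in it."""
    hits = Counter()
    for seed in kmers(read, k):
        for name in index.get(seed, ()):
            hits[name] += 1
    return hits


def relative_abundance(reads, index, k, min_hits):
    """Assign each read to its best-hit reference and normalise the counts."""
    assigned = Counter()
    for read in reads:
        hits = count_seed_hits(read, index, k)
        if not hits:
            continue  # no seed of this read occurs in any reference
        name, n = hits.most_common(1)[0]
        if n >= min_hits:  # hypothetical filter threshold
            assigned[name] += 1
    total = sum(assigned.values())
    return {name: c / total for name, c in assigned.items()} if total else {}


# Toy data, invented for this example.
references = {"sp_a": "ACGTACGTGGCA", "sp_b": "TTTTGGGGCCCC"}
reads = ["ACGTACGT", "GGGGCCCC", "TACGTGGC"]
index = build_index(references, k=4)
abundance = relative_abundance(reads, index, k=4, min_hits=3)
```

In this sketch a read is assigned purely by seed hits. A practical tool would typically verify the surviving candidates with an alignment or edit distance step, which is the more expensive stage that a seed-count filter is meant to reduce.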
Permanent link
https://doi.org/10.3929/ethz-b-000642547
Publication status
published
Journal / series
arXiv
Publisher
Cornell University
Edition / version
v1
Subject
Genomics (q-bio.GN); Hardware Architecture (cs.AR); Quantitative Methods (q-bio.QM)
Organisational unit
09483 - Mutlu, Onur / Mutlu, Onur
Related publications and datasets
Is supplemented by: http://gigadb.org/dataset/100344
Is supplemented by: https://github.com/CMU-SAFARI/MetaFast
ETH Bibliography
yes