dc.contributor.author
Robeson II, Michael S.
dc.contributor.author
O'Rourke, Devon R.
dc.contributor.author
Kaehler, Benjamin D.
dc.contributor.author
Ziemski, Michal
dc.contributor.author
Dillon, Matthew R.
dc.contributor.author
Foster, Jeffrey T.
dc.contributor.author
Bokulich, Nicholas
dc.date.accessioned
2021-12-21T09:55:01Z
dc.date.available
2021-12-06T17:30:21Z
dc.date.available
2021-12-21T09:55:01Z
dc.date.issued
2021
dc.identifier.issn
1553-734X
dc.identifier.issn
1553-7358
dc.identifier.other
10.1371/journal.pcbi.1009581
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/518920
dc.identifier.doi
10.3929/ethz-b-000518920
dc.description.abstract
Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
PLOS
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.title
RESCRIPt: Reproducible sequence taxonomy reference database management
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 4.0 International
dc.date.published
2021-11-08
ethz.journal.title
PLoS Computational Biology
ethz.journal.volume
17
en_US
ethz.journal.issue
11
en_US
ethz.journal.abbreviated
PLOS comput. biol.
ethz.pages.start
e1009581
en_US
ethz.size
37 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
San Francisco, CA
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02070 - Dep. Gesundheitswiss. und Technologie / Dep. of Health Sciences and Technology::02701 - Inst.f. Lebensmittelwiss.,Ernährung,Ges. / Institute of Food, Nutrition, and Health::09714 - Bokulich, Nicholas / Bokulich, Nicholas
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02070 - Dep. Gesundheitswiss. und Technologie / Dep. of Health Sciences and Technology::02701 - Inst.f. Lebensmittelwiss.,Ernährung,Ges. / Institute of Food, Nutrition, and Health::09714 - Bokulich, Nicholas / Bokulich, Nicholas
ethz.date.deposited
2021-12-06T17:30:38Z
ethz.source
SCOPUS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-12-21T09:55:09Z
ethz.rosetta.lastUpdated
2024-02-02T15:38:25Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=RESCRIPt:%20Reproducible%20sequence%20taxonomy%20reference%20database%20management&rft.jtitle=PLoS%20Computational%20Biology&rft.date=2021&rft.volume=17&rft.issue=11&rft.spage=e1009581&rft.issn=1553-734X&1553-7358&rft.au=Robeson%20II,%20Michael%20S.&O'Rourke,%20Devon%20R.&Kaehler,%20Benjamin%20D.&Ziemski,%20Michal&Dillon,%20Matthew%20R.&rft.genre=article&rft_id=info:doi/10.1371/journal.pcbi.1009581&