Show simple item record

dc.contributor.author
Karasikov, Mikhail
dc.contributor.author
Mustafa, Harun
dc.contributor.author
Joudaki, Amir
dc.contributor.author
Javadzadeh No, Sara
dc.contributor.author
Rätsch, Gunnar
dc.contributor.author
Kahles, André
dc.date.accessioned
2020-01-24T08:37:31Z
dc.date.available
2019-01-15T08:09:09Z
dc.date.available
2020-01-24T08:37:31Z
dc.date.issued
2018
dc.identifier.other
10.1101/468512
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/315747
dc.identifier.doi
10.3929/ethz-b-000314581
dc.description.abstract
High-throughput DNA sequencing data is accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow efficient sequence queries. In particular, the concept of colored de Bruijn graphs has been explored by several groups. While there has been good progress towards representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently unclear how characteristics of the input data, such as the sparsity and correlations of labels, can inform the choice of method for compressing the labels. In this work, we present a systematic analysis of five different state-of-the-art annotation compression schemes that evaluates key metrics on both artificial and real-world data and discusses how different data characteristics influence compression performance. In addition, we present a new approach, Multi-BRWT, that improves compression performance by up to 50% over the current state of the art and adapts to different kinds of input data. Using our comprehensive test datasets, we show that this improvement can be robustly reproduced on different representative real-world datasets.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Cold Spring Harbor Laboratory
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc/4.0/
dc.subject
Sparse binary matrices
en_US
dc.subject
Binary relations
en_US
dc.subject
Genome graph annotation
en_US
dc.title
Sparse Binary Relation Representations for Genome Graph Annotation
en_US
dc.type
Working Paper
dc.rights.license
Creative Commons Attribution-NonCommercial 4.0 International
dc.date.published
2018-11-12
ethz.journal.title
bioRxiv
ethz.size
10 p.
en_US
ethz.grant
Scalable Genome Graph Data Structures for Metagenomics and Genome Annotation
en_US
ethz.publication.place
Cold Spring Harbor, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09568 - Rätsch, Gunnar / Rätsch, Gunnar
ethz.grant.agreementno
167331
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
NFP 75: Gesuch
ethz.relation.isPreviousVersionOf
10.3929/ethz-b-000393658
ethz.date.deposited
2019-01-09T15:02:53Z
ethz.source
BATCH
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2019-01-15T08:09:28Z
ethz.rosetta.lastUpdated
2023-02-06T18:13:21Z
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/314780
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/314581
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Sparse%20Binary%20Relation%20Representations%20for%20Genome%20Graph%20Annotation&rft.jtitle=bioRxiv&rft.date=2018&rft.au=Karasikov,%20Mikhail&Mustafa,%20Harun&Joudaki,%20Amir&Javadzadeh%20No,%20Sara&R%C3%A4tsch,%20Gunnar&rft.genre=preprint&rft_id=info:doi/10.1101/468512&