Matching single cells across modalities with contrastive learning and optimal transport
Abstract
Understanding the interactions between the biomolecules that govern cellular behaviors remains an emergent question in biology. Recent advances in single-cell technologies have enabled the simultaneous quantification of multiple biomolecules in the same cell, opening new avenues for understanding cellular complexity and heterogeneity. Still, the resulting multimodal single-cell datasets present unique challenges arising from the high dimensionality and multiple sources of acquisition noise. Computational methods able to match cells across different modalities offer an appealing alternative towards this goal. In this work, we propose MatchCLOT, a novel method for modality matching inspired by recent promising developments in contrastive learning and optimal transport. MatchCLOT uses contrastive learning to learn a common representation between two modalities and applies entropic optimal transport as an approximate maximum weight bipartite matching algorithm. Our model obtains state-of-the-art performance on two curated benchmarking datasets and an independent test dataset, improving the top scoring method by 26.1% while preserving the underlying biological structure of the multimodal data. Importantly, MatchCLOT offers high gains in computational time and memory that, in contrast to existing methods, allows it to scale well with the number of cells. As single-cell datasets become increasingly large, MatchCLOT offers an accurate and efficient solution to the problem of modality matching. Show more
Permanent link
https://doi.org/10.3929/ethz-b-000620420Publication status
publishedExternal links
Journal / series
Briefings in BioinformaticsVolume
Pages / Article No.
Publisher
Oxford University PressSubject
Single-cell data integration; Modality matching; Contrastive learning; Optimal transportMore
Show all metadata