Show simple item record

dc.contributor.author
Song, Mingyang
dc.contributor.author
Zhang, Yang
dc.contributor.author
Aydın, Tunç O.
dc.contributor.editor
Avidan, Shai
dc.contributor.editor
Brostow, Gabriel
dc.contributor.editor
Cissé, Moustapha
dc.contributor.editor
Farinella, Giovanni Maria
dc.contributor.editor
Hassner, Tal
dc.date.accessioned
2023-02-28T07:39:36Z
dc.date.available
2023-02-24T04:49:51Z
dc.date.available
2023-02-28T07:39:36Z
dc.date.issued
2022
dc.identifier.isbn
978-3-031-19800-7
en_US
dc.identifier.isbn
978-3-031-19799-4
en_US
dc.identifier.issn
0302-9743
dc.identifier.issn
1611-3349
dc.identifier.other
10.1007/978-3-031-19800-7_28
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/600355
dc.description.abstract
Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a machine-learning architecture that has shown promising performance on both high-level and low-level image tasks. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which is composed of Spatio-Temporal Transformer Blocks (STTB) and 3D convolutional layers. The proposed STTB learns temporal information between neighboring frames implicitly by using the proposed Joint Spatio-Temporal Mixer module for attention computation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical applications and distracting to viewers. We propose a sliding-block strategy with a recurrent architecture, and use a new loss term, Overlap Loss, to alleviate flickering between adjacent frames. Our method produces state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources to achieve denoising quality comparable to that of competing methods (Fig. 1).
en_US
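
The Overlap Loss mentioned in the abstract penalizes disagreement between the frames shared by temporally adjacent sliding blocks. A minimal sketch of such an overlap-consistency penalty is shown below; the function name, array shapes, and the L1 formulation are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def overlap_loss(prev_block, next_block, overlap):
    """Hypothetical sketch of an overlap-consistency penalty.

    prev_block, next_block: denoised frame stacks of shape (T, H, W),
    produced from two temporally adjacent sliding blocks that share
    `overlap` frames. The loss is the mean absolute difference over
    the shared frames, encouraging the two blocks to agree where they
    overlap and thereby reducing frame-to-frame flicker.
    """
    shared_prev = prev_block[-overlap:]   # last `overlap` frames of the earlier block
    shared_next = next_block[:overlap]    # first `overlap` frames of the later block
    return np.mean(np.abs(shared_prev - shared_next))
```

In training, a term like this would be added to a standard spatial reconstruction loss, so that temporal coherence is optimized without sacrificing per-frame quality.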
dc.language.iso
en
en_US
dc.publisher
Springer
en_US
dc.subject
Video denoising
en_US
dc.subject
Transformer
en_US
dc.subject
Temporal consistency
en_US
dc.title
TempFormer: Temporally Consistent Transformer for Video Denoising
en_US
dc.type
Conference Paper
dc.date.published
2022-11-09
ethz.book.title
Computer Vision – ECCV 2022
en_US
ethz.journal.title
Lecture Notes in Computer Science
ethz.journal.volume
13679
en_US
ethz.journal.abbreviated
LNCS
ethz.pages.start
481
en_US
ethz.pages.end
496
en_US
ethz.event
17th European Conference on Computer Vision (ECCV 2022)
en_US
ethz.event.location
Tel Aviv, Israel
en_US
ethz.event.date
October 23-27, 2022
en_US
ethz.identifier.wos
ethz.publication.place
Cham
en_US
ethz.publication.status
published
en_US
ethz.date.deposited
2023-02-24T04:49:58Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2023-02-28T07:39:37Z
ethz.rosetta.lastUpdated
2023-02-28T07:39:37Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=TempFormer:%20Temporally%20Consistent%20Transformer%20for%20Video%20Denoising&rft.jtitle=Lecture%20Notes%20in%20Computer%20Science&rft.date=2022&rft.volume=13679&rft.spage=481&rft.epage=496&rft.issn=0302-9743&1611-3349&rft.au=Song,%20Mingyang&Zhang,%20Yang&Ayd%C4%B1n,%20Tun%C3%A7%20O.&rft.isbn=978-3-031-19800-7&978-3-031-19799-4&rft.genre=proceeding&rft_id=info:doi/10.1007/978-3-031-19800-7_28&rft.btitle=Computer%20Vision%20%E2%80%93%20ECCV%202022

Files in this item


There are no files associated with this item.
