Show simple item record

dc.contributor.author
Song, Mingyang
dc.contributor.author
Zhang, Yang
dc.contributor.author
Aydın, Tunç O.
dc.contributor.editor
Avidan, Shai
dc.contributor.editor
Brostow, Gabriel
dc.contributor.editor
Cissé, Moustapha
dc.contributor.editor
Farinella, Giovanni Maria
dc.contributor.editor
Hassner, Tal
dc.date.accessioned
2023-02-28T07:39:36Z
dc.date.available
2023-02-24T04:49:51Z
dc.date.available
2023-02-28T07:39:36Z
dc.date.issued
2022
dc.identifier.isbn
978-3-031-19800-7
en_US
dc.identifier.isbn
978-3-031-19799-4
en_US
dc.identifier.issn
0302-9743
dc.identifier.issn
1611-3349
dc.identifier.other
10.1007/978-3-031-19800-7_28
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/600355
dc.description.abstract
Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a machine-learning architecture that has shown promising performance on both high-level and low-level image tasks. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which is composed of Spatio-Temporal Transformer Blocks (STTB) and 3D convolutional layers. The proposed STTB learns temporal information between neighboring frames implicitly by using the proposed Joint Spatio-Temporal Mixer module for attention computation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical applications and distracting to viewers. We propose a sliding-block strategy with a recurrent architecture, and use a new loss term, Overlap Loss, to alleviate flickering between adjacent frames. Our method produces state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources to achieve denoising quality comparable to that of competing methods (Fig. 1).
en_US
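
The Overlap Loss mentioned in the abstract penalizes disagreement between the frames shared by temporally adjacent sliding blocks. A minimal sketch of such an overlap-consistency penalty is shown below; the function name, array shapes, and the L1 formulation are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def overlap_loss(prev_block, next_block, overlap):
    """Hypothetical sketch of an overlap-consistency penalty.

    prev_block, next_block: denoised frame stacks of shape (T, H, W),
    produced from two temporally adjacent sliding blocks that share
    `overlap` frames. The loss is the mean absolute difference over
    the shared frames, encouraging the two blocks to agree where they
    overlap and thereby reducing frame-to-frame flicker.
    """
    shared_prev = prev_block[-overlap:]   # last `overlap` frames of the earlier block
    shared_next = next_block[:overlap]    # first `overlap` frames of the later block
    return np.mean(np.abs(shared_prev - shared_next))
```

In training, a term like this would be added to a standard spatial reconstruction loss, so that temporal coherence is optimized without sacrificing per-frame quality.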
dc.language.iso
en
en_US
dc.publisher
Springer
en_US
dc.subject
Video denoising
en_US
dc.subject
Transformer
en_US
dc.subject
Temporal consistency
en_US
dc.title
TempFormer: Temporally Consistent Transformer for Video Denoising
en_US
dc.type
Conference Paper
dc.date.published
2022-11-09
ethz.book.title
Computer Vision – ECCV 2022
en_US
ethz.journal.title
Lecture Notes in Computer Science
ethz.journal.volume
13679
en_US
ethz.journal.abbreviated
LNCS
ethz.pages.start
481
en_US
ethz.pages.end
496
en_US
ethz.event
17th European Conference on Computer Vision (ECCV 2022)
en_US
ethz.event.location
Tel Aviv, Israel
en_US
ethz.event.date
October 23-27, 2022
en_US
ethz.identifier.wos
ethz.publication.place
Cham
en_US
ethz.publication.status
published
en_US
ethz.date.deposited
2023-02-24T04:49:58Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2023-02-28T07:39:37Z
ethz.rosetta.lastUpdated
2023-02-28T07:39:37Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=TempFormer:%20Temporally%20Consistent%20Transformer%20for%20Video%20Denoising&rft.jtitle=Lecture%20Notes%20in%20Computer%20Science&rft.date=2022&rft.volume=13679&rft.spage=481&rft.epage=496&rft.issn=0302-9743&1611-3349&rft.au=Song,%20Mingyang&Zhang,%20Yang&Ayd%C4%B1n,%20Tun%C3%A7%20O.&rft.isbn=978-3-031-19800-7&978-3-031-19799-4&rft.genre=proceeding&rft_id=info:doi/10.1007/978-3-031-19800-7_28&rft.btitle=Computer%20Vision%20%E2%80%93%20ECCV%202022

Files in this item


There are no files associated with this item.
