Open access
Author
Date
2023
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
The rise of mobile cameras has led to fundamental changes in photography, owing to their low cost, small size, and ease of use. However, the hardware constraints imposed on mobile cameras significantly limit the quality of their photos. Consequently, modern cameras rely on software technologies in order to improve the image quality. A promising direction is to combine information from multiple images to generate a higher quality image. We tackle this multi-frame image restoration problem in this thesis.
First, we introduce a novel architecture for the RAW burst super-resolution task. Our network takes multiple noisy RAW images as input, and generates a denoised, demosaicked, and super-resolved RGB image as output. In order to enable training and evaluation on real-world data, we additionally collect the first burst super-resolution dataset, consisting of smartphone bursts and corresponding high-resolution DSLR reference images. We demonstrate promising super-resolution performance on real-world bursts, despite the presence of spatial and color misalignments in our training pairs.
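To make the burst-to-RGB mapping above concrete, the following is a toy PyTorch-style sketch: a shared per-frame encoder, a naive averaging fusion standing in for the alignment and merging steps, and a decoder with pixel-shuffle upsampling. All module choices, feature sizes, and names here are illustrative assumptions, not the architecture proposed in the thesis.

```python
# Toy sketch of a burst super-resolution forward pass (illustrative only).
import torch
import torch.nn as nn

class ToyBurstSR(nn.Module):
    def __init__(self, num_feat=64, scale=4):
        super().__init__()
        # Per-frame encoder for 4-channel packed RAW (e.g. RGGB) input.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, num_feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(num_feat, num_feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder maps fused features to a super-resolved RGB image.
        self.decoder = nn.Sequential(
            nn.Conv2d(num_feat, num_feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(num_feat, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, burst):
        # burst: (B, N, 4, H, W) packed noisy RAW frames.
        b, n, c, h, w = burst.shape
        feats = self.encoder(burst.view(b * n, c, h, w)).view(b, n, -1, h, w)
        # Naive fusion: average frame features (stands in for alignment + merging).
        fused = feats.mean(dim=1)
        return self.decoder(fused)  # (B, 3, scale*H, scale*W) RGB output

burst = torch.randn(1, 8, 4, 48, 48)  # 8-frame noisy RAW burst
sr = ToyBurstSR()(burst)              # -> (1, 3, 192, 192)
```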
Next, we propose a deep reparametrization of the maximum a posteriori (MAP) formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction.
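Schematically, and with notation that is an assumption rather than the thesis's own, the classical multi-frame MAP objective and a deep reparametrization of it can be contrasted as follows, where the x_i are the observed frames and A_i the per-frame image formation operator (warp, blur, downsample, mosaick):

```latex
% Classical MAP objective over the target image y, with image prior R.
\hat{y} = \arg\min_{y} \sum_{i} \left\lVert x_i - A_i y \right\rVert^2 + R(y)

% Deep reparametrization (schematic): the residual is measured by a learned
% error metric \rho_\theta on encoded observations E_\theta(x_i), the target is
% represented by a latent z decoded by D_\theta, and the prior R_\theta is learned.
\hat{z} = \arg\min_{z} \sum_{i} \rho_\theta\!\left( E_\theta(x_i),\, A_i z \right) + R_\theta(z),
\qquad \hat{y} = D_\theta(\hat{z})
```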
Thirdly, we introduce a self-supervised training strategy for RAW burst super-resolution. Our approach utilizes only noisy low-resolution bursts for training, thereby eliminating the need to use sophisticated methods for collecting paired training data, or to manually tune synthetic pipelines. This is achieved by developing a novel self-supervised objective which can exploit the aliased high-frequency information present within a burst for training supervision.
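One way such a self-supervised objective can be set up is sketched below: the image predicted from a subset of frames is re-degraded to match a held-out noisy frame of the same burst, which then provides the training signal. The held-out-frame scheme, the `degrade` operator, and all names are placeholders, not the objective derived in the thesis.

```python
# Hedged sketch of a self-supervised loss for burst SR (illustrative only).
import torch.nn.functional as F

def self_supervised_loss(model, burst, degrade, held_out_idx=0):
    # burst: (B, N, C, H, W) noisy low-resolution frames of one burst.
    n = burst.shape[1]
    keep = [i for i in range(n) if i != held_out_idx]
    inputs = burst[:, keep]          # frames used for the prediction
    target = burst[:, held_out_idx]  # frame reserved for supervision

    sr = model(inputs)                              # predicted high-resolution image
    resynth = degrade(sr, frame_idx=held_out_idx)   # re-render the held-out LR frame
    return F.l1_loss(resynth, target)
```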
Finally, we introduce a method to generate per-pixel segmentation masks for an object in a burst or a video. Our approach is not limited to segmenting a set of known object classes. Instead, it can learn to segment novel objects in a few-shot manner, given a single segmentation mask or a bounding box defining the object. We believe that such segmentation masks can serve as useful cues to improve restoration performance, especially in the case of dynamic objects.
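As a rough illustration of propagating a single reference mask through a burst, the snippet below uses simple feature matching as a stand-in; it is not the few-shot segmentation approach developed in the thesis, and all names are hypothetical.

```python
# Toy mask propagation by feature matching (illustrative stand-in only).
import torch
import torch.nn.functional as F

def propagate_mask(feats, ref_mask, ref_idx=0, temperature=0.1):
    # feats: (N, C, H, W) per-frame features; ref_mask: (1, H, W) in {0, 1}.
    n, c, h, w = feats.shape
    ref = F.normalize(feats[ref_idx].view(c, -1), dim=0)         # (C, HW)
    labels = ref_mask.view(1, -1).float()                         # (1, HW)
    masks = []
    for i in range(n):
        cur = F.normalize(feats[i].view(c, -1), dim=0)            # (C, HW)
        attn = torch.softmax(cur.t() @ ref / temperature, dim=1)  # (HW, HW)
        masks.append((attn @ labels.t()).view(1, h, w))           # soft mask
    return torch.stack(masks)                                     # (N, 1, H, W)
```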
Permanent link
https://doi.org/10.3929/ethz-b-000627399
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Van Gool, Luc
Examiner: Matas, Jiri
Examiner: Favaro, Paolo
Examiner: Danelljan, Martin
Publisher
ETH Zurich
Organisational unit
03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)