Open access
Author
Date
2024
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
Super-resolution (SR) aims to restore high-resolution (HR) images or videos from their low-resolution (LR) counterparts. The rise of deep learning has significantly advanced SR, enabling impressive real-world applications through deep neural networks. Despite this progress, SR still faces four critical challenges: integrating cross-information, handling cross-scale image resolutions effectively, handling cross-degradations simultaneously, and extending to cross-dimensions. These challenges manifest in practical issues such as limited information for restoration, lack of generalization to out-of-scale images, difficulty in addressing multiple kinds of degradation, and limited real-world applicability in video SR. To address these challenges, we propose the following SR methods in this dissertation.
Firstly, for the cross-information challenge, we propose a deformable attention Transformer, namely DATSR, to exploit more information from reference (Ref) images. The method consists of a texture feature encoder (TFE) module, a reference-based deformable attention (RDA) module, and a residual feature aggregation (RFA) module. Specifically, TFE first extracts features from the LR and Ref images that are insensitive to image transformations (e.g., brightness, contrast, and hue); RDA then exploits multiple relevant textures to supply additional information to the LR features; finally, RFA aggregates the LR features and the relevant textures to produce a more visually pleasing result. Extensive experiments demonstrate that the additional information improves SR performance, and DATSR achieves state-of-the-art performance on benchmark datasets.
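The idea of transferring relevant reference textures and then aggregating them residually can be sketched in a few lines. This is a minimal illustration only, not the DATSR implementation: plain top-k cosine-similarity attention stands in for the deformable attention, the features are hand-written vectors rather than TFE outputs, and the names `cosine`, `aggregate_reference`, `k`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def aggregate_reference(lr_feats, ref_feats, k=2, temperature=0.1):
    """For each LR feature, blend in its k most similar Ref features.

    Assumption: ordinary top-k attention replaces deformable attention here.
    """
    out = []
    for q in lr_feats:
        scores = [cosine(q, r) for r in ref_feats]
        # Indices of the k most relevant reference features.
        top = sorted(range(len(ref_feats)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over the selected similarities (shifted for numerical stability).
        m = max(scores[i] for i in top)
        w = [math.exp((scores[i] - m) / temperature) for i in top]
        total = sum(w)
        texture = [
            sum(w[j] / total * ref_feats[i][d] for j, i in enumerate(top))
            for d in range(len(q))
        ]
        # Residual aggregation: keep the LR feature and add the transferred texture.
        out.append([a + b for a, b in zip(q, texture)])
    return out


lr = [[1.0, 0.0]]
ref = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
fused = aggregate_reference(lr, ref, k=1)
```

With `k=1`, only the best-aligned reference vector `[2.0, 0.0]` is selected, so the fused feature is the LR feature plus that texture.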
Secondly, to address the cross-scale challenge, we propose a continuous implicit attention-in-attention network for SR, called CiaoSR. Specifically, we explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed scale-aware attention in this network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate that CiaoSR achieves state-of-the-art performance on the arbitrary-scale SR task. More importantly, CiaoSR can be flexibly integrated into any backbone to improve cross-scale performance.
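The notion of attention-derived ensemble weights over nearby local features can be illustrated on a 1D feature sequence. The sketch below is a toy under stated assumptions, not the CiaoSR architecture: `toy_score` is a hypothetical hand-written stand-in for the learned attention score, only the two nearest neighbours are used, and the non-local branch is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    t = sum(e)
    return [v / t for v in e]


def toy_score(feature, offset, scale):
    """Hypothetical stand-in for a learned, scale-aware attention score."""
    return -abs(offset) * scale


def query(feats, x, scale, score_fn=toy_score):
    """Predict the feature at continuous coordinate x from its two neighbours."""
    i0 = min(max(int(math.floor(x)), 0), len(feats) - 1)
    i1 = min(i0 + 1, len(feats) - 1)
    neighbours = [i0, i1]
    # Ensemble weights come from attention scores, not from fixed interpolation.
    weights = softmax([score_fn(feats[i], x - i, scale) for i in neighbours])
    dim = len(feats[0])
    return [sum(w * feats[i][d] for w, i in zip(weights, neighbours)) for d in range(dim)]


def upsample(feats, scale, score_fn=toy_score):
    """Resample a 1D feature sequence by an arbitrary (possibly non-integer) scale."""
    n_out = max(1, round(len(feats) * scale))
    # Map each output sample centre back to LR coordinates.
    coords = [(j + 0.5) / scale - 0.5 for j in range(n_out)]
    return [query(feats, x, scale, score_fn) for x in coords]


feats = [[0.0], [1.0], [2.0], [3.0]]
out = upsample(feats, 2.5)
```

Because the query coordinate is continuous, the same function serves any scale factor; a trained scoring network would replace `toy_score`.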
Thirdly, to tackle the cross-degradation challenge, we propose a diffusion model-based image restoration (IR) method built on a deep equilibrium fixed point system, called DeqIR. Specifically, we first formulate several IR tasks as linear inverse problems. Existing diffusion methods solve these inverse problems using long sequential sampling chains, resulting in long sampling times and high computation costs. To address this, we derive an analytical solution by modeling the entire sampling chain as a joint multivariate fixed point system. Based on this analytical solution, we can conduct parallel sampling and restore high-quality images without training. Extensive experiments demonstrate that our method generalizes well to different degradations in typical IR tasks and real-world settings.
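To make the contrast between a sequential sampling chain and a joint fixed point system concrete, here is a toy sketch. It is not DeqIR itself: `step` is a hypothetical scalar update standing in for one deterministic sampling step, and the joint system is solved by simple repeated sweeps in which every state is recomputed from the previous iterate.

```python
def step(x, t):
    """Hypothetical stand-in for one deterministic sampling update at time t."""
    return 0.9 * x + 0.1 * t


def sequential_sample(x_T, T):
    """Run the chain one step at a time: x_T -> x_{T-1} -> ... -> x_0."""
    xs = [x_T]
    for t in range(T, 0, -1):
        xs.append(step(xs[-1], t))
    return xs  # xs[k] is x_{T-k}


def fixed_point_sample(x_T, T, iters):
    """Treat all states as one unknown vector and iterate X <- F(X).

    states[0] is the fixed input x_T; states[k] is the current estimate of x_{T-k}.
    Each sweep reads only the previous iterate, so all T updates inside a sweep
    are independent and could be evaluated in one batched call.
    """
    states = [x_T] + [0.0] * T
    for _ in range(iters):
        states = [x_T] + [step(states[k - 1], T - k + 1) for k in range(1, T + 1)]
    return states


seq = sequential_sample(1.0, 5)
par = fixed_point_sample(1.0, 5, iters=5)
```

For this simple chain the sweeps reproduce the sequential result after at most `T` iterations; the toy only shows that both formulations share the same solution, and says nothing about how quickly a real solver converges.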
Lastly, for the cross-dimension challenge, we extend the image SR method to a cross-dimension application, namely a practical space-time video SR task. We propose a new method that leverages both model-based and learning-based approaches. Specifically, we first formulate this task as a joint video deblurring, frame interpolation, and super-resolution problem, and solve it as two sub-problems in alternation. For the first sub-problem, we derive an interpretable analytical solution and formulate it as a Fourier data transform layer. For the second sub-problem, we propose a recurrent video enhancement layer to recover high-frequency details. Extensive experiments demonstrate that our method applies successfully to the practical space-time video SR task and achieves superior performance.
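As an illustration of how a quadratic data sub-problem admits a closed-form solution in the Fourier domain, consider a simplified 1D deblurring-only objective, ||k * x - y||^2 + mu ||x - z||^2 with circular convolution. This is an assumed, standard half-quadratic-splitting data step, not the thesis's actual formulation (which also covers frame interpolation and super-resolution); all function names and the parameter `mu` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def circular_blur(x, kernel):
    """Circular convolution of signal x with a short kernel."""
    n = len(x)
    return [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]


def data_step(y, kernel, z, mu):
    """Closed-form minimiser of ||k * x - y||^2 + mu * ||x - z||^2.

    Per frequency: X = (conj(K) * Y + mu * Z) / (|K|^2 + mu).
    """
    n = len(y)
    K = dft(list(kernel) + [0.0] * (n - len(kernel)))  # zero-padded kernel
    Y, Z = dft(y), dft(z)
    X = [(K[i].conjugate() * Y[i] + mu * Z[i]) / (abs(K[i]) ** 2 + mu)
         for i in range(n)]
    return [v.real for v in idft(X)]


x_true = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0, 1.0, 0.0]
kernel = [0.5, 0.3, 0.2]
y = circular_blur(x_true, kernel)
x_hat = data_step(y, kernel, z=x_true, mu=0.1)
```

When the prior estimate `z` already equals the sharp signal, the closed form returns that signal; in an alternating scheme, `z` would instead come from the second (enhancement) sub-problem.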
All in all, this dissertation contributes to image and video SR, achieving state-of-the-art performance on benchmark datasets. We believe that our proposed SR methods have broad applications, including entertainment (e.g., restoration of old films or photos), smartphones, digital cameras, medical imaging, video conferencing, and video games.
Permanent link
https://doi.org/10.3929/ethz-b-000705724
Publication status
published
Contributors
Examiner: Van Gool, Luc
Examiner: Yang, Ming-Hsuan
Examiner: Alahi, Alexandre
Examiner: Timofte, Radu
Publisher
ETH Zurich
Subject
Super-Resolution; Deep Learning; Image Restoration; Video restoration; Deep Neural Network; Low-level Vision
Organisational unit
03514 - Van Gool, Luc (emeritus)