Open access
Author
Date
2024
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Recent advances in generative modeling have transformed visual content creation, showing tremendous promise across applications in Computer Vision and Graphics. However, the adoption of generative models in everyday tasks is hindered by challenges in the controllability of the generation process, data requirements, and computational demands. This thesis addresses such real-world constraints in 2D and 3D generative models.
Firstly, we focus on improving the data efficiency of class-conditional Generative Adversarial Networks (GANs) using transfer learning. We introduce a new class-specific transfer learning method, called cGANTransfer, which explicitly propagates knowledge from old classes to new ones based on their relevance. Through extensive evaluation, we demonstrate the superiority of the proposed approach over previous methods for conditional GAN transfer.
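
The PyTorch sketch below illustrates the general idea of class-specific knowledge transfer as the abstract describes it: conditioning parameters for new classes are learned as relevance-weighted combinations of frozen old-class parameters. The module and variable names are illustrative assumptions, not the thesis implementation.

    # A minimal sketch of relevance-based class transfer (names are hypothetical).
    import torch
    import torch.nn as nn

    class TransferredClassEmbedding(nn.Module):
        def __init__(self, old_embedding: nn.Embedding, num_new_classes: int):
            super().__init__()
            # Freeze the embeddings learned on the old (source) classes.
            self.old_embedding = old_embedding
            for p in self.old_embedding.parameters():
                p.requires_grad_(False)
            num_old = old_embedding.num_embeddings
            # Learnable relevance logits: one per (new class, old class) pair.
            self.relevance = nn.Parameter(torch.zeros(num_new_classes, num_old))

        def forward(self, new_class_ids: torch.Tensor) -> torch.Tensor:
            # Softmax turns the logits into a relevance distribution over old classes.
            weights = self.relevance.softmax(dim=-1)        # (new, old)
            combined = weights @ self.old_embedding.weight  # (new, dim)
            return combined[new_class_ids]                  # (batch, dim)

    # Usage: the transferred embedding conditions the generator wherever the
    # original class embedding did (e.g. via conditional normalization layers).
    old_emb = nn.Embedding(100, 128)  # stands in for pretrained source embeddings
    new_emb = TransferredClassEmbedding(old_emb, num_new_classes=5)
    cond = new_emb(torch.tensor([0, 3, 4]))  # conditioning vectors, shape (3, 128)

In this sketch only the relevance logits are trained, so new classes borrow structure from old ones rather than learning conditioning parameters from scratch, which is what makes the transfer data-efficient.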
Secondly, we investigate the training of class-conditional GANs with small datasets. In particular, we identify conditioning collapse in GANs: a mode collapse caused by conditional GAN training on small data. We propose a training strategy based on transitional conditioning that effectively prevents the observed mode collapse by additionally leveraging unconditional learning. The proposed method results not only in stable training but also in high-quality images, thanks to the exploitation of information shared across classes in the early stages of training.
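
As a rough illustration, the sketch below fades class conditioning in over the early training iterations, so the GAN first trains effectively unconditionally and exploits information shared across classes. The linear schedule and the gating of the embedding are assumptions for illustration, not the thesis's exact mechanism.

    # A minimal sketch of transitional conditioning (schedule is an assumption).
    import torch
    import torch.nn as nn

    def conditioning_weight(step: int, start: int = 2000, end: int = 10000) -> float:
        """0.0 before `start`, then ramps linearly to 1.0 at `end`."""
        if step <= start:
            return 0.0
        return min(1.0, (step - start) / (end - start))

    class TransitionalConditioning(nn.Module):
        def __init__(self, num_classes: int, dim: int):
            super().__init__()
            self.embed = nn.Embedding(num_classes, dim)

        def forward(self, class_ids: torch.Tensor, step: int) -> torch.Tensor:
            lam = conditioning_weight(step)
            # lam = 0 -> unconditional (zero vector); lam = 1 -> fully conditional.
            return lam * self.embed(class_ids)

    cond = TransitionalConditioning(num_classes=10, dim=128)
    c_early = cond(torch.tensor([1, 2]), step=500)    # zeros: unconditional phase
    c_late = cond(torch.tensor([1, 2]), step=20000)   # full class embedding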
Thirdly, we tackle the computational efficiency of NeRF-GANs, a class of 3D-aware generative models that integrate Neural Radiance Fields (NeRFs) and GANs and are trained on single-view image datasets. Specifically, we revisit pose-conditioned 2D GANs for efficient 3D-aware generation at inference time by distilling 3D knowledge from pretrained NeRF-GANs. We propose a simple and effective method that reuses the well-disentangled latent space of a pretrained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
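
A minimal distillation sketch under stated assumptions: a pretrained NeRF-GAN teacher renders images from a shared latent code and a camera pose, and a pose-conditioned convolutional student learns to reproduce them in a single forward pass. The toy student architecture, the flattened pose parameterization, and the L1 loss are illustrative choices, not the thesis's exact setup.

    # A minimal sketch of distilling a NeRF-GAN into a pose-conditioned 2D generator.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PoseConditionedGenerator(nn.Module):
        """Toy stand-in for the pose-conditioned convolutional student."""
        def __init__(self, z_dim=64, pose_dim=12, out_hw=32):
            super().__init__()
            self.fc = nn.Linear(z_dim + pose_dim, 3 * out_hw * out_hw)
            self.out_hw = out_hw

        def forward(self, z, pose):
            x = self.fc(torch.cat([z, pose], dim=-1))
            return x.view(-1, 3, self.out_hw, self.out_hw)

    def distill_step(teacher_render, student, optimizer, batch_size=8, z_dim=64):
        z = torch.randn(batch_size, z_dim)    # latent code shared with the teacher
        pose = torch.randn(batch_size, 12)    # flattened camera pose (assumed format)
        with torch.no_grad():
            target = teacher_render(z, pose)  # slow volumetric rendering (teacher)
        pred = student(z, pose)               # fast single-pass CNN (student)
        loss = F.l1_loss(pred, target)        # pixel reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage with a dummy teacher standing in for a pretrained NeRF-GAN renderer:
    student = PoseConditionedGenerator()
    opt = torch.optim.Adam(student.parameters(), lr=2e-4)
    dummy_teacher = lambda z, pose: torch.rand(z.shape[0], 3, 32, 32)
    loss = distill_step(dummy_teacher, student, opt)

Because the student consumes the same latent code as the teacher, the teacher's disentangled latent space is reused directly and 3D consistency across poses is inherited rather than relearned.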
Lastly, we address the novel task of object generation in 3D scenes without the need for 3D supervision or 3D placement guidance from users. We introduce InseRF, a novel method for generative object insertion into NeRF reconstructions of 3D scenes. Based on a user-provided textual description and only a 2D bounding box in a reference viewpoint, InseRF enables controllable and 3D-consistent object insertion into 3D scenes without requiring explicit 3D information as input.
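
The abstract implies a pipeline that goes from a text prompt and a 2D box in one view to a 3D-consistent insertion. The outline below makes one plausible control flow explicit; the stage decomposition and all names are assumptions for illustration, not InseRF's actual API.

    # A high-level sketch of an InseRF-style insertion pipeline. Each stage is
    # passed in as a callable; stage names are hypothetical.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class InsertionRequest:
        text: str            # user-provided object description
        bbox_2d: tuple       # (x, y, w, h) in the reference viewpoint
        reference_view: Any  # rendered image of the scene at that viewpoint

    def insert_object(scene_nerf: Any,
                      request: InsertionRequest,
                      generate_object_view: Callable,  # text + bbox -> object image
                      lift_to_3d: Callable,            # single view -> object NeRF
                      estimate_placement: Callable,    # 2D box -> 3D pose/scale
                      fuse: Callable) -> Any:          # compose the two NeRFs
        # 1) Generate the object in 2D, inside the user's box in the reference view.
        obj_view = generate_object_view(request.reference_view,
                                        request.bbox_2d, request.text)
        # 2) Lift the generated view to a 3D (NeRF) representation of the object.
        obj_nerf = lift_to_3d(obj_view, request.text)
        # 3) Recover a 3D placement consistent with the 2D box, e.g. from
        #    estimated depth in the reference view.
        placement = estimate_placement(scene_nerf, request.bbox_2d,
                                       request.reference_view)
        # 4) Fuse the object NeRF into the scene NeRF at the estimated placement.
        return fuse(scene_nerf, obj_nerf, placement)

The point of the sketch is that every 3D quantity (the object's geometry and its placement) is derived from 2D inputs, which is why no explicit 3D information is required from the user.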
Permanent link
https://doi.org/10.3929/ethz-b-000691713
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Van Gool, Luc
Examiner: Zhu, Jun-Yan
Examiner: Aila, Timo
Examiner: Khoreva, Anna
Examiner: Paudel, Danda Pani
Publisher
ETH Zurich
Subject
Computer Vision; Generative AI
Organisational unit
03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)