MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises
Open access
Date
2023
Type
Conference Paper
ETH Bibliography
yes
Abstract
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and, consequently, generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves generative quality while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces and propose an objective that overcomes the hyperparameter dependencies that arise in existing approaches with the same latent-space structure. Compared to these approaches, we show increased robustness to changes in the design of the latent space, specifically the capacity allocated to the modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
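The abstract's central architectural idea, modelling shared and modality-specific information in separate latent subspaces combined under a mixture-of-experts treatment of the shared code, can be sketched briefly. Below is a minimal PyTorch sketch under stated assumptions: the class and function names, layer sizes, and the standard-normal draw for cross-modal private codes are illustrative choices, not the authors' released implementation, and the training objective the abstract refers to is omitted.

```python
import torch
import torch.nn as nn


class ModalityVAE(nn.Module):
    """One modality's encoder/decoder with two latent subspaces:
    a shared code z and a modality-specific (private) code w."""

    def __init__(self, x_dim: int, z_dim: int, w_dim: int, hidden: int = 256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        # Separate Gaussian heads for the shared and private subspaces.
        self.z_mu = nn.Linear(hidden, z_dim)
        self.z_logvar = nn.Linear(hidden, z_dim)
        self.w_mu = nn.Linear(hidden, w_dim)
        self.w_logvar = nn.Linear(hidden, w_dim)
        # The decoder conditions on the concatenation [z, w].
        self.dec = nn.Sequential(
            nn.Linear(z_dim + w_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim)
        )

    def encode(self, x):
        h = self.enc(x)
        return (self.z_mu(h), self.z_logvar(h)), (self.w_mu(h), self.w_logvar(h))

    def decode(self, z, w):
        return self.dec(torch.cat([z, w], dim=-1))


def reparam(mu, logvar):
    # Standard reparameterisation trick.
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()


def moe_cross_reconstruct(models, xs):
    """Mixture-of-experts pass over the shared subspace: the shared code z
    is sampled from each modality's posterior in turn and used to decode
    every modality. Private codes w come from the target modality's own
    posterior for self-reconstruction and from the prior N(0, I) otherwise
    (a simplification of the auxiliary prior described in the paper)."""
    posts = [m.encode(x) for m, x in zip(models, xs)]
    recons = []
    for i, ((z_mu, z_lv), _) in enumerate(posts):
        z = reparam(z_mu, z_lv)  # shared code from expert i
        row = []
        for j, m in enumerate(models):
            (_, _), (w_mu, w_lv) = posts[j]
            w = reparam(w_mu, w_lv) if i == j else torch.randn_like(w_mu)
            row.append(m.decode(z, w))
        recons.append(row)  # recons[i][j]: modality j decoded from expert i's z
    return recons


# Toy usage with two hypothetical modalities of different dimensionality.
models = [ModalityVAE(784, z_dim=16, w_dim=8), ModalityVAE(256, z_dim=16, w_dim=8)]
xs = [torch.randn(4, 784), torch.randn(4, 256)]
recons = moe_cross_reconstruct(models, xs)
```

On this reading of the abstract, the point the sketch makes concrete is that cross-modal reconstructions pair the shared code with a fresh sample for the private subspace rather than another modality's private code, so modality-specific capacity cannot leak into cross-modal generation, which is what allows coherence to coexist with sample diversity.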
Permanent link
https://doi.org/10.3929/ethz-b-000637761
Publication status
published
Book title
The Eleventh International Conference on Learning Representations
Publisher
OpenReview
Event
International Conference on Learning Representations (ICLR 2023)
Subject
Multimodal Variational Autoencoder; Variational autoencoder; Multimodal Generative Learning
Organisational unit
02219 - ETH AI Center / ETH AI Center
09670 - Vogt, Julia / Vogt, Julia