MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises
Open access
Date
2023
Type
- Conference Paper
ETH Bibliography
yes
Abstract
Multimodal VAEs have recently gained attention as efficient models for weakly-supervised generative learning with multiple modalities. However, all existing variants of multimodal VAEs are affected by a non-trivial trade-off between generative quality and generative coherence. In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality while maintaining high semantic coherence. We model shared and modality-specific information in separate latent subspaces, proposing an objective that overcomes certain dependencies on hyperparameters that arise for existing approaches with the same latent space structure. Compared to these existing approaches, we show increased robustness with respect to changes in the design of the latent space, in terms of the capacity allocated to modality-specific subspaces. We show that our model achieves both good generative coherence and high generative quality in challenging experiments, including more complex multimodal datasets than those used in previous works.
Persistent link
https://doi.org/10.3929/ethz-b-000637761
Publication status
published
Book title
The Eleventh International Conference on Learning Representations
Publisher
OpenReview
Subject
Multimodal Variational Autoencoder; Variational autoencoder; Multimodal Generative Learning
Organisational unit
02219 - ETH AI Center / ETH AI Center
09670 - Vogt, Julia / Vogt, Julia