Abstract
The emergence of neural networks has revolutionized the field of motion synthesis. Yet, learning to unconditionally synthesize motions from a given distribution remains challenging, especially when the motions are highly diverse. In this work, we present MoDi - a generative model trained in an unsupervised setting from an extremely diverse, unstructured and unlabeled dataset. During inference, MoDi can synthesize high-quality, diverse motions. Despite the lack of any structure in the dataset, our model yields a well-behaved and highly structured latent space, which can be semantically clustered, constituting a strong motion prior that facilitates various applications including semantic editing and crowd animation. In addition, we present an encoder that inverts real motions into MoDi's natural motion manifold, offering solutions to various ill-posed challenges such as completion from prefix and spatial editing. Our qualitative and quantitative experiments achieve state-of-the-art results that outperform recent techniques. Code and trained models are available at https://sigal-raab.github.io/MoDi.
Permanent link
https://doi.org/10.3929/ethz-b-000629238
Publication status
published
Book title
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher
IEEE
Subject
Humans: Face, body, pose, gesture, movement
Organisational unit
03911 - Sorkine Hornung, Olga / Sorkine Hornung, Olga
Related publications and datasets
Is new version of: https://doi.org/10.48550/arXiv.2206.08010
ETH Bibliography
yes