Open access
Author
Date
2021-03
Type
- Master Thesis
ETH Bibliography
yes
Altmetrics
Abstract
Estimating 3D hand pose from a monocular RGB image is a challenging task. This is largely due to the limited amount of available labeled data, as annotating images for 3D hand pose requires a complex multi-camera setup and a controlled lab-like setting. This in turn introduces a domain gap between the different hand pose datasets and the unconstrained settings of the real world. In this thesis, we develop a self-supervised method that uses unlabeled data from different hand pose datasets to improve the accuracy of 3D hand pose estimation and to bridge the domain gap. We propose a novel contrastive learning framework for pose estimation, inspired by the recent success of contrastive learning on image classification tasks. In a standard contrastive learning framework, a model learns a feature representation that is invariant under any image augmentation. This can be beneficial, as the pose is invariant to appearance-based image augmentations. However, geometric augmentations (such as rotation) change the pose equivariantly, so using them with standard contrastive self-supervision enforces invariance, which can be detrimental to pose estimation. We empirically show that the features learned with our equivariant contrastive framework lead to a larger improvement than those of standard contrastive frameworks. Furthermore, we attain an improvement of 7.6% in PA MKP-3D on FreiHAND with a standard ResNet-152 trained with additional unlabeled data, compared to a fully supervised baseline. This enables us to achieve state-of-the-art performance in a purely data-driven way, without any task-specific specialized architecture.
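The invariance-versus-equivariance distinction in the abstract can be illustrated with a toy sketch. This is not the thesis's actual model: the "feature" here is just the 2D keypoints themselves, and `rotate2d` stands in for a geometric augmentation, so the effect of each comparison strategy is visible directly. An invariance objective compares the two views as-is and finds them dissimilar, while an equivariance-style objective undoes the known rotation before comparing, recovering a perfect match.

```python
import numpy as np

def rotate2d(points, angle):
    """Toy geometric augmentation: rotate (N, 2) keypoints by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T

def cosine_sim(a, b):
    """Cosine similarity between two flattened feature arrays."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical hand "pose" (3 keypoints) and its rotated view.
pose = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
rotated = rotate2d(pose, np.pi / 4)

# Invariance objective: pull the two views together directly.
invariant_sim = cosine_sim(pose, rotated)  # < 1: the pose changed, so forcing
                                           # agreement discards pose information

# Equivariance-style objective: apply the inverse rotation before comparing.
equivariant_sim = cosine_sim(pose, rotate2d(rotated, -np.pi / 4))  # ~1.0

print(f"invariant: {invariant_sim:.4f}, equivariant: {equivariant_sim:.4f}")
```

In a real contrastive setup the comparison happens in a learned embedding space rather than on raw keypoints, but the logic is the same: enforcing agreement across geometric augmentations without accounting for the transformation pushes the features toward rotation invariance, which is exactly what a pose estimator must not lose.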
Persistent link
https://doi.org/10.3929/ethz-b-000484477
Publication status
published
Publisher
ETH Zurich
Subject
3D hand pose estimation; Self-supervision
Organisational unit
03979 - Hilliges, Otmar / Hilliges, Otmar