Open access
Author
Date
2024
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Visual localization and mapping are important problems in computer vision with widespread use in applications such as Augmented Reality (AR) and robotics. These problems have been extensively studied in the past decades, resulting in mature solutions based on correspondences across images, well-understood projective geometry, and 3D maps represented as sparse point clouds. Despite their complexity, such systems struggle with the challenges that arise from real-world data. Deep learning offers a promising avenue to address these limitations and reach higher accuracy and robustness.
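The classic paradigm above can be made concrete with a short sketch. Below is a minimal, hypothetical example of correspondence-based localization: given 2D-3D matches between a query image and a sparse point-cloud map, the camera pose is recovered with a Perspective-n-Point (PnP) solver inside RANSAC. The function name and its inputs are placeholders, not the thesis's code; only the OpenCV calls are real.

```python
# Hypothetical sketch of correspondence-based localization against a
# sparse 3D map, as in the classic paradigm described above.
import numpy as np
import cv2

def localize(points_3d, keypoints_2d, K):
    """Estimate a 6-DoF camera pose from matched 2D-3D correspondences.

    points_3d:    (N, 3) map points matched to the query image
    keypoints_2d: (N, 2) pixel locations of those matches in the query
    K:            (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=3.0,  # RANSAC inlier threshold in pixels
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```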
One strand of research replaces specific components of the existing algorithms with Deep Neural Networks (DNNs). While this has led to notable performance improvements, it has also increased system complexity. Moreover, these gains are often limited because the components are trained with proxy objectives that do not fully capture the ultimate goal of localization. Alternatively, some research has focused on developing simpler black-box DNNs trained end-to-end to replace these complex systems. Such models have the potential to learn stronger priors but have so far demonstrated limited generalization and interpretability. Balancing generalization with the benefits of end-to-end training calls for hybrid algorithms that effectively combine learning capacity with our existing knowledge of 3D geometry.
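To illustrate the proxy-objective gap mentioned above: a learned matcher is commonly supervised with a correspondence loss, while the quantity localization actually cares about is the pose error. The hedged sketch below contrasts the two; both functions and their inputs are generic stand-ins, not the thesis's formulation.

```python
# Hedged illustration: a proxy matching objective vs. the end-task pose
# metric. Minimizing the former does not guarantee minimizing the latter.
import torch
import torch.nn.functional as F

def matching_proxy_loss(desc_a, desc_b, temperature=0.1):
    """InfoNCE-style proxy: descriptor i of desc_a should match i of desc_b."""
    logits = (desc_a @ desc_b.T) / temperature  # (N, N) similarity matrix
    labels = torch.arange(len(desc_a))
    return F.cross_entropy(logits, labels)

def pose_error(R_est, t_est, R_gt, t_gt):
    """End-task metric: rotation error (degrees) and translation error."""
    cos = ((R_est.T @ R_gt).trace() - 1.0) / 2.0
    rot_err = torch.rad2deg(torch.arccos(cos.clamp(-1.0, 1.0)))
    return rot_err, torch.linalg.norm(t_est - t_gt)
```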
In the first part of this thesis, we apply this hybrid design philosophy to the prevalent paradigm based on 3D maps. We introduce two new algorithms for mapping and localization, both based on the alignment of learned features across different views. To facilitate progress in this research area, we also introduce a new benchmark tailored to AR applications. In the second part, we explore the use of more compact and interpretable 2D maps, like those used by humans. We demonstrate that end-to-end training makes it possible to learn effective associations between such maps and visual observations. We first develop a new algorithm for localizing images within a 2D semantic map. We then extend our approach to learn a new map representation optimized for visual localization, and we introduce an algorithm to construct these 2D maps from visual inputs. Overall, this thesis makes a significant step towards localization and mapping algorithms that integrate robust data-driven priors about the real world.
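As a rough illustration of "alignment of learned features across different views", the hypothetical sketch below refines a camera pose by minimizing differences between reference features and deep features sampled at projected map points, with gradients flowing through the projection. It is deliberately simplified (translation-only pose, first-order optimizer, and a placeholder sample_query_features callable); the algorithms in the thesis are more involved.

```python
# Hedged sketch of pose refinement by feature alignment. All names and the
# translation-only pose model are simplifying assumptions.
import torch

def project(points_3d, t, K):
    """Pinhole projection of world points under a translation-only pose."""
    pc = points_3d + t                      # world -> camera (R = I assumed)
    uv = (K @ pc.T).T                       # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]           # perspective division

def align(points_3d, feats_ref, sample_query_features, K, steps=200):
    """Refine a pose by minimizing feature residuals at projected points."""
    t = torch.zeros(3, requires_grad=True)  # pose parameters to optimize
    opt = torch.optim.Adam([t], lr=1e-2)
    for _ in range(steps):
        uv = project(points_3d, t, K)
        residuals = sample_query_features(uv) - feats_ref  # (N, D)
        loss = residuals.square().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return t.detach()
```

Similarly, one generic way to frame localizing an image within a 2D map is exhaustive correlation of bird's-eye-view (BEV) features inferred from the image against the map's features, over positions and rotations. Again a sketch under stated assumptions, not the thesis's exact formulation.

```python
# Hedged sketch: score every (yaw, y, x) pose by correlating a rotated BEV
# feature template against the 2D map features.
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def localize_in_map(map_feats, bev_feats, num_rotations=64):
    """map_feats: (D, H, W) map features; bev_feats: (D, h, w) BEV features.

    Returns the highest-scoring (rotation index, y, x) pose.
    """
    best, best_pose = -float("inf"), None
    for k in range(num_rotations):
        template = TF.rotate(bev_feats, 360.0 * k / num_rotations)
        score = F.conv2d(map_feats[None], template[None])[0, 0]  # (H', W')
        if score.max() > best:
            best = score.max().item()
            y, x = divmod(score.argmax().item(), score.shape[1])
            best_pose = (k, y, x)
    return best_pose
```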
Permanent link
https://doi.org/10.3929/ethz-b-000701148
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Pollefeys, Marc
Examiner: Cremers, Daniel
Examiner: Snavely, Noah
Examiner: Malisiewicz, Tomasz
Publisher
ETH Zurich
Subject
computer vision; machine learning; 3D geometry
Organisational unit
03766 - Pollefeys, Marc / Pollefeys, Marc