Collaborative Visual-Inertial State and Scene Estimation
dc.contributor.author
Karrer, Marco
dc.contributor.supervisor
Chli, Margarita
dc.contributor.supervisor
Grisetti, Giorgio
dc.contributor.supervisor
Pollefeys, Marc
dc.date.accessioned
2021-01-25T13:21:04Z
dc.date.available
2021-01-25T12:39:03Z
dc.date.available
2021-01-25T13:21:04Z
dc.date.issued
2020-09
dc.identifier.uri
http://hdl.handle.net/20.500.11850/465334
dc.identifier.doi
10.3929/ethz-b-000465334
dc.description.abstract
The capability of a robot to create a map of its workspace on the fly, while constantly updating it and continuously estimating its motion in it, constitutes one of the central research problems in mobile robotics and is referred to as Simultaneous Localization And Mapping (SLAM) in the literature. Relying solely on the sensor suite onboard the robot, SLAM is a core building block in enabling the navigational autonomy necessary to facilitate the general use of mobile robots, and has been the subject of booming research interest spanning over three decades. With the largest body of related literature addressing the challenge of single-agent SLAM, it is only very recently, with the relative maturity of this field, that approaches tackling collaborative SLAM with multiple agents have started appearing. The potential of collaborative multi-agent SLAM is great, promising not only to boost the efficiency of robotic missions by splitting the task at hand among multiple agents, but also to improve the overall robustness and accuracy by increasing the amount of data that each agent's estimation process has access to.
While SLAM can be performed using a variety of different sensors, this thesis focuses on the fusion of visual and inertial cues, one of the most common combinations of sensing modalities in robotics today. The information richness captured by cameras, along with the high-frequency, metric measurements provided by Inertial Measurement Units (IMUs) and the low weight and power consumption of a visual-inertial sensor suite, render this setup ideal for a wide variety of applications and robotic platforms, in particular resource-constrained platforms such as Unmanned Aerial Vehicles (UAVs). The majority of state-of-the-art visual-inertial estimators are designed as odometry algorithms, providing estimates that are consistent only within a limited time horizon. This lack of global consistency, however, poses a major hurdle to the effective fusion of data from multiple agents and to the practical definition of a common reference frame, which is imperative before any collaborative effort can be coordinated. In this spirit, this thesis first investigates the potential of global optimization based on a central access point (server), demonstrating global consistency using only monocular-inertial data. By fusing data from multiple agents, not only can consistency be maintained, but accuracy is also shown to improve at times, revealing the great potential of collaborative SLAM. Aiming to improve computational efficiency, a second approach employs a more efficient system architecture, allowing a more suitable distribution of the computational load between the agents and the server. Furthermore, this architecture implements two-way communication, enabling tighter collaboration between the agents: by communicating with the server, each agent can re-use information captured by the others to improve its onboard pose tracking online, during the mission.
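As a concrete, minimal illustration of this kind of server-side global optimization (a sketch only, not the thesis' actual implementation, which operates on visual-inertial keyframes in 3D), the following Python snippet fuses the trajectories of two agents via 2D pose-graph optimization, with per-agent odometry edges and a single cross-agent loop-closure edge; all values and the edge layout are illustrative assumptions.

    # Minimal 2D pose-graph sketch: two agents fused by one loop closure.
    import numpy as np

    def v2t(p):                      # pose vector (x, y, theta) -> 3x3 transform
        x, y, th = p
        c, s = np.cos(th), np.sin(th)
        return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

    def t2v(T):                      # 3x3 transform -> pose vector
        return np.array([T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])])

    def edge_error(pi, pj, zij):     # residual of one relative-pose edge
        e = t2v(np.linalg.inv(v2t(zij)) @ np.linalg.inv(v2t(pi)) @ v2t(pj))
        e[2] = np.arctan2(np.sin(e[2]), np.cos(e[2]))   # wrap angle residual
        return e

    def optimize(poses, edges, iters=10):
        n = len(poses)
        for _ in range(iters):       # Gauss-Newton with numeric Jacobians
            H, b = np.zeros((3 * n, 3 * n)), np.zeros(3 * n)
            for i, j, zij in edges:
                e = edge_error(poses[i], poses[j], zij)
                A, B, eps = np.zeros((3, 3)), np.zeros((3, 3)), 1e-6
                for k in range(3):
                    d = np.zeros(3); d[k] = eps
                    A[:, k] = (edge_error(poses[i] + d, poses[j], zij) - e) / eps
                    B[:, k] = (edge_error(poses[i], poses[j] + d, zij) - e) / eps
                si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
                H[si, si] += A.T @ A; H[si, sj] += A.T @ B
                H[sj, si] += B.T @ A; H[sj, sj] += B.T @ B
                b[si] += A.T @ e;     b[sj] += B.T @ e
            H[:3, :3] += 1e6 * np.eye(3)          # anchor first pose (gauge freedom)
            poses += np.linalg.solve(H, -b).reshape(n, 3)
        return poses

    # Agent A holds nodes 0-2, agent B nodes 3-5; initial guesses are noisy.
    poses = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [2.0, 0.1, 0.0],
                      [0.0, 1.2, 0.0], [1.0, 0.9, 0.0], [2.2, 1.0, 0.0]])
    edges = [(0, 1, np.array([1.0, 0.0, 0.0])),   # agent A odometry
             (1, 2, np.array([1.0, 0.0, 0.0])),
             (3, 4, np.array([1.0, 0.0, 0.0])),   # agent B odometry
             (4, 5, np.array([1.0, 0.0, 0.0])),
             (0, 3, np.array([0.0, 1.0, 0.0]))]   # cross-agent loop closure
    print(optimize(poses, edges))

The single cross-agent edge is what places both trajectories in one common reference frame, which is the role the server's place recognition and global optimization play in the described architecture.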
In addition to general collaborative SLAM without specific assumptions on the agents' relative pose configuration, we investigate the potential of a configuration of two agents, each carrying one camera, with overlapping fields of view, essentially forming a virtual stereo camera. As each robotic agent can move independently, the possibility of controlling the stereo baseline according to the scene depth is very promising, for example at high altitudes, where all scene points are far away and therefore provide only weak constraints on the metric scale in a standard single-agent system. To this end, an approach is proposed to estimate the time-varying stereo transformation formed between the two agents, fusing the egomotion estimates of the individual agents with the image measurements extracted from the view-overlap in a tightly coupled fashion. Taking this virtual stereo camera idea a step further, a novel collaboration framework is presented, utilizing the view-overlap along with relative distance measurements between the two agents (e.g. obtained via Ultra-Wide Band (UWB) modules) in order to successfully perform state estimation at high altitudes, where state-of-the-art single-agent methods fail. In the interest of low-latency pose estimation, each agent holds its own estimate of the map, while consistency between the agents is achieved using a novel consensus-based sliding window bundle adjustment. Although the experiments in this work are shown in a two-agent setup, the proposed distributed bundle adjustment scheme holds great potential for scaling up to larger problems with multiple agents, owing to the asynchronicity of the proposed estimation process and the high level of parallelism it permits.
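The benefit of a controllable baseline can be made concrete with the standard stereo triangulation error model, sigma_z ~ z^2 * sigma_d / (f * b): depth uncertainty grows quadratically with scene depth z but shrinks linearly with baseline b. The following back-of-the-envelope sketch (not taken from the thesis; focal length and disparity noise are assumed values) illustrates why a wide virtual stereo baseline restores metric accuracy at altitude:

    # Stereo depth uncertainty: sigma_z = z^2 * sigma_d / (f * b)
    f_px, sigma_d = 600.0, 0.5       # focal length [px], disparity noise [px] (assumed)
    for b in (0.2, 2.0, 20.0):       # baseline [m]: fixed rig vs. two-UAV virtual stereo
        for z in (5.0, 50.0):        # scene depth [m]: low vs. high altitude
            sigma_z = z ** 2 * sigma_d / (f_px * b)
            print(f"baseline {b:5.1f} m, depth {z:4.1f} m -> depth std ~ {sigma_z:8.3f} m")

At 50 m depth, a 0.2 m rig yields roughly 10 m of depth uncertainty, while a 20 m inter-agent baseline brings it back to the decimetre range, which is the regime the virtual stereo configuration targets.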
The majority of the approaches developed in this thesis rely on sparse feature maps in order to allow for efficient and timely pose estimation; however, this translates to reduced awareness of the spatial structure of a robot's workspace, which can be insufficient for tasks requiring careful scene interaction and manipulation of objects. Augmenting a typical visual-inertial sensor suite with an RGB-D camera, an add-on framework is presented that enables the efficient fusion of naturally noisy depth information into an accurate, local, dense map of the scene, providing sufficient information for an agent to plan contact with a surface. With the focus on collaborative SLAM using visual-inertial data, the approaches and systems presented in this thesis contribute towards achieving collaborative Visual-Inertial SLAM (VI-SLAM) deployable in challenging real-world scenarios, where the participating agents' experiences are fused and processed at a central access point. On the other hand, it is shown that taking advantage of specific configurations can push the collaboration amongst the agents towards greater overall robustness and accuracy of scene and egomotion estimates in scenarios where state-of-the-art single-agent systems are otherwise unsuccessful, paving the way towards intelligent robot collaboration.
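A minimal illustration of the fusion principle underlying such dense mapping (a simplified 1-D sketch of TSDF-style weighted averaging, not the thesis' actual framework; all parameters are assumed) shows how independent noisy depth readings average into a signed-distance map whose zero crossing recovers the surface accurately:

    # 1-D TSDF fusion sketch: noisy depth readings -> accurate surface estimate.
    import numpy as np

    TRUNC = 0.05                                   # truncation distance [m] (assumed)
    voxels = np.linspace(0.0, 1.0, 101)            # voxel centres along one camera ray [m]
    tsdf = np.zeros_like(voxels)                   # fused signed distance per voxel
    weight = np.zeros_like(voxels)                 # accumulated measurement weight

    rng = np.random.default_rng(0)
    true_depth = 0.60                              # ground-truth surface depth [m]
    for _ in range(50):                            # 50 noisy RGB-D depth readings
        z = true_depth + rng.normal(0.0, 0.02)     # ~2 cm depth noise (assumed)
        sdf = np.clip(z - voxels, -TRUNC, TRUNC)   # truncated signed distance
        tsdf = (weight * tsdf + sdf) / (weight + 1.0)  # running weighted average
        weight += 1.0

    idx = np.argmin(np.abs(tsdf))                  # zero crossing ~ fused surface
    print(f"fused surface at ~{voxels[idx]:.3f} m (true: {true_depth:.2f} m)")

Each individual reading is noisy at the centimetre level, yet the averaged signed-distance field localizes the surface to within a voxel, which is what makes fused dense maps usable for planning contact.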
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Computer vision
en_US
dc.subject
Multi-agent systems
en_US
dc.subject
SLAM
en_US
dc.subject
Collaborative SLAM
en_US
dc.subject
Robotics
en_US
dc.title
Collaborative Visual-Inertial State and Scene Estimation
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2021-01-25
ethz.size
151 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
27042
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02130 - Dep. Maschinenbau und Verfahrenstechnik / Dep. of Mechanical and Process Eng.::02620 - Inst. f. Robotik u. Intelligente Systeme / Inst. Robotics and Intelligent Systems::09559 - Chli, Margarita (ehemalig) / Chli, Margarita (former)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02130 - Dep. Maschinenbau und Verfahrenstechnik / Dep. of Mechanical and Process Eng.::02620 - Inst. f. Robotik u. Intelligente Systeme / Inst. Robotics and Intelligent Systems::09559 - Chli, Margarita (ehemalig) / Chli, Margarita (former)
en_US
ethz.date.deposited
2021-01-25T12:39:10Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-01-25T13:21:13Z
ethz.rosetta.lastUpdated
2023-02-06T21:21:38Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Collaborative%20Visual-Inertial%20State%20and%20Scene%20Estimation&rft.date=2020-09&rft.au=Karrer,%20Marco&rft.genre=unknown&rft.btitle=Collaborative%20Visual-Inertial%20State%20and%20Scene%20Estimation