Vision for Autonomous Systems: From Tracking and Prediction to Quantum Computing
Open access
Author
Date
2024
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Autonomous systems strongly rely on computer vision to build a comprehensive model for understanding the environment they are embedded in. This task needs to be solved on multiple levels of abstraction, ranging from a high-level understanding of agent intentions to solving combinatorial problems for fundamental vision tasks. In this thesis, we focus on applications at three of these levels. On the most abstract level, we study the understanding of human intentions for autonomous driving and for a team of humanoid robots in structured environments. The foundation of this approach is multi-object tracking (MOT), which is subsequently investigated as one of the fundamental computer vision problems. Finally, on the lowest level of abstraction, we propose a quantum computing formulation of the matching problem our tracker is built on and further investigate the efficient use of an adiabatic quantum computer in computer vision.
In the first part of this thesis, the prediction of high-level actions of traffic participants in an autonomous driving scenario is studied. For this purpose, we develop a hidden Markov model representation that allows us to decode the sequence of actions from a vehicle's trajectory and the semantic maps present in large-scale driving datasets. For predicting future driving maneuvers, we propose a convolutional neural network that fuses map information and observed trajectories using a rendered representation.
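As an illustration of the decoding step, the most likely hidden state sequence of a hidden Markov model can be recovered with the Viterbi algorithm. The states, observations, and probabilities below are invented for this sketch and are not the thesis's actual model:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence for a discrete HMM.

    obs: observation indices; pi: initial probs (S,);
    A: transition probs (S, S); B: emission probs (S, O).
    """
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])    # log-probs at t = 0
    back = np.zeros((T, S), dtype=int)          # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)      # (prev, next) transition scores
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):               # trace backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Illustrative two-state example: 0 = "keep lane", 1 = "turn",
# observing a discretized heading: 0 = "straight", 1 = "curved".
pi  = np.array([0.8, 0.2])
A   = np.array([[0.9, 0.1], [0.3, 0.7]])
B   = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1, 1]
print(viterbi(obs, pi, A, B))  # → [0, 0, 1, 1]
```

With these probabilities the decoder switches from "keep lane" to "turn" exactly when the observed heading becomes curved.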
We subsequently approach human action recognition from the perspective of an autonomous robot in a structured environment. To enable this, we collect a referee action dataset that contains multiple domains to cater to the requirements of the task. By using simulated images, the dataset can be adapted easily to new actions, while two kinds of realistic domains allow us to adapt to real images with a reduced annotation effort. We develop a computationally efficient network to detect the actions and deploy it on the humanoid NAO robot.
In the second part, we propose a learnable online 3D MOT approach that uses a predictive model for traffic participants together with deep learning-based object matching. To enable this, we define a graph structure that merges both representations and uses neural message passing to match pairs of detections at different timesteps as well as detections with tracks. We furthermore propose a two-stage training approach that models inference within an online system, while avoiding the expensive rollout of online tracks. Overall, our method considerably improves track stability and performance.
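The general idea of neural message passing on such a track-detection graph can be sketched in plain NumPy: node embeddings for tracks and detections exchange messages along edges, and a per-edge matching score is read out at the end. The dimensions, random weights, and update rules here are illustrative placeholders, not the trained architecture from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite graph: 2 tracks x 3 detections, 4-d node embeddings.
tracks = rng.normal(size=(2, 4))
dets   = rng.normal(size=(3, 4))
W_e = rng.normal(size=(4, 8)) * 0.1   # edge-update weights (illustrative)
W_n = rng.normal(size=(4, 4)) * 0.1   # node-update weights (illustrative)

edges = np.zeros((2, 3, 4))           # one feature vector per (track, det) edge
for _ in range(2):                    # a couple of message passing rounds
    # Edge update: combine the two endpoint embeddings into an edge message.
    for i in range(2):
        for j in range(3):
            msg = np.concatenate([tracks[i], dets[j]])
            edges[i, j] = np.tanh(msg @ W_e.T)
    # Node update: aggregate incoming edge messages with a residual connection.
    tracks = np.tanh(edges.mean(axis=1) @ W_n.T + tracks)
    dets   = np.tanh(edges.mean(axis=0) @ W_n.T + dets)

# Per-edge matching scores via a simple sigmoid readout.
scores = 1 / (1 + np.exp(-edges.sum(axis=2)))
print(scores.shape)  # → (2, 3): one matching score per track-detection pair
```

In a learned system the weights would be trained so that high scores mark correct track-detection associations; here they are random and only illustrate the data flow.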
After this, we further investigate improving long-term track stability on video sequences. This is done in the context of monitoring a fleet of robots from wide-angle cameras, where strong occlusions and visually identical robots pose a major challenge. We thus frame the task as a multi-platform sensor fusion approach, where tracklets from the external camera view are combined with measurements performed by the robots. The tracklets are combined into long-term tracks by solving a discrete quadratic problem that represents costs generated by different submodules. The cost weights are optimized using particle swarm optimization as a metaheuristic.
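Such a discrete quadratic problem can be sketched on a toy instance: binary variables select candidate tracklet links, the objective is a weighted sum of submodule cost matrices, and brute force stands in for a proper solver. The costs and weights below are invented; in the thesis the weights would be tuned by a metaheuristic such as particle swarm optimization:

```python
import itertools
import numpy as np

def solve_dqp(Q):
    """Brute-force the binary quadratic problem min_x x^T Q x (small n only)."""
    best_x, best_c = None, np.inf
    for bits in itertools.product([0, 1], repeat=Q.shape[0]):
        x = np.array(bits)
        c = x @ Q @ x
        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c

# Two illustrative submodule costs over 3 candidate tracklet links:
# a unary appearance term and a pairwise exclusivity penalty.
Q_app  = np.diag([-1.0, -0.2, -0.8])   # negative = reward for keeping a link
Q_excl = np.zeros((3, 3))
Q_excl[0, 1] = Q_excl[1, 0] = 2.0      # links 0 and 1 conflict with each other

w = (1.0, 1.5)                         # weights a metaheuristic would tune
x, cost = solve_dqp(w[0] * Q_app + w[1] * Q_excl)
print(x)  # → [1 0 1]: keep links 0 and 2, drop the conflicting link 1
```

The exclusivity term makes selecting both conflicting links more expensive than the reward they bring, so the optimum keeps only the stronger of the two.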
The third part of the thesis explores the application of quantum computing to challenging computer vision and machine learning tasks. We approach MOT with this paradigm by stating the matching and assignment problem as a task solvable on an adiabatic quantum computer (AQC). We further propose an iterative approach to represent and optimize the tracking constraints, which improves the probability of obtaining a valid solution. In simulation, we show that our approach is competitive with the state-of-the-art on commonly used MOT benchmarks. Using a D-Wave AQC, we demonstrate that small real-world problems can be solved on a quantum computer and provide an in-depth analysis of the properties of our approach using synthetic examples.
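The core construction, turning a one-to-one matching problem into a QUBO that an annealer can sample, can be sketched on a toy instance. The costs and penalty weight below are invented, and exhaustive search stands in for the AQC:

```python
import itertools
import numpy as np

C = np.array([[0.1, 0.9],
              [0.8, 0.2]])    # illustrative track-detection matching costs
lam = 2.0                     # penalty weight enforcing one-to-one matching

# QUBO objective: sum_ij C_ij x_ij plus, for every row and column,
# a penalty lam * (sum_k x_k - 1)^2 (constant term dropped, x^2 = x).
Q = np.diag(C.flatten())
groups = [[0, 1], [2, 3],     # row constraints (track 0, track 1)
          [0, 2], [1, 3]]     # column constraints (detection 0, detection 1)
for g in groups:
    for a in g:
        Q[a, a] -= lam
        for b in g:
            if a != b:
                Q[a, b] += lam

# Exhaustive search stands in for the annealer on this 4-variable toy QUBO.
best = min((np.array(bits) for bits in itertools.product([0, 1], repeat=4)),
           key=lambda x: x @ Q @ x)
print(best)  # → [1 0 0 1]: track 0 ↔ detection 0, track 1 ↔ detection 1
```

With a sufficiently large penalty weight, the minimizer of the unconstrained QUBO coincides with the cheapest assignment that satisfies all one-to-one constraints, which is what makes the problem expressible for an annealer in the first place.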
Finally, we approach the efficient use of an AQC for quantum computer vision and machine learning tasks. Starting from the perspective that many quantum computer vision applications are formulated as clustering tasks with additional constraints, we propose an approach that utilizes all measurements taken on an AQC to generate alternative high-quality clustering solutions. This uses the existing measurements to generate calibrated confidence scores for the solutions, with little additional compute cost. We validate our formulation with experiments in simulation and on a D-Wave AQC. Furthermore, we show that the set of solutions can be used to eliminate ambiguous points and that this approach also transfers to real data that does not strictly follow the assumptions of our derivation.
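A simplified sketch of reusing all annealer measurements: repeated samples are aggregated into alternative solutions with frequency-based confidence scores (a simplification of the calibrated scores described above), and points whose assignment flips between high-confidence solutions are flagged as ambiguous. The sample data below is invented:

```python
from collections import Counter

# Illustrative stand-in for the bit-string samples an AQC returns: each
# measurement is one candidate clustering of 4 points into 2 clusters.
samples = [(0, 0, 1, 1)] * 60 + [(0, 1, 1, 1)] * 25 + [(0, 0, 0, 1)] * 15

# Aggregate identical measurements into solutions with frequency scores.
counts = Counter(samples)
total = sum(counts.values())
solutions = [(sol, n / total) for sol, n in counts.most_common()]
for sol, conf in solutions:
    print(sol, conf)      # e.g. (0, 0, 1, 1) 0.6

# Points whose label differs across high-confidence solutions are ambiguous.
top = [sol for sol, conf in solutions if conf >= 0.2]
ambiguous = [i for i in range(4) if len({sol[i] for sol in top}) > 1]
print(ambiguous)  # → [1]: point 1 flips between the two most frequent solutions
```

The appeal of this scheme is that the alternative solutions and their scores come from measurements that were taken anyway, so the extra compute cost is just the aggregation.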
Permanent link
https://doi.org/10.3929/ethz-b-000673819
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Van Gool, Luc
Examiner: Chin, Tat-Jun
Examiner: Wilmott, Colin
Examiner: Dai, Dengxin
Publisher
ETH Zurich
Subject
Quantum computing; Computer vision; Tracking; Autonomous systems; Robotics
Organisational unit
03514 - Van Gool, Luc / Van Gool, Luc