Open access
Author
Date
2023
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Reinforcement Learning (RL) is a powerful framework for decision making and adaptive learning through interaction. Although at its core it consists of trial-and-error learning, it has become a critical tool for research in Artificial Intelligence (AI). In the last decade, RL algorithms have mastered strategic games like Chess and Go and have controlled a variety of robotic and industrial platforms, from locomotion and manipulation to power plants and even nuclear fusion reactors. By incorporating deep neural networks (NNs) as function approximators, “deep RL” gained the ability to handle high-dimensional state and action spaces and, in principle, to generalize better across tasks, making RL solutions versatile and promising. However, using deep neural networks comes with certain caveats. Deep RL algorithms often suffer from brittleness due to overfitting and from sensitivity to hyperparameters, on top of the typical RL challenges: low sample efficiency, difficulty in handling sparse rewards, delayed credit assignment in long-horizon tasks, and sensitivity to reward function design. In this dissertation, we present a series of novel contributions that address some of these problems, with the ultimate goal of improving the efficiency, robustness, and generalization of RL for continuous control tasks. Specifically, we present more robust approaches to trajectory optimization, coupled with NN function approximation for policy learning, model learning, and reward learning. The majority of this work centers on zero-order optimization for model-predictive control, which we demonstrate to be better performing, more robust, and more reproducible than gradient-based trajectory optimizers. Throughout this dissertation, we show how zero-order optimization can be used to efficiently solve tasks with sparse rewards, how it can be applied to imitation learning, and how it can be exploited in conjunction with model learning for uncertainty propagation. Finally, we present a method to learn reward functions from scratch, in a purely self-supervised fashion. Through extensive experiments in simulated environments, our methods demonstrate significant improvements in learning efficiency and performance, reducing the required number of interactions with the environment while still achieving near-optimal solutions. This work aims to provide a viable approach to tackling some of the challenges of deep RL, addressing the efficiency and robustness of the learning process without relying on predefined expert knowledge.
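The recurring technique in the abstract is zero-order (sampling-based) trajectory optimization used inside a model-predictive control loop. As a rough illustration of what such a planner looks like, below is a minimal sketch of a cross-entropy-method planner on a toy point-mass system; the dynamics, reward, and all hyperparameters are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

# Illustrative sketch of zero-order (sampling-based) model-predictive control.
# All names, dynamics, and hyperparameters below are assumptions for the example.

def dynamics(state, action):
    # Toy point-mass dynamics: state = [position, velocity].
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

def reward(state, action):
    # Penalize distance to the origin and control effort.
    return -(state[0] ** 2 + 0.01 * action ** 2)

def cem_plan(state, horizon=15, pop=64, elites=8, iters=5):
    # Cross-entropy method: sample action sequences, keep the best ones,
    # refit the sampling distribution, and return the first action.
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, horizon)
        returns = np.empty(pop)
        for i, seq in enumerate(samples):
            s, ret = state.copy(), 0.0
            for a in seq:
                ret += reward(s, a)
                s = dynamics(s, a)
            returns[i] = ret
        elite = samples[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]

# Closed-loop (receding-horizon) rollout: replan at every step.
state = np.array([1.0, 0.0])
for t in range(50):
    action = cem_plan(state)
    state = dynamics(state, action)
print("final state:", state)
```

Because the planner only needs rollouts of the model, not gradients through it, the same loop applies when the dynamics are a learned neural-network model or a non-differentiable simulator, which is the setting the abstract contrasts with gradient-based trajectory optimizers.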
Permanent link
https://doi.org/10.3929/ethz-b-000644529
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
model based reinforcement learning; Reinforcement Learning; Model Predictive Control; Trajectory Optimization
Organisational unit
03908 - Krause, Andreas / Krause, Andreas