Open access
Author
Date
2023
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Reinforcement Learning (RL) is a powerful framework for decision making and adaptive learning through interaction. Although at its core it consists of trial-and-error learning, it has become a critical tool for research in Artificial Intelligence (AI). In the last decade, RL algorithms have mastered strategic games like Chess and Go and have controlled a variety of robotic and industrial platforms, from locomotion and manipulation to power plants and even nuclear fusion reactors. By incorporating deep neural networks (NNs) as function approximators, “deep RL” gained the ability to handle high-dimensional state and action spaces and, in principle, to generalize better across tasks, making RL solutions versatile and promising. However, using deep neural networks comes with certain caveats. Deep RL algorithms often suffer from brittleness due to overfitting and from sensitivity to hyperparameters, on top of the typical RL challenges: low sample efficiency, difficulty in handling sparse rewards, delayed credit assignment in long-horizon tasks, and sensitivity to reward function design. In this dissertation, we present a series of novel contributions that address some of these problems, with the ultimate goal of improving the efficiency, robustness, and generalization of RL for continuous control tasks. Specifically, we present more robust approaches to trajectory optimization, coupled with NN function approximation for policy learning, model learning, and reward learning. The majority of this work centers on zero-order optimization for model-predictive control, which we demonstrate to be better performing, more robust, and more reproducible than gradient-based trajectory optimizers. Throughout this dissertation, we show how zero-order optimization can be used to efficiently solve tasks with sparse rewards, how it can be applied to imitation learning, and how it can be exploited in conjunction with model learning for uncertainty propagation. Finally, we present a method to learn reward functions from scratch, in a purely self-supervised fashion. Through extensive experiments in simulated environments, our methods demonstrate significant improvements in learning efficiency and performance, reducing the required number of interactions with the environment while still achieving near-optimal solutions. This work aims to provide a viable approach to tackling some of the challenges of deep RL, addressing the efficiency and robustness of the learning process without relying on predefined expert knowledge.
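The recurring technique in the abstract is zero-order (sampling-based) trajectory optimization used inside a model-predictive control loop. As a rough illustration of what such a planner looks like, below is a minimal sketch of a cross-entropy-method planner on a toy point-mass system; the dynamics, reward, and all hyperparameters are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

# Illustrative sketch of zero-order (sampling-based) model-predictive control.
# All names, dynamics, and hyperparameters below are assumptions for the example.

def dynamics(state, action):
    # Toy point-mass dynamics: state = [position, velocity].
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

def reward(state, action):
    # Penalize distance to the origin and control effort.
    return -(state[0] ** 2 + 0.01 * action ** 2)

def cem_plan(state, horizon=15, pop=64, elites=8, iters=5):
    # Cross-entropy method: sample action sequences, keep the best ones,
    # refit the sampling distribution, and return the first action.
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, horizon)
        returns = np.empty(pop)
        for i, seq in enumerate(samples):
            s, ret = state.copy(), 0.0
            for a in seq:
                ret += reward(s, a)
                s = dynamics(s, a)
            returns[i] = ret
        elite = samples[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]

# Closed-loop (receding-horizon) rollout: replan at every step.
state = np.array([1.0, 0.0])
for t in range(50):
    action = cem_plan(state)
    state = dynamics(state, action)
print("final state:", state)
```

Because the planner only needs rollouts of the model, not gradients through it, the same loop applies when the dynamics are a learned neural-network model or a non-differentiable simulator, which is the setting the abstract contrasts with gradient-based trajectory optimizers.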
Permanent link
https://doi.org/10.3929/ethz-b-000644529
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
model based reinforcement learning; Reinforcement Learning; Model Predictive Control; Trajectory Optimization
Organisational unit
03908 - Krause, Andreas / Krause, Andreas