Open access
Author
Date
2022
Type
- Doctoral Thesis
ETH Bibliography
yes
Altmetrics
Abstract
Data-driven approaches to the design of control policies for robotic systems have the potential to revolutionize our world. By continuously observing the environment and changes therein, learning algorithms can adapt and improve control policies based on the observed data. One key requirement for these approaches to work well in real-world applications is data-efficiency: how much interaction it takes to learn a successful control policy. This is a difficult problem for machine learning, because standard algorithms often lack the data-efficiency required for real-world applications. Further, treating the problem purely from a machine learning perspective neglects decades of research and experience from the field of control theory. The goal of this thesis is to combine insights from control theory with methodologies from machine learning to develop highly data-efficient learning algorithms for continuous control problems.

The first part of this thesis considers automated controller tuning, that is, finding the optimal parameters for policies based on classical approaches from control theory without human intervention. To this end, we employ Bayesian optimization (BO), a data-efficient method for global, stochastic optimization problems. In its standard formulation, BO makes only a few assumptions about the underlying objective function. On the one hand, this makes BO a prime candidate for a wide range of applications; on the other hand, it limits BO's data-efficiency for the specific application of controller tuning. This part of the thesis presents three separate methodologies that alleviate some of these shortcomings: First, we propose to constrain the search space locally around an initial solution, which enables the optimization of high-dimensional control policies whilst retaining data-efficiency.
Second, we propose to encode the environmental conditions during experiments as context variables, which allows experience from previous experiments to be shared and thus accelerates subsequent ones. Third, we consider exogenous perturbations that act on the policy's parameters and therefore require the optimal parameters to be robust with respect to these perturbations. The efficacy of these three methodologies is demonstrated on a wide range of simulated and real-world problems.

The second part of this thesis considers a more general framework for data-driven control: model-based reinforcement learning (RL). This particularly data-efficient branch of RL employs a learned model, an approximation of the true environment, to simulate artificial data instead of relying solely on real-world interactions. However, the learned model always remains imperfect and as such introduces an additional error source into the learning problem. Hence, a key challenge in model-based RL is model bias: small errors in the learned model that can compound when simulating new data and impede the learning process. This part of the thesis presents a novel approach to alleviate model bias. Specifically, we use the observed data as time-dependent correction terms on top of a learned model, retaining the ability to simulate new data without accumulating errors over long prediction horizons. These correction terms are inspired by a data-driven branch of control theory, iterative learning control, which we thoroughly compare to model-based RL. Further, we motivate the proposed method from a theoretical perspective and demonstrate that it can drastically improve existing model-based approaches in practice without introducing additional tuning parameters.
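To give a flavor of BO-based controller tuning as described in the first part, the following is a minimal sketch, not code from the thesis: a hypothetical first-order plant, an invented noisy tracking cost, and a hand-rolled squared-exponential Gaussian process with a lower-confidence-bound acquisition. All plant parameters, GP hyperparameters, and function names are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)

def control_cost(kp):
    # Hypothetical closed-loop cost for a proportional gain kp:
    # simulate the toy plant x' = -x + u with u = kp * (ref - x)
    # and accumulate the squared tracking error, plus small noise.
    dt, x, cost = 0.01, 0.0, 0.0
    for _ in range(500):
        u = kp * (1.0 - x)              # reference is 1.0
        x += dt * (-x + u)
        cost += dt * (1.0 - x) ** 2
    return cost + 0.01 * np.random.randn()

def gp_posterior(X, y, Xq, ell=0.5, sf=1.0, noise=1e-2):
    # Posterior mean/variance of a squared-exponential GP at query points Xq.
    def k(a, b):
        return sf * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xq)
    ym = y.mean()                        # center targets for a zero-mean prior
    mu = ym + Ks.T @ np.linalg.solve(K, y - ym)
    var = sf - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.maximum(var, 1e-12)

rng = np.random.default_rng(0)
grid = np.linspace(0.1, 10.0, 200)       # search space for the gain
X = rng.uniform(0.1, 10.0, size=3)       # initial random evaluations
y = np.array([control_cost(kp) for kp in X])

for _ in range(12):
    mu, var = gp_posterior(X, y, grid)
    acq = mu - 2.0 * np.sqrt(var)        # lower confidence bound (minimization)
    x_next = grid[np.argmin(acq)]
    X = np.append(X, x_next)
    y = np.append(y, control_cost(x_next))

best = X[np.argmin(y)]                   # best gain found so far
```

Because the GP surrogate interpolates all previous experiments, each new evaluation is chosen where the model predicts low cost or high uncertainty, which is what makes BO data-efficient compared with grid or random search.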
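The correction-term idea from the second part can be illustrated with a deliberately simplified sketch: a hypothetical scalar linear system, an intentionally biased "learned" model, and time-indexed residuals recorded from one real rollout. This is not the thesis's method itself, which operates inside a full model-based RL loop; here the corrections are simply replayed on the same trajectory to show why errors stop compounding. In practice such corrections would be reused for simulating nearby trajectories.

```python
import numpy as np

# True (unknown) dynamics vs. an imperfect learned model of them.
def true_step(x, u):
    return 0.9 * x + 0.5 * u

def learned_step(x, u):
    return 0.85 * x + 0.45 * u           # deliberately biased model

# Roll out a fixed control sequence on the real system once...
U = np.sin(np.linspace(0, 2 * np.pi, 20))
x, X_real = 1.0, [1.0]
for u in U:
    x = true_step(x, u)
    X_real.append(x)

# ...and record time-dependent correction terms: the one-step residual
# between each observed transition and the learned model's prediction.
corr = [X_real[t + 1] - learned_step(X_real[t], U[t]) for t in range(len(U))]

def simulate(step, corrections=None):
    # Simulate with the model alone, or with the per-time-step corrections
    # added on top, so model errors are cancelled instead of compounding.
    x, traj = 1.0, [1.0]
    for t, u in enumerate(U):
        x = step(x, u) + (corrections[t] if corrections is not None else 0.0)
        traj.append(x)
    return np.array(traj)

err_plain = np.abs(simulate(learned_step) - X_real).max()
err_corr = np.abs(simulate(learned_step, corr) - X_real).max()
```

The uncorrected model's small per-step bias accumulates over the horizon (`err_plain` grows), while the corrected model reproduces the observed trajectory, mirroring how iterative learning control uses measured residuals from previous trials.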
Permanent link
https://doi.org/10.3929/ethz-b-000545134
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Zeilinger, Melanie
Examiner: Hennig, Philipp
Examiner: Bürger, Mathias
Examiner: Klenske, Edgar D.
Publisher
ETH Zurich
Subject
Bayesian optimization (BO); Control engineering; Reinforcement learning (RL)
Organisational unit
09563 - Zeilinger, Melanie / Zeilinger, Melanie