Show simple item record

dc.contributor.author: Curi, Sebastian
dc.contributor.supervisor: Krause, Andreas
dc.contributor.supervisor: Zeilinger, Melanie
dc.contributor.supervisor: Levy, Kfir Yehuda
dc.date.accessioned: 2022-11-02T13:35:30Z
dc.date.available: 2022-11-02T12:55:37Z
dc.date.available: 2022-11-02T13:35:30Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/20.500.11850/579085
dc.identifier.doi: 10.3929/ethz-b-000579085
dc.description.abstract (en_US):

Reinforcement Learning (RL) has advanced the state of the art in many applications over the last decade. Its success stems from access to high-quality simulators, controlled environments, and massive computing power. Nonetheless, when the goal is to apply RL algorithms to real-world problems, many challenges remain open. This dissertation focuses on three of them: data efficiency, robustness, and safety. On the one hand, practical algorithms that address these issues lack theoretical guarantees. On the other hand, theoretically sound algorithms are impractical. This thesis aims to develop algorithms that achieve the best of both worlds: theoretically sound algorithms that can be scaled with state-of-the-art neural networks and are easy to implement.

We take a model-based approach and learn models that distinguish between aleatoric and epistemic uncertainty. The former is uncertainty inherent to the system, such as sensor noise. The latter stems from data scarcity and decreases as we collect more data and expand our knowledge about the environment. It is well known that one must plan with epistemic uncertainty to achieve data-efficient exploration, robustness, and safety. Unfortunately, the algorithms that do so are impractical, as they require optimizing over the set of plausible models. To overcome this limitation, we reparameterize the set of plausible models: we add a hallucinating control policy that acts directly on the model's outputs and has as much authority as the epistemic uncertainty the model affords. The reparameterization increases the action dimension but reduces the intractable planning problem to one that standard RL algorithms can handle.

We first consider the problem of data-efficient exploration, where the objective is to find an optimal policy using only a few interactions with the environment. A principled approach to this problem is optimism: the agent plans a policy using the most optimistic dynamics from the set of plausible models. Unfortunately, this requires jointly optimizing over policies and dynamics, which is intractable. We propose the Hallucinated Upper Confidence RL (H-UCRL) algorithm. By augmenting the input space with the hallucinated inputs, we solve H-UCRL using standard planners. Hence, H-UCRL is practical while retaining its theoretical guarantees. In particular, we show that H-UCRL attains near-optimal sample complexity guarantees, and we apply it to large-scale environments.

In real-world tasks, RL agents frequently encounter situations that were not present during training. To ensure reliable performance, agents must be robust against such worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are sample-inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm, which provably solves this problem. RH-UCRL combines optimism with pessimism when planning with the model to output a robust policy. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in various adversarial environments.
Finally, we address the problem of constraint satisfaction in RL, which is crucial for the safe deployment of RL agents in real-world environments. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints of nominal policies learned via standard RL techniques. We reformulate the state constraints in terms of cost functions, which reduces safety verification to a standard RL task. The central idea of the safety filter is to filter the actions of the nominal policy so that the constraints remain satisfied; whenever we cannot verify that the constraints are satisfied, the safety filter executes a backup policy instead. Most prior works assume such a backup policy is given; we instead leverage the hallucinated inputs and learn the backup policy by solving a robust RL problem. We provide formal safety guarantees for the safety filter and empirically demonstrate the effectiveness of our approach.
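As a minimal sketch of the reparameterization described above (the notation is assumed for illustration and is not quoted from the thesis), suppose the learned model returns a mean prediction $\mu(x, u)$ and an elementwise epistemic confidence bound $\beta\,\sigma(x, u)$. The hallucinating policy $\eta$ then selects one plausible model inside that confidence set,
\[
  \tilde{f}(x, u) \;=\; \mu(x, u) \;+\; \beta\, \sigma(x, u) \odot \eta(x),
  \qquad \eta(x) \in [-1, 1]^{d},
\]
so that optimistic (H-UCRL-style) planning reduces to a standard RL problem over the augmented action $(u, \eta)$:
\[
  \max_{\pi} \;\max_{\eta(\cdot) \in [-1, 1]^{d}} \; J\bigl(\tilde{f}, \pi\bigr).
\]
A robust (RH-UCRL-style) policy would instead combine this optimism about the agent's own uncertainty with pessimism about the adversary's share of the uncertainty.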
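Similarly, the safety filter's decision rule can be sketched as follows, assuming a nominal policy $\pi$, a learned backup policy $\pi_{\mathrm{backup}}$, and a confidence-based certificate of constraint satisfaction (again, names and symbols are illustrative assumptions, not taken from the thesis):
\[
  u_t \;=\;
  \begin{cases}
    \pi(x_t) & \text{if constraint satisfaction can be certified from } x_t \text{ under all plausible models},\\
    \pi_{\mathrm{backup}}(x_t) & \text{otherwise},
  \end{cases}
\]
where the backup policy is obtained by solving a robust RL problem that exploits the hallucinated inputs, rather than being assumed given.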
dc.format: application/pdf (en_US)
dc.language.iso: en (en_US)
dc.publisher: ETH Zurich (en_US)
dc.rights.uri: http://rightsstatements.org/page/InC-NC/1.0/
dc.subject: Reinforcement Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Learning control (en_US)
dc.subject: Robustness (en_US)
dc.subject: Exploration (en_US)
dc.title: Epistemic Uncertainty for Practical Deep Model-Based Reinforcement Learning (en_US)
dc.type: Doctoral Thesis
dc.rights.license: In Copyright - Non-Commercial Use Permitted
dc.date.published: 2022-11-02
ethz.size: 150 p. (en_US)
ethz.code.ddc: DDC - DDC::0 - Computer science, information & general works::000 - Generalities, science (en_US)
ethz.grant: Reliable Data-Driven Decision Making in Cyber-Physical Systems (en_US)
ethz.identifier.diss: 28649 (en_US)
ethz.publication.place: Zurich (en_US)
ethz.publication.status: published (en_US)
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas (en_US)
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas (en_US)
ethz.grant.agreementno: 815943
ethz.grant.fundername: EC
ethz.grant.funderDoi: 10.13039/501100000780
ethz.grant.program: H2020
ethz.date.deposited: 2022-11-02T12:55:38Z
ethz.source: FORM
ethz.eth: yes (en_US)
ethz.availability: Open access (en_US)
ethz.rosetta.installDate: 2022-11-02T13:35:31Z
ethz.rosetta.lastUpdated: 2023-02-07T07:29:18Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Epistemic%20Uncertainty%20for%20Practical%20Deep%20Model-Based%20Reinforcement%20Learning&rft.date=2022&rft.au=Curi,%20Sebastian&rft.genre=unknown&rft.btitle=Epistemic%20Uncertainty%20for%20Practical%20Deep%20Model-Based%20Reinforcement%20Learning