Show simple item record

dc.contributor.author: Curi, Sebastian
dc.contributor.supervisor: Krause, Andreas
dc.contributor.supervisor: Zeilinger, Melanie
dc.contributor.supervisor: Levy, Kfir Yehuda
dc.date.accessioned: 2022-11-02T13:35:30Z
dc.date.available: 2022-11-02T12:55:37Z
dc.date.available: 2022-11-02T13:35:30Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/20.500.11850/579085
dc.identifier.doi: 10.3929/ethz-b-000579085
dc.description.abstract (en_US):

Reinforcement Learning (RL) has advanced the state of the art in many applications over the last decade. Its success stems from access to high-quality simulators, controlled environments, and massive computing power. Nonetheless, when the goal is to apply RL algorithms to real-world problems, many challenges remain open. This dissertation focuses on three of them: data efficiency, robustness, and safety. On the one hand, practical algorithms that address these issues lack theoretical guarantees. On the other hand, theoretically sound algorithms are impractical. This thesis aims to develop algorithms that achieve the best of both worlds: theoretically sound algorithms that can be scaled with state-of-the-art neural networks and are easy to implement.

We take a model-based approach and learn models that distinguish between aleatoric and epistemic uncertainty. The former is uncertainty inherent to the system, such as sensor noise. The latter stems from data scarcity and decreases as we collect more data and expand our knowledge about the environment. It is well known that one must plan with epistemic uncertainty to achieve data-efficient exploration, robustness, and safety. Unfortunately, the algorithms that do so are impractical, as they require optimizing over the set of plausible models. To overcome this limitation, we reparameterize the set of plausible models: we add a hallucinating control policy that acts directly on the model's outputs and has as much authority as the epistemic uncertainty the model affords. The reparameterization increases the action dimension but reduces the intractable planning problem to one that standard RL algorithms can handle.

We first consider the problem of data-efficient exploration, where the objective is to find an optimal policy using only a few interactions with the environment. A principled approach to this problem is optimism: the agent plans a policy using the most optimistic dynamics from the set of plausible models. Unfortunately, this requires jointly optimizing over policies and dynamics, which is intractable. We propose the Hallucinated Upper Confidence RL (H-UCRL) algorithm. By augmenting the input space with the hallucinated inputs, we solve H-UCRL using standard planners. Hence, H-UCRL is practical while retaining its theoretical guarantees. In particular, we show that H-UCRL attains near-optimal sample complexity guarantees, and we apply it to large-scale environments.

In real-world tasks, RL agents frequently encounter situations that were not present during training. To ensure reliable performance, agents must be robust against such worst-case situations. The robust RL framework addresses this challenge via a worst-case optimization between an agent and an adversary. Previous robust RL algorithms are sample-inefficient, lack robustness guarantees, or do not scale to large problems. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm, which provably solves this problem. RH-UCRL combines optimism with pessimism when planning with the model to output a robust policy. Experimentally, we demonstrate that RH-UCRL outperforms other robust deep RL algorithms in various adversarial environments.
Finally, we address the problem of constraint satisfaction in RL, which is crucial for the safe deployment of RL agents in real-world environments. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints of nominal policies learned via standard RL techniques. We reformulate the state constraints in terms of cost functions, which reduces safety verification to a standard RL task. The central idea of the safety filter is to filter the actions of the nominal policy so that the constraints remain satisfied; whenever we cannot verify that the constraints are satisfied, the safety filter executes a backup policy instead. Most prior works assume such a backup policy is given; we instead leverage the hallucinated inputs and learn the backup policy by solving a robust RL problem. We provide formal safety guarantees for the safety filter and empirically demonstrate the effectiveness of our approach.
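As a minimal sketch of the reparameterization described above (the notation is assumed for illustration and is not quoted from the thesis), suppose the learned model returns a mean prediction $\mu(x, u)$ and an elementwise epistemic confidence bound $\beta\,\sigma(x, u)$. The hallucinating policy $\eta$ then selects one plausible model inside that confidence set,
\[
  \tilde{f}(x, u) \;=\; \mu(x, u) \;+\; \beta\, \sigma(x, u) \odot \eta(x),
  \qquad \eta(x) \in [-1, 1]^{d},
\]
so that optimistic (H-UCRL-style) planning reduces to a standard RL problem over the augmented action $(u, \eta)$:
\[
  \max_{\pi} \;\max_{\eta(\cdot) \in [-1, 1]^{d}} \; J\bigl(\tilde{f}, \pi\bigr).
\]
A robust (RH-UCRL-style) policy would instead combine this optimism about the agent's own uncertainty with pessimism about the adversary's share of the uncertainty.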
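Similarly, the safety filter's decision rule can be sketched as follows, assuming a nominal policy $\pi$, a learned backup policy $\pi_{\mathrm{backup}}$, and a confidence-based certificate of constraint satisfaction (again, names and symbols are illustrative assumptions, not taken from the thesis):
\[
  u_t \;=\;
  \begin{cases}
    \pi(x_t) & \text{if constraint satisfaction can be certified from } x_t \text{ under all plausible models},\\
    \pi_{\mathrm{backup}}(x_t) & \text{otherwise},
  \end{cases}
\]
where the backup policy is obtained by solving a robust RL problem that exploits the hallucinated inputs, rather than being assumed given.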
dc.format: application/pdf (en_US)
dc.language.iso: en (en_US)
dc.publisher: ETH Zurich (en_US)
dc.rights.uri: http://rightsstatements.org/page/InC-NC/1.0/
dc.subject: Reinforcement Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Learning control (en_US)
dc.subject: Robustness (en_US)
dc.subject: Exploration (en_US)
dc.title: Epistemic Uncertainty for Practical Deep Model-Based Reinforcement Learning (en_US)
dc.type: Doctoral Thesis
dc.rights.license: In Copyright - Non-Commercial Use Permitted
dc.date.published: 2022-11-02
ethz.size: 150 p. (en_US)
ethz.code.ddc: DDC - DDC::0 - Computer science, information & general works::000 - Generalities, science (en_US)
ethz.grant: Reliable Data-Driven Decision Making in Cyber-Physical Systems (en_US)
ethz.identifier.diss: 28649 (en_US)
ethz.publication.place: Zurich (en_US)
ethz.publication.status: published (en_US)
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas (en_US)
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::03908 - Krause, Andreas / Krause, Andreas (en_US)
ethz.grant.agreementno: 815943
ethz.grant.fundername: EC
ethz.grant.funderDoi: 10.13039/501100000780
ethz.grant.program: H2020
ethz.date.deposited: 2022-11-02T12:55:38Z
ethz.source: FORM
ethz.eth: yes (en_US)
ethz.availability: Open access (en_US)
ethz.rosetta.installDate: 2022-11-02T13:35:31Z
ethz.rosetta.lastUpdated: 2023-02-07T07:29:18Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Epistemic%20Uncertainty%20for%20Practical%20Deep%20Model-Based%20Reinforcement%20Learning&rft.date=2022&rft.au=Curi,%20Sebastian&rft.genre=unknown&rft.btitle=Epistemic%20Uncertainty%20for%20Practical%20Deep%20Model-Based%20Reinforcement%20Learning