The curse of dimensionality and gradient-based training of neural networks: shrinking the gap between theory and applications
dc.contributor.author
Rossmannek, Florian
dc.contributor.supervisor
Cheridito, Patrick
dc.contributor.supervisor
Jentzen, Arnulf
dc.date.accessioned
2024-06-14T13:21:59Z
dc.date.available
2023-03-07T16:09:05Z
dc.date.available
2023-03-08T09:29:36Z
dc.date.available
2024-06-14T13:21:59Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/602160
dc.identifier.doi
10.3929/ethz-b-000602160
dc.description.abstract
Neural networks have gained widespread attention due to their remarkable performance in various applications. Two aspects are particularly striking: on the one hand, neural networks seem to enjoy approximation capabilities superior to those of classical methods. On the other hand, neural networks are trained successfully with gradient-based algorithms even though the training task is a highly nonconvex optimization problem. This thesis advances the theory behind these two phenomena.
On the aspect of approximation, we develop a framework for showing that neural networks can break the so-called curse of dimensionality in different high-dimensional approximation problems, meaning that the complexity of the neural networks involved scales at most polynomially in the dimension. Our approach is based on the notion of a catalog network, a generalization of a feed-forward neural network in which the nonlinear activation functions can vary from layer to layer as long as they are chosen from a predefined catalog of functions. As such, catalog networks constitute a rich family of continuous functions. We show that, under appropriate conditions on the catalog, catalog networks can be approximated efficiently with rectified linear unit (ReLU)-type networks, and we provide precise estimates of the number of parameters needed for a given approximation accuracy. As special cases of these general results, we obtain different classes of functions that can be approximated with ReLU networks without the curse of dimensionality.
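The following is a minimal illustrative sketch (not the construction from the thesis) of what a catalog network looks like in code: a feed-forward network whose activation may change from layer to layer, as long as each one is drawn from a fixed catalog. The catalog contents, layer widths, and random weights below are assumptions chosen purely for illustration.

```python
# Minimal sketch of a catalog network: a feed-forward network whose
# activation may differ from layer to layer, provided each activation is
# drawn from a fixed, predefined catalog of functions. All concrete choices
# here (catalog entries, widths, weights) are illustrative assumptions.
import numpy as np

# A small catalog of one-dimensional activation functions.
CATALOG = {
    "relu": lambda x: np.maximum(x, 0.0),
    "softplus": lambda x: np.log1p(np.exp(x)),
    "sine": np.sin,
}

def catalog_network(x, layers):
    """Evaluate a catalog network.

    `layers` is a list of (W, b, name) triples; `name` selects the layer's
    activation from CATALOG. The last layer is purely affine.
    """
    for W, b, name in layers[:-1]:
        x = CATALOG[name](W @ x + b)
    W, b, _ = layers[-1]
    return W @ x + b

# Example: a 3-layer catalog network on a 4-dimensional input.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
names = ["relu", "sine", None]          # last entry unused (affine output layer)
layers = [
    (rng.standard_normal((dims[i + 1], dims[i])),
     rng.standard_normal(dims[i + 1]),
     names[i])
    for i in range(3)
]
print(catalog_network(rng.standard_normal(4), layers))
```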
On the aspect of optimization, we investigate the interplay between neural networks and gradient-based training algorithms by studying the loss surface. On the one hand, we discover an obstruction to successful learning caused by an unfortunate interplay between the architecture of the network and the initialization of the algorithm. More precisely, we demonstrate that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. On the other hand, we establish positive results by conducting a landscape analysis and applying dynamical systems theory. These positive results concern the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points when the target function is affine and one-dimensional. Next, we prove a new variant of a dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements usually imposed. We verify that ReLU networks with one hidden layer fit into this new framework. Building on our classification of critical points, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima if the initialization is sufficiently good, a condition expressed by an explicit threshold on the limiting loss.
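As a complement, here is a minimal sketch of the kind of training run studied in the optimization part: plain gradient descent on the mean-squared error of a one-hidden-layer ReLU network with a one-dimensional affine target. This is not the analysis from the thesis; the network width, target coefficients, sample grid (standing in for the true loss), step size, and initialization are illustrative assumptions.

```python
# Sketch: gradient descent on the mean-squared error of a one-hidden-layer
# ReLU network f(x) = sum_j v_j * relu(w_j * x + b_j) + c against an affine
# one-dimensional target t(x) = a*x + d. All numerical choices are assumed
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)

a, d = 2.0, -1.0                           # assumed affine target t(x) = a*x + d
xs = np.linspace(-1.0, 1.0, 256)           # sample grid standing in for the true loss
ts = a * xs + d

m = 16                                     # hidden-layer width (assumed)
w = rng.standard_normal(m) * 0.5
b = rng.standard_normal(m) * 0.5
v = rng.standard_normal(m) * 0.5
c = 0.0

lr = 0.05
for step in range(2000):
    pre = np.outer(xs, w) + b              # (n, m) pre-activations
    act = np.maximum(pre, 0.0)             # ReLU outputs
    pred = act @ v + c
    err = pred - ts                        # (n,) residuals
    # Gradients of the mean-squared error (ReLU derivative = indicator).
    ind = (pre > 0).astype(float)
    g = 2.0 * err / xs.size
    grad_v = act.T @ g
    grad_c = g.sum()
    grad_w = (ind * (g[:, None] * v)).T @ xs
    grad_b = (ind * (g[:, None] * v)).sum(axis=0)
    w -= lr * grad_w
    b -= lr * grad_b
    v -= lr * grad_v
    c -= lr * grad_c

final_pred = np.maximum(np.outer(xs, w) + b, 0.0) @ v + c
print("final mean-squared error:", float(np.mean((final_pred - ts) ** 2)))
```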
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title
The curse of dimensionality and gradient-based training of neural networks: shrinking the gap between theory and applications
en_US
dc.type
Doctoral Thesis
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.date.published
2023-03-07
ethz.size
122 p.
en_US
ethz.code.ddc
DDC - DDC::5 - Science::510 - Mathematics
en_US
ethz.grant
Higher order numerical approximation methods for stochastic partial differential equations
en_US
ethz.identifier.diss
29083
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02003 - Mathematik Selbständige Professuren::09557 - Cheridito, Patrick / Cheridito, Patrick
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02204 - RiskLab / RiskLab
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02003 - Mathematik Selbständige Professuren::09557 - Cheridito, Patrick / Cheridito, Patrick
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02204 - RiskLab / RiskLab
ethz.grant.agreementno
175699
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
Projekte MINT
ethz.date.deposited
2023-03-07T16:09:06Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-03-08T09:29:37Z
ethz.rosetta.lastUpdated
2024-02-02T20:47:52Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true