The curse of dimensionality and gradient-based training of neural networks: shrinking the gap between theory and applications
dc.contributor.author
Rossmannek, Florian
dc.contributor.supervisor
Cheridito, Patrick
dc.contributor.supervisor
Jentzen, Arnulf
dc.date.accessioned
2024-06-14T13:21:59Z
dc.date.available
2023-03-07T16:09:05Z
dc.date.available
2023-03-08T09:29:36Z
dc.date.available
2024-06-14T13:21:59Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/602160
dc.identifier.doi
10.3929/ethz-b-000602160
dc.description.abstract
Neural networks have gained widespread attention due to their remarkable performance in various applications. Two aspects are particularly striking: on the one hand, neural networks seem to enjoy approximation capabilities superior to those of classical methods. On the other hand, neural networks are trained successfully with gradient-based algorithms even though the training task is a highly nonconvex optimization problem. This thesis advances the theory behind these two phenomena.
On the aspect of approximation, we develop a framework for showing that neural networks can break the so-called curse of dimensionality in different high-dimensional approximation problems, meaning that the complexity of the neural networks involved scales at most polynomially in the dimension. Our approach is based on the notion of a catalog network, a generalization of a feed-forward neural network in which the nonlinear activation functions can vary from layer to layer as long as they are chosen from a predefined catalog of functions. As such, catalog networks constitute a rich family of continuous functions. We show that, under appropriate conditions on the catalog, catalog networks can be approximated efficiently with rectified linear unit (ReLU)-type networks, and we provide precise estimates of the number of parameters needed for a given approximation accuracy. As special cases of these general results, we obtain different classes of functions that can be approximated with ReLU networks without the curse of dimensionality.
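The following is a minimal illustrative sketch (not the construction from the thesis) of what a catalog network looks like in code: a feed-forward network whose activation may change from layer to layer, as long as each one is drawn from a fixed catalog. The catalog contents, layer widths, and random weights below are assumptions chosen purely for illustration.

```python
# Minimal sketch of a catalog network: a feed-forward network whose
# activation may differ from layer to layer, provided each activation is
# drawn from a fixed, predefined catalog of functions. All concrete choices
# here (catalog entries, widths, weights) are illustrative assumptions.
import numpy as np

# A small catalog of one-dimensional activation functions.
CATALOG = {
    "relu": lambda x: np.maximum(x, 0.0),
    "softplus": lambda x: np.log1p(np.exp(x)),
    "sine": np.sin,
}

def catalog_network(x, layers):
    """Evaluate a catalog network.

    `layers` is a list of (W, b, name) triples; `name` selects the layer's
    activation from CATALOG. The last layer is purely affine.
    """
    for W, b, name in layers[:-1]:
        x = CATALOG[name](W @ x + b)
    W, b, _ = layers[-1]
    return W @ x + b

# Example: a 3-layer catalog network on a 4-dimensional input.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
names = ["relu", "sine", None]          # last entry unused (affine output layer)
layers = [
    (rng.standard_normal((dims[i + 1], dims[i])),
     rng.standard_normal(dims[i + 1]),
     names[i])
    for i in range(3)
]
print(catalog_network(rng.standard_normal(4), layers))
```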
On the aspect of optimization, we investigate the interplay between neural networks and gradient-based training algorithms by studying the loss surface. On the one hand, we discover an obstruction to successful learning caused by an unfortunate interplay between the architecture of the network and the initialization of the algorithm. More precisely, we demonstrate that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. On the other hand, we establish positive results by conducting a landscape analysis and applying dynamical systems theory. These positive results concern the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points when the target function is affine and one-dimensional. Next, we prove a new variant of a dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements usually imposed. We verify that ReLU networks with one hidden layer fit into this new framework. Building on our classification of critical points, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima if the initialization is sufficiently good, a condition expressed by an explicit threshold on the limiting loss.
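As a complement, here is a minimal sketch of the kind of training run studied in the optimization part: plain gradient descent on the mean-squared error of a one-hidden-layer ReLU network with a one-dimensional affine target. This is not the analysis from the thesis; the network width, target coefficients, sample grid (standing in for the true loss), step size, and initialization are illustrative assumptions.

```python
# Sketch: gradient descent on the mean-squared error of a one-hidden-layer
# ReLU network f(x) = sum_j v_j * relu(w_j * x + b_j) + c against an affine
# one-dimensional target t(x) = a*x + d. All numerical choices are assumed
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)

a, d = 2.0, -1.0                           # assumed affine target t(x) = a*x + d
xs = np.linspace(-1.0, 1.0, 256)           # sample grid standing in for the true loss
ts = a * xs + d

m = 16                                     # hidden-layer width (assumed)
w = rng.standard_normal(m) * 0.5
b = rng.standard_normal(m) * 0.5
v = rng.standard_normal(m) * 0.5
c = 0.0

lr = 0.05
for step in range(2000):
    pre = np.outer(xs, w) + b              # (n, m) pre-activations
    act = np.maximum(pre, 0.0)             # ReLU outputs
    pred = act @ v + c
    err = pred - ts                        # (n,) residuals
    # Gradients of the mean-squared error (ReLU derivative = indicator).
    ind = (pre > 0).astype(float)
    g = 2.0 * err / xs.size
    grad_v = act.T @ g
    grad_c = g.sum()
    grad_w = (ind * (g[:, None] * v)).T @ xs
    grad_b = (ind * (g[:, None] * v)).sum(axis=0)
    w -= lr * grad_w
    b -= lr * grad_b
    v -= lr * grad_v
    c -= lr * grad_c

final_pred = np.maximum(np.outer(xs, w) + b, 0.0) @ v + c
print("final mean-squared error:", float(np.mean((final_pred - ts) ** 2)))
```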
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title
The curse of dimensionality and gradient-based training of neural networks: shrinking the gap between theory and applications
en_US
dc.type
Doctoral Thesis
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.date.published
2023-03-07
ethz.size
122 p.
en_US
ethz.code.ddc
DDC - DDC::5 - Science::510 - Mathematics
en_US
ethz.grant
Higher order numerical approximation methods for stochastic partial differential equations
en_US
ethz.identifier.diss
29083
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02003 - Mathematik Selbständige Professuren::09557 - Cheridito, Patrick / Cheridito, Patrick
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02204 - RiskLab / RiskLab
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02003 - Mathematik Selbständige Professuren::09557 - Cheridito, Patrick / Cheridito, Patrick
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02000 - Dep. Mathematik / Dep. of Mathematics::02204 - RiskLab / RiskLab
ethz.grant.agreementno
175699
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
Projekte MINT
ethz.date.deposited
2023-03-07T16:09:06Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-03-08T09:29:37Z
ethz.rosetta.lastUpdated
2024-02-02T20:47:52Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true