The curse of dimensionality and gradient-based training of neural networks: shrinking the gap between theory and applications
Open access
Author
Date
2023
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
Neural networks have gained widespread attention due to their remarkable performance in various applications. Two aspects are particularly striking: on the one hand, neural networks seem to enjoy approximation capabilities superior to those of classical methods. On the other hand, neural networks are trained successfully with gradient-based algorithms even though the training task is a highly nonconvex optimization problem. This thesis advances the theory behind these two phenomena.
On the aspect of approximation, we develop a framework for showing that neural networks can break the so-called curse of dimensionality in different high-dimensional approximation problems, meaning that the complexity of the neural networks involved scales at most polynomially in the dimension. Our approach is based on the notion of a catalog network, which is a generalization of a feed-forward neural network in which the nonlinear activation functions can vary from layer to layer as long as they are chosen from a predefined catalog of functions. As such, catalog networks constitute a rich family of continuous functions. We show that, under appropriate conditions on the catalog, these catalog networks can efficiently be approximated with rectified linear unit (ReLU)-type networks and provide precise estimates of the number of parameters needed for a given approximation accuracy. As special cases of the general results, we obtain different classes of functions that can be approximated with ReLU networks without the curse of dimensionality.
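A minimal sketch of what such a catalog network might look like, assuming a toy catalog of activation functions; the names `relu`, `square`, `id` and the helper `catalog_network` are illustrative choices, not the thesis's formal construction:

```python
import numpy as np

# Illustrative sketch: a feed-forward network whose activation function may
# differ from layer to layer, as long as each one is drawn from a fixed
# catalog of functions. The catalog below is a hypothetical example.
relu = lambda x: np.maximum(x, 0.0)
catalog = {"relu": relu, "square": lambda x: x**2, "id": lambda x: x}

def catalog_network(x, weights, biases, activations):
    """Evaluate a feed-forward network with per-layer activations from the catalog."""
    h = x
    for W, b, name in zip(weights, biases, activations):
        h = catalog[name](W @ h + b)
    return h

# Example: a 2-layer catalog network on a 3-dimensional input, mixing activations.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
y = catalog_network(rng.normal(size=3), [W1, W2], [b1, b2], ["relu", "square"])
```

With the identity and ReLU in the catalog, ordinary ReLU networks are recovered as a special case; richer catalogs yield a correspondingly richer family of continuous functions.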
On the aspect of optimization, we investigate the interplay between neural networks and gradient-based training algorithms by studying the loss surface. On the one hand, we discover an obstruction to successful learning due to an unfortunate interplay between the architecture of the network and the initialization of the algorithm. More precisely, we demonstrate that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. On the other hand, we establish positive results by conducting a landscape analysis and applying dynamical systems theory. These positive results concern the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. Next, we prove a new variant of a dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements usually imposed. We verify that ReLU networks with one hidden layer fit into the new framework. Building on our classification of critical points, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima if the initialization is sufficiently good, a condition expressed by an explicit threshold on the limiting loss.
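For illustration, the following hypothetical toy experiment runs plain gradient descent on the empirical squared loss of a one-hidden-layer ReLU network fitting a one-dimensional affine target. All variable names and hyperparameters are assumptions made for this sketch; it does not reproduce the thesis's analysis, which concerns the true (population) loss.

```python
import numpy as np

# Toy setup: fit the affine target f(x) = 2x + 0.5 on [-1, 1] with a
# one-hidden-layer ReLU network, trained by full-batch gradient descent.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)   # one-dimensional inputs
y = 2.0 * x + 0.5                      # affine target values

width, lr = 8, 0.05
w = rng.normal(size=width)   # input weights of the hidden layer
b = rng.normal(size=width)   # hidden biases
v = rng.normal(size=width)   # output weights

for step in range(2000):
    pre = np.outer(x, w) + b           # (n, width) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU
    err = act @ v - y                  # residuals of the network output
    mask = (pre > 0).astype(float)     # ReLU derivative (subgradient at 0 set to 0)
    grad_v = act.T @ err / len(x)
    grad_w = ((err[:, None] * mask * v) * x[:, None]).mean(axis=0)
    grad_b = (err[:, None] * mask * v).mean(axis=0)
    v -= lr * grad_v
    w -= lr * grad_w
    b -= lr * grad_b

print("final training loss:", 0.5 * np.mean(err**2))
```

Depending on the random initialization, such a run either approaches a global minimum or stalls near a saddle or non-optimal critical point, which is the kind of behaviour the classification of critical points and the center-stable manifold argument make precise.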
Persistent link
https://doi.org/10.3929/ethz-b-000602160
Publication status
published
External links
Search for a print copy via ETH Library
Publisher
ETH Zurich
Organisational unit
09557 - Cheridito, Patrick / Cheridito, Patrick
02204 - RiskLab / RiskLab
Funding
175699 - Higher order numerical approximation methods for stochastic partial differential equations (SNF)