Metadata only
Author
Date
2021-04-13
Type
- Working Paper
ETH Bibliography
yes
Altmetrics
Abstract
Fitting a model into GPU memory during training is an increasing concern as models continue to grow. Parameter sharing can reduce memory requirements, but existing methods only share parameters between identical layers, limiting their impact. This paper removes these restrictions with a novel task called Neural Parameter Allocation Search (NPAS), where the goal is to generate weights for a network using a given parameter budget. NPAS requires new techniques to morph available parameters to fit any architecture. To address this new task we introduce Shapeshifter Networks (SSNs), which automatically learn where and how to share parameters between all layers in a network, even between layers of varying sizes and operations. SSNs do not require any loss function or architecture modifications, making them easy to use. We evaluate SSNs in key NPAS settings using seven network architectures across diverse tasks including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters.
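The abstract describes generating weights for layers of different sizes from a single, fixed parameter budget. The sketch below illustrates that general idea in PyTorch only; it is not the authors' SSN implementation, and the names SharedParameterBank, SharedLinear, get_weight, and the layer sizes are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedParameterBank(nn.Module):
    """A fixed budget of trainable parameters that many layers draw from."""
    def __init__(self, budget: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(budget) * 0.02)

    def get_weight(self, shape, offset=0):
        # Slice the bank and reshape it to the weight shape a layer requests.
        numel = 1
        for s in shape:
            numel *= s
        assert offset + numel <= self.bank.numel(), "parameter budget exceeded"
        return self.bank[offset:offset + numel].view(*shape)

class SharedLinear(nn.Module):
    """A linear layer whose weight matrix is generated from the shared bank."""
    def __init__(self, bank, in_features, out_features, offset=0):
        super().__init__()
        self.bank = bank
        self.shape = (out_features, in_features)
        self.offset = offset
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.bank.get_weight(self.shape, self.offset)
        return F.linear(x, weight, self.bias)

# Two layers of different sizes draw overlapping slices from one 1,000-parameter
# budget, so the model holds far fewer trainable weights than its layers would
# normally require (hypothetical sizes, for illustration only).
bank = SharedParameterBank(budget=1000)
layer_a = SharedLinear(bank, in_features=16, out_features=32)            # uses 512 weights
layer_b = SharedLinear(bank, in_features=32, out_features=24, offset=0)  # uses 768 weights, overlapping layer_a
out = layer_b(torch.relu(layer_a(torch.randn(4, 16))))
print(out.shape)  # torch.Size([4, 24])
```

In this toy setup the sharing layout (which slice of the budget each layer reads) is fixed by hand; the paper's contribution is learning where and how such sharing should happen automatically.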
Publication status
published
External links
Journal / Series
arXiv
Pages / Article number
Publisher
Cornell University
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten