Metadata only
Date
2021-04-13
Type
- Working Paper
ETH Bibliography
yes
Altmetrics
Abstract
Fitting a model into GPU memory during training is a growing concern as models continue to increase in size. Parameter sharing can reduce memory requirements, but existing methods only share parameters between identical layers, limiting their impact. This paper removes these restrictions with a novel task called Neural Parameter Allocation Search (NPAS), where the goal is to generate weights for a network using a given parameter budget. NPAS requires new techniques to morph the available parameters to fit any architecture. To address this new task we introduce Shapeshifter Networks (SSNs), which automatically learn where and how to share parameters between all layers in a network, even between layers of varying sizes and operations. SSNs do not require any loss function or architecture modifications, making them easy to use. We evaluate SSNs in key NPAS settings using seven network architectures across diverse tasks including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters.
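The core idea the abstract describes — generating each layer's weights from a single shared parameter budget, even when layers differ in shape — can be sketched as follows. This is an illustrative reconstruction, not the paper's method: the bank size, tiling scheme, and layer shapes are assumptions, and SSNs learn where and how to share rather than tiling naively.

```python
import numpy as np

def weights_from_bank(bank: np.ndarray, shape: tuple) -> np.ndarray:
    """Morph a shared 1-D parameter bank into a weight tensor of the
    requested shape, tiling the bank when a layer needs more values
    than the budget holds (a simple fixed sharing scheme; assumed
    here for illustration only)."""
    needed = int(np.prod(shape))
    reps = -(-needed // bank.size)            # ceil division
    return np.tile(bank, reps)[:needed].reshape(shape)

# A tiny "network" whose layers of differing sizes all draw weights
# from one small shared budget.
rng = np.random.default_rng(0)
bank = rng.standard_normal(100)               # shared parameter budget
layer_shapes = [(64, 32), (32, 32), (32, 10)]
layers = [weights_from_bank(bank, s) for s in layer_shapes]

total_virtual = sum(w.size for w in layers)
print(bank.size, total_virtual)               # 100 real vs. 3392 virtual weights
```

The point of the sketch is only that 100 stored parameters can back 3,392 weight slots across layers of unequal shape; the paper's contribution is learning the mapping instead of fixing it.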
Publication status
published
Journal / series
arXiv
Publisher
Cornell University
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten