Open access
Date
2021-07-23
Type
- Conference Paper
ETH Bibliography
yes
Abstract
Deep ensembles aggregate predictions of diverse neural networks to improve generalisation and quantify uncertainty. Here, we investigate their behavior when increasing the ensemble members' parameter size, a practice typically associated with better performance for single models. We show that under practical assumptions in the overparametrized regime, far into the double descent curve, not only does the ensemble test loss degrade, but common out-of-distribution detection and calibration metrics suffer as well. Reminiscent of deep double descent, we observe this phenomenon not only when increasing the single member's capacity but also as we increase the training budget, suggesting that deep ensembles can benefit from early stopping. This sheds light on the success and failure modes of deep ensembles and suggests that averaging finite-width models performs better than the neural tangent kernel limit for these metrics.
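For readers unfamiliar with the setup, the following minimal sketch (not taken from the paper) illustrates what "deep ensembles aggregate predictions" means in practice: several independently initialised members are trained on the same data, and the ensemble's predictive distribution is the average of the members' softmax outputs. The tiny two-layer ReLU network, synthetic data, and hyperparameters are illustrative assumptions only.

# Minimal deep-ensemble sketch: independently initialised members, averaged softmax outputs.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))              # toy inputs (assumption)
y = (X[:, 0] * X[:, 1] > 0).astype(int)    # toy binary labels (assumption)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_member(width, steps=500, lr=0.1, seed=0):
    """Train one ensemble member (a small 2-layer MLP) with plain SGD on cross-entropy."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=1.0 / np.sqrt(2), size=(2, width))
    W2 = r.normal(scale=1.0 / np.sqrt(width), size=(width, 2))
    for _ in range(steps):
        H = np.maximum(X @ W1, 0.0)                 # ReLU hidden layer
        P = softmax(H @ W2)
        G = P.copy()
        G[np.arange(len(y)), y] -= 1.0              # gradient of cross-entropy w.r.t. logits
        G /= len(y)
        W2 -= lr * H.T @ G
        W1 -= lr * X.T @ ((G @ W2.T) * (H > 0))
    return W1, W2

def member_predict(params, X):
    W1, W2 = params
    return softmax(np.maximum(X @ W1, 0.0) @ W2)

# Deep ensemble: average the members' predictive distributions.
members = [train_member(width=64, seed=s) for s in range(5)]
ensemble_probs = np.mean([member_predict(m, X) for m in members], axis=0)
print("ensemble accuracy:", (ensemble_probs.argmax(1) == y).mean())

The paper's observation concerns what happens to such ensembles as the member width (here 64) and the training budget grow far into the overparametrized regime; this sketch only fixes the terminology.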
Permanent link
https://doi.org/10.3929/ethz-b-000501624
Publication status
published
Publisher
International Conference on Machine Learning
Event
Organisational unit
02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform. Technol. Electrical Eng.
09479 - Grewe, Benjamin / Grewe, Benjamin
Related publications and datasets
Notes
Conference lecture held at poster session 1 on July 23, 2021.