Generalizing Backpropagation for Gradient-Based Interpretability

Du, Kevin; Torroba Hennigen, Lucas; Stöhr, Niklas Werner; Warstadt, Alex; Cotterell, Ryan

doi:10.18653/v1/2023.acl-long.669

Download

Full text (published version) (PDF, 563.3Kb)

Open access

Author

Du, Kevin

Torroba Hennigen, Lucas

Date

2023-07

Type

Conference Paper

ETH Bibliography

yes

Rights / license

Creative Commons Attribution 4.0 International

Abstract

Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model’s output with respect to its inputs. While these methods can indicate which input features may be important for the model’s prediction, they reveal little about the inner workin…

Permanent link

https://doi.org/10.3929/ethz-b-000650665

Publication status

published

External links

https://doi.org/10.18653/v1/2023.acl-long.669

Editor

Rogers, Anna

Boyd-Graber, Jordan

Okazaki, Naoaki

Book title

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)