Show simple item record

dc.contributor.author: Pimentel, Tiago
dc.contributor.author: Valvoda, Josef
dc.contributor.author: Maudslay, Rowan H.
dc.contributor.author: Zmigrod, Ran
dc.contributor.author: Williams, Adina
dc.contributor.author: Cotterell, Ryan
dc.contributor.editor: Jurafsky, Dan
dc.contributor.editor: Chai, Joyce
dc.contributor.editor: Schluter, Natalie
dc.contributor.editor: Tetreault, Joel
dc.date.accessioned: 2021-12-07T10:08:26Z
dc.date.available: 2020-10-15T02:38:46Z
dc.date.available: 2020-10-29T12:10:11Z
dc.date.available: 2020-10-29T12:25:07Z
dc.date.available: 2020-10-29T12:42:28Z
dc.date.available: 2021-12-07T10:08:26Z
dc.date.issued: 2020-07
dc.identifier.isbn: 978-1-952148-25-5 [en_US]
dc.identifier.uri: http://hdl.handle.net/20.500.11850/446005
dc.identifier.doi: 10.3929/ethz-b-000446005
dc.description.abstract: The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest-performing probe one can, even if it is more complex, since it will result in a tighter estimate and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research, plus English, totalling eleven languages. Our implementation is available at https://github.com/rycolab/info-theoretic-probing. [en_US]
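A note on the abstract's central claim (a better-performing probe yields a tighter estimate of the mutual information): it follows from a standard cross-entropy lower bound. The sketch below uses illustrative notation (T for the linguistic property, R for the representation, q for an arbitrary probe) and is not necessarily the paper's exact formulation.

\[
I(T; R) = H(T) - H(T \mid R) \;\geq\; H(T) - H_{q}(T \mid R),
\qquad
H_{q}(T \mid R) = -\,\mathbb{E}\big[\log q(t \mid r)\big].
\]

Here H_q(T | R) is the cross-entropy achieved by probe q on the task. The inequality holds for every probe, so the lower the probe's cross-entropy (i.e., the better it performs), the tighter the lower bound on I(T; R); hence selecting the best-performing probe reveals the most linguistic information encoded in the representation.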
dc.format: application/pdf [en_US]
dc.language.iso: en [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Information-Theoretic Probing for Linguistic Structure [en_US]
dc.type: Conference Paper
dc.rights.license: Creative Commons Attribution 4.0 International
ethz.book.title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics [en_US]
ethz.pages.start: 4609 [en_US]
ethz.pages.end: 4622 [en_US]
ethz.version.deposit: publishedVersion [en_US]
ethz.event: 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020) (virtual)
ethz.event.location: Online
ethz.event.date: July 5-10, 2020
ethz.notes: Due to the Coronavirus (COVID-19), the conference was conducted virtually. [en_US]
ethz.identifier.wos:
ethz.publication.place: Stroudsburg, PA
ethz.publication.status: published [en_US]
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09682 - Cotterell, Ryan / Cotterell, Ryan [en_US]
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09682 - Cotterell, Ryan / Cotterell, Ryan
ethz.identifier.url: https://aclanthology.org/2020.acl-main.420
ethz.date.deposited: 2020-10-15T02:38:58Z
ethz.source: WOS
ethz.eth: yes [en_US]
ethz.availability: Open access [en_US]
ethz.rosetta.installDate: 2020-10-29T12:10:22Z
ethz.rosetta.lastUpdated: 2024-02-02T15:31:09Z
ethz.rosetta.versionExported: true
