Locally Typical Sampling
dc.contributor.author
Meister, Clara Isabel
dc.contributor.author
Pimentel, Tiago
dc.contributor.author
Wiher, Gian
dc.contributor.author
Cotterell, Ryan
dc.date.accessioned
2023-02-15T15:01:13Z
dc.date.available
2023-02-04T04:30:54Z
dc.date.available
2023-02-15T15:01:13Z
dc.date.issued
2023-01-12
dc.identifier.issn
2307-387X
dc.identifier.other
10.1162/tacl_a_00536
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/597055
dc.identifier.doi
10.3929/ethz-b-000597055
dc.description.abstract
Today’s probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics (e.g., perplexity). This discrepancy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language generation as a discrete stochastic process—which allows for an information-theoretic analysis—can provide new insights into the behavior of probabilistic language generators, for example, why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, aiming to do so in a simultaneously efficient and error-minimizing manner; in fact, psycholinguistics research suggests humans choose each word in a string with this subconscious goal in mind. We formally define the set of strings that meet this criterion: Those for which each word has an information content close to the expected information content, namely, the conditional entropy of our model. We then propose a simple and efficient procedure for enforcing this criterion when generating from probabilistic models, which we call locally typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.
en_US
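The criterion in the abstract (keep words whose information content is close to the conditional entropy of the model) can be illustrated with a minimal sketch. This record does not spell out the procedure, so the following are assumptions made for illustration only: the function names `locally_typical_filter` and `sample_locally_typical`, the cumulative-mass parameter `tau`, and the choice to truncate to the smallest set of tokens, ranked by their distance from the entropy, whose total probability reaches `tau`. It is not the authors' reference implementation.

```python
import math
import random


def locally_typical_filter(probs, tau=0.95):
    """Restrict a next-token distribution to a locally typical subset.

    probs: list of probabilities over the vocabulary (sums to 1).
    tau:   hypothetical mass threshold in (0, 1]; an assumption of this sketch.
    Returns a renormalized distribution of the same length.
    """
    # Conditional entropy of the next-token distribution, in nats.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Rank tokens by how far their information content (-log p)
    # lies from the entropy; closest first.
    ranked = sorted(
        (i for i, p in enumerate(probs) if p > 0),
        key=lambda i: abs(-math.log(probs[i]) - entropy),
    )

    # Keep the smallest prefix of that ranking whose mass reaches tau.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= tau:
            break

    # Renormalize over the kept tokens; all others get probability 0.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_locally_typical(probs, tau=0.95, rng=random):
    """Draw one token index from the filtered distribution."""
    filtered = locally_typical_filter(probs, tau)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


if __name__ == "__main__":
    toy = [0.5, 0.25, 0.125, 0.125]  # made-up distribution
    print(locally_typical_filter(toy, tau=0.2))
    print(locally_typical_filter(toy, tau=0.7))
```

With the made-up distribution above, the token whose information content is closest to the entropy is the second one (probability 0.25), not the most probable one, so a small `tau` keeps only that token. This is the sense in which the criterion differs from simply keeping the highest-probability words, as nucleus and top-k sampling do.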
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Association for Computational Linguistics
en_US
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.title
Locally Typical Sampling
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 4.0 International
ethz.journal.title
Transactions of the Association for Computational Linguistics
ethz.journal.volume
11
en_US
ethz.pages.start
102
en_US
ethz.pages.end
121
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.publication.place
Cambridge, MA
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09682 - Cotterell, Ryan / Cotterell, Ryan
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09682 - Cotterell, Ryan / Cotterell, Ryan
ethz.relation.isNewVersionOf
20.500.11850/588594
ethz.date.deposited
2023-02-04T04:30:55Z
ethz.source
SCOPUS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-02-15T15:01:16Z
ethz.rosetta.lastUpdated
2024-02-02T19:42:21Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Locally%20Typical%20Sampling&rft.jtitle=Transactions%20of%20the%20Association%20for%20Computational%20Linguistics&rft.date=2023-01-12&rft.volume=11&rft.spage=102&rft.epage=121&rft.issn=2307-387X&rft.au=Meister,%20Clara%20Isabel&Pimentel,%20Tiago&Wiher,%20Gian&Cotterell,%20Ryan&rft.genre=article&rft_id=info:doi/10.1162/tacl_a_00536&