Language Model Quality Correlates with Psychometric Predictive Power in Multiple Languages
Open access
Date
2023-12
Type
Conference Paper
ETH Bibliography
yes
Altmetrics
Abstract
Surprisal theory (Hale, 2001; Levy, 2008) posits that a word’s reading time is proportional to its surprisal (i.e., to its negative log probability given the preceding context). Since we are unable to access a word’s ground-truth probability, surprisal theory has been empirically tested using surprisal estimates from language models (LMs). Under the premise that surprisal theory holds, we would expect that higher quality language models provide more powerful predictors of human reading behavior—a conjecture we dub the quality–power (QP) hypothesis. Unfortunately, empirical support for the QP hypothesis is mixed. Some studies in English have found correlations between LM quality and predictive power, but other studies using Japanese data, as well as using larger English LMs, find no such correlations. In this work, we conduct a systematic crosslinguistic assessment of the QP hypothesis. We train LMs from scratch on small- and medium-sized datasets from 13 languages (across five language families) and assess their ability to predict eye tracking data. We find correlations between LM quality and power in eleven of these thirteen languages, suggesting that, within the range of model classes and sizes tested, better language models are indeed better predictors of human language processing behaviors.
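The quantity the abstract refers to reduces to a one-line formula: the surprisal of a word is the negative log of its probability given the preceding context. A minimal Python sketch, with toy conditional probabilities standing in for actual LM estimates (the words and probability values below are illustrative, not taken from the paper):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p(word | preceding context)."""
    return -math.log2(prob)

# Toy conditional probabilities p(word | context), as a stand-in for
# estimates that would come from a trained language model.
probs = {"the": 0.5, "cat": 0.125, "sat": 0.25}

# Surprisal theory predicts reading time is proportional to these values:
# a rarer (lower-probability) word carries higher surprisal.
surprisals = {w: surprisal(p) for w, p in probs.items()}
```

In the paper's setting, the probabilities would instead be produced by the LMs trained on each language, and the resulting surprisals used as predictors of eye tracking measures.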
Permanent link
https://doi.org/10.3929/ethz-b-000650659
Publication status
published
Book title
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Publisher
Association for Computational Linguistics
Organisational unit
09682 - Cotterell, Ryan
09462 - Hofmann, Thomas
Related publications and datasets
Is supplemented by: https://github.com/rycolab/quality-power-hypothesis