Show simple item record

dc.contributor.author: von Rütte, Dimitri
dc.contributor.author: Anagnostidis, Sotiris
dc.contributor.author: Bachmann, Gregor
dc.contributor.author: Hofmann, Thomas
dc.contributor.editor: Salakhutdinov, Ruslan
dc.contributor.editor: Kolter, Zico
dc.contributor.editor: Heller, Katherine
dc.contributor.editor: Weller, Adrian
dc.contributor.editor: Oliver, Nuria
dc.contributor.editor: Scarlett, Jonathan
dc.contributor.editor: Berkenkamp, Felix
dc.date.accessioned: 2024-12-03T09:39:27Z
dc.date.available: 2024-12-03T08:09:52Z
dc.date.available: 2024-12-03T09:39:27Z
dc.date.issued: 2024
dc.identifier.issn: 2640-3498
dc.identifier.uri: http://hdl.handle.net/20.500.11850/708816
dc.description.abstract: Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time. While the focus of previous work has largely been on truthfulness, in this paper we extend this framework to a richer set of concepts such as appropriateness, humor, creativity and quality, and explore to what degree current detection and guidance strategies work in these challenging settings. To facilitate evaluation, we develop a novel metric for concept guidance that takes into account both the success of concept elicitation as well as the potential degradation in fluency of the guided model. Our extensive experiments reveal that while some concepts such as truthfulness more easily allow for guidance with current techniques, novel concepts such as appropriateness or humor either remain difficult to elicit, need extensive tuning to work, or even experience confusion. Moreover, we find that probes with optimal detection accuracies do not necessarily make for the optimal guides, contradicting previous observations for truthfulness. Our work warrants a deeper investigation into the interplay between detectability, guidability, and the nature of the concept, and we hope that our rich experimental test-bed for guidance research inspires stronger follow-up approaches.
dc.language.iso: en
dc.publisher: PMLR
dc.title: A Language Model's Guide Through Latent Space
dc.type: Conference Paper
ethz.book.title: Proceedings of the 41st International Conference on Machine Learning
ethz.journal.title: Proceedings of Machine Learning Research
ethz.journal.volume: 235
ethz.pages.start: 49655
ethz.pages.end: 49687
ethz.event: 41st International Conference on Machine Learning (ICML 2024)
ethz.event.location: Vienna, Austria
ethz.event.date: July 21-27, 2024
ethz.publication.place: Cambridge, MA
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.identifier.url: https://proceedings.mlr.press/v235/von-rutte24a.html
ethz.date.deposited: 2024-12-03T08:09:52Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2024-12-03T09:39:28Z
ethz.rosetta.lastUpdated: 2024-12-03T09:39:28Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=A%20Language%20Model's%20Guide%20Through%20Latent%20Space&rft.jtitle=Proceedings%20of%20Machine%20Learning%20Research&rft.date=2024&rft.volume=235&rft.spage=49655&rft.epage=49687&rft.issn=2640-3498&rft.au=von%20R%C3%BCtte,%20Dimitri&Anagnostidis,%20Sotiris&Bachmann,%20Gregor&Hofmann,%20Thomas&rft.genre=proceeding&rft.btitle=Proceedings%20of%20the%2041st%20International%20Conference%20on%20Machine%20Learning
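The concept-guidance recipe summarized in the abstract (probe hidden representations for a concept vector, then add it to activations at inference time to steer behavior) can be sketched on toy data. Everything below is an illustrative assumption rather than the paper's exact method: the activations are synthetic Gaussians, and the probe is a simple mean-difference direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden states: positive/negative examples of a
# concept, separated along a hidden ground-truth direction.
d = 16
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
pos = rng.normal(size=(100, d)) + 2.0 * true_dir  # concept present
neg = rng.normal(size=(100, d)) - 2.0 * true_dir  # concept absent

# Detection: probe for a concept vector. A mean-difference probe is
# one simple choice; the paper compares several probing strategies.
concept = pos.mean(axis=0) - neg.mean(axis=0)
concept /= np.linalg.norm(concept)

def detect(h):
    """Classify an activation by its projection onto the concept vector."""
    return h @ concept > 0.0

# Guidance: perturb an activation toward the concept at strength alpha.
# Too large an alpha is what degrades fluency in the real setting.
def guide(h, alpha=4.0):
    return h + alpha * concept

accuracy = (detect(pos).mean() + (~detect(neg)).mean()) / 2
steered = guide(neg[0], alpha=8.0)  # push a "concept absent" state across
```

The gap the abstract highlights shows up naturally here too: `accuracy` measures detectability, while whether `guide` actually flips behavior at a fluency-preserving `alpha` is a separate question.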