Show simple item record

dc.contributor.author: von Rütte, Dimitri
dc.contributor.author: Anagnostidis, Sotiris
dc.contributor.author: Bachmann, Gregor
dc.contributor.author: Hofmann, Thomas
dc.contributor.editor: Salakhutdinov, Ruslan
dc.contributor.editor: Kolter, Zico
dc.contributor.editor: Heller, Katherine
dc.contributor.editor: Weller, Adrian
dc.contributor.editor: Oliver, Nuria
dc.contributor.editor: Scarlett, Jonathan
dc.contributor.editor: Berkenkamp, Felix
dc.date.accessioned: 2024-12-03T09:39:27Z
dc.date.available: 2024-12-03T08:09:52Z
dc.date.available: 2024-12-03T09:39:27Z
dc.date.issued: 2024
dc.identifier.issn: 2640-3498
dc.identifier.uri: http://hdl.handle.net/20.500.11850/708816
dc.description.abstract: Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time. While the focus of previous work has largely been on truthfulness, in this paper we extend this framework to a richer set of concepts such as appropriateness, humor, creativity and quality, and explore to what degree current detection and guidance strategies work in these challenging settings. To facilitate evaluation, we develop a novel metric for concept guidance that takes into account both the success of concept elicitation as well as the potential degradation in fluency of the guided model. Our extensive experiments reveal that while some concepts such as truthfulness more easily allow for guidance with current techniques, novel concepts such as appropriateness or humor either remain difficult to elicit, need extensive tuning to work, or even experience confusion. Moreover, we find that probes with optimal detection accuracies do not necessarily make for the optimal guides, contradicting previous observations for truthfulness. Our work warrants a deeper investigation into the interplay between detectability, guidability, and the nature of the concept, and we hope that our rich experimental test-bed for guidance research inspires stronger follow-up approaches.
dc.language.iso: en
dc.publisher: PMLR
dc.title: A Language Model's Guide Through Latent Space
dc.type: Conference Paper
ethz.book.title: Proceedings of the 41st International Conference on Machine Learning
ethz.journal.title: Proceedings of Machine Learning Research
ethz.journal.volume: 235
ethz.pages.start: 49655
ethz.pages.end: 49687
ethz.event: 41st International Conference on Machine Learning (ICML 2024)
ethz.event.location: Vienna, Austria
ethz.event.date: July 21-27, 2024
ethz.publication.place: Cambridge, MA
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas
ethz.identifier.url: https://proceedings.mlr.press/v235/von-rutte24a.html
ethz.date.deposited: 2024-12-03T08:09:52Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2024-12-03T09:39:28Z
ethz.rosetta.lastUpdated: 2024-12-03T09:39:28Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=A%20Language%20Model's%20Guide%20Through%20Latent%20Space&rft.jtitle=Proceedings%20of%20Machine%20Learning%20Research&rft.date=2024&rft.volume=235&rft.spage=49655&rft.epage=49687&rft.issn=2640-3498&rft.au=von%20R%C3%BCtte,%20Dimitri&Anagnostidis,%20Sotiris&Bachmann,%20Gregor&Hofmann,%20Thomas&rft.genre=proceeding&rft.btitle=Proceedings%20of%20the%2041st%20International%20Conference%20on%20Machine%20Learning
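The concept-guidance recipe summarized in the abstract (probe hidden representations for a concept vector, then add it to activations at inference time to steer behavior) can be sketched on toy data. Everything below is an illustrative assumption rather than the paper's exact method: the activations are synthetic Gaussians, and the probe is a simple mean-difference direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden states: positive/negative examples of a
# concept, separated along a hidden ground-truth direction.
d = 16
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
pos = rng.normal(size=(100, d)) + 2.0 * true_dir  # concept present
neg = rng.normal(size=(100, d)) - 2.0 * true_dir  # concept absent

# Detection: probe for a concept vector. A mean-difference probe is
# one simple choice; the paper compares several probing strategies.
concept = pos.mean(axis=0) - neg.mean(axis=0)
concept /= np.linalg.norm(concept)

def detect(h):
    """Classify an activation by its projection onto the concept vector."""
    return h @ concept > 0.0

# Guidance: perturb an activation toward the concept at strength alpha.
# Too large an alpha is what degrades fluency in the real setting.
def guide(h, alpha=4.0):
    return h + alpha * concept

accuracy = (detect(pos).mean() + (~detect(neg)).mean()) / 2
steered = guide(neg[0], alpha=8.0)  # push a "concept absent" state across
```

The gap the abstract highlights shows up naturally here too: `accuracy` measures detectability, while whether `guide` actually flips behavior at a fluency-preserving `alpha` is a separate question.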