Truth-value judgment in language models: 'truth directions' are context sensitive

Source: arXiv
English Abstract

Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as uncovering a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions are (most) sensitive to the presence of related sentences, and how best to characterize this kind of sensitivity. We do so by measuring different types of consistency errors that occur after probing an LLM whose inputs consist of hypotheses preceded by (negated) supporting and contradicting sentences. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these truth-value directions influences the position of an entailed or contradicted sentence along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the types of errors depend on the layer, the model, and the kind of data. Finally, our results suggest that truth-value directions are causal mediators in the inference process that incorporates in-context information.
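To make the probing setup concrete, here is a minimal, self-contained sketch (not the authors' code): a linear probe is fit on vectors labeled true/false, its weight vector serves as the "truth direction", and an intervention step shifts a representation along that direction to test whether the probe's verdict moves with it. Synthetic vectors stand in for LLM hidden states, and all names and toy values are illustrative assumptions.

```python
# Illustrative sketch of a linear "truth direction" probe and an
# intervention along it; synthetic vectors stand in for LLM hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64                                   # toy hidden-state dimensionality
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)       # the "truth direction" we plant

# Synthetic hidden states: true sentences sit on the positive side of
# the planted direction, false ones on the negative side, plus noise.
n = 500
labels = rng.integers(0, 2, size=n)      # 1 = true, 0 = false
signs = np.where(labels == 1, 1.0, -1.0)
states = signs[:, None] * planted + 0.5 * rng.normal(size=(n, d))

# Fit the probe; its weight vector should recover the planted direction.
probe = LogisticRegression().fit(states, labels)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print("cosine(probe direction, planted direction):", round(float(w @ planted), 3))

# Intervention sketch: push a "false" representation along the truth
# direction and watch the probe's predicted P(true) rise.
premise = -planted + 0.5 * rng.normal(size=d)
for alpha in (0.0, 1.0, 2.0):
    shifted = premise + alpha * w
    p_true = probe.predict_proba(shifted[None, :])[0, 1]
    print(f"alpha={alpha}: P(true)={p_true:.2f}")
```

In the paper's actual experiments the vectors come from particular LLM layers and the intervention is applied to premise representations inside the model, but the geometry (a probe direction plus a shift along it) is the same idea.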

Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Subject: Computing Technology; Computer Technology

Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen. Truth-value judgment in language models: 'truth directions' are context sensitive [EB/OL]. (2025-07-11) [2025-07-21]. https://arxiv.org/abs/2404.18865
