RECKONING: Bi-Level Learning for Robust, Distractor-Resistant In-Context Reasoning in LLMs

Written by reckoning | Published 2025/10/23
Tech Story Tags: language-models | transformer-models | bi-level-learning | meta-learning | in-context-reasoning | knowledge-encoding | multi-hop-reasoning | distractor-robustness

TL;DR: This paper introduces RECKONING, a novel bi-level learning algorithm that enhances the reasoning robustness of transformer language models.

Abstract and 1. Introduction

  2. Background

  3. Method

  4. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World Knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  5. Related Work

  6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

Abstract

Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts: additional content that is irrelevant to a question but that may be relevant for a different question (i.e., not necessarily random noise). In these situations, the model fails to distinguish the knowledge necessary to answer the question, leading to spurious reasoning and degraded performance. This reasoning failure contrasts with the model’s apparent ability to distinguish its contextual knowledge from all the knowledge it has memorized during pre-training. Following this observation, we propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model’s parameters before presenting it with a question. Our method, RECKONING, is a bi-level learning algorithm that teaches language models to reason by updating their parametric knowledge through back-propagation, allowing them to answer questions using the updated parameters. During training, the inner loop rapidly adapts a copy of the model weights to encode contextual knowledge into its parameters. In the outer loop, the model learns to use the updated weights to reproduce and answer reasoning questions about the memorized knowledge. Our experiments on three diverse multi-hop reasoning datasets show that RECKONING’s performance improves over the in-context reasoning baseline (by up to 4.5%). We also find that compared to in-context reasoning, RECKONING generalizes better to longer reasoning chains unseen during training, is more robust to distractors in the context, and is computationally more efficient when multiple questions are asked about the same knowledge.

1 Introduction

Consider the sentence: “John is David’s dad, and Tom is John’s dad”. Concluding that Tom is David’s grandfather involves reasoning about the information in the sentence. Specifically, it requires understanding the direct information, or contextual knowledge, given in the sentence: the stated relationships between John, David, and Tom; and combining it with our existing, commonsense knowledge of the world: someone’s dad’s dad is their grandfather. Achieving such logical reasoning automatically has long been a goal of AI [17, 52, 73, 81].

The example above demonstrates two necessary abilities required for successful reasoning: first, holding large amounts of commonsense or general knowledge about the world, and second, processing and combining new information with existing knowledge. Transformer-based large language models have shown a remarkable capacity for the first of these abilities, repeatedly being demonstrated to memorize large amounts of data, or parametric knowledge, in their weights [8, 11, 49, 63].

For the second, recent work showed that transformers fine-tuned to predict answers over a concatenated context (“The cow is big; If something is big then it chases the dog; If the cow chases the dog then the cow sees the rabbit”) and question (“Did the cow see the rabbit?”) achieve high performance on reasoning tasks where all necessary knowledge is given in the context [17]. We refer to this general setting as in-context reasoning (ICR).
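Concretely, the ICR input is just the knowledge and the question concatenated into a single sequence that the model is fine-tuned to answer. The sketch below illustrates this construction; the exact prompt template is an assumption, not taken verbatim from the paper.

```python
# Illustrative ICR input construction for the running example above.
context = ("The cow is big. "
           "If something is big then it chases the dog. "
           "If the cow chases the dog then the cow sees the rabbit.")
question = "Did the cow see the rabbit?"

# The model is fine-tuned to predict the answer (e.g., True/False)
# conditioned on this one concatenated sequence.
icr_input = f"{context} {question}"
```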

In real-world question-answering settings [16, 22, 39, 41], large amounts of contextual knowledge may be provided at once, and the information may not be perfectly filtered for a specific question. Unfortunately, in-context reasoning is highly sensitive to distractors [69]: additional facts that are not relevant to a question (e.g., “The cow is round” for the above example). Indeed, when fine-tuning and evaluating GPT-2 [59] for ICR, we find that adding distractors to the context drops performance from 99.4% to only 70.9% accuracy for the same questions (§4.2). This sensitivity to distractors in contextual knowledge contrasts with GPT-2’s apparent robustness to distractors in parametric knowledge: for any specific example, most of the training data seen by GPT-2—which forms its parameters—is likely to be completely irrelevant to that example. Naturally, we wonder whether presenting contextual knowledge in the same way as memorized knowledge, by encoding it into a model’s parameters, will improve the reasoning abilities of transformer-based language models.

In this work, we propose a novel bi-level optimization algorithm, RECKONING, that learns to memorize (and reason) over facts (i.e., knowledge) by performing inference-time parameter updates using gradients computed from a language modeling loss on those facts. The updated model is then used to answer any questions about those facts. Our training framework involves two nested loops: the inner loop performs fast adaptations from a set of initial weights to memorize a set of external knowledge through a few gradient updates, and the outer loop optimizes those same initial weights such that the updated model will solve reasoning problems associated with the memorized knowledge. In other words, the outer loop learns optimal meta-parameters that can rapidly memorize and successfully reason over contextual knowledge, allowing knowledge memorization to be optimized directly for downstream reasoning. At inference time, instead of including external knowledge in the input sequence as the prefix to a question prompt, the model can encode it in its parameters through gradient updates and then reason over its updated parametric knowledge to reach a conclusion.
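A minimal sketch of this bi-level loop is shown below, assuming a HuggingFace-style causal LM whose `forward(input_ids, labels=...)` returns a loss; names such as `model`, `fact_ids`, and `qa_ids` are hypothetical placeholders, and the paper's actual implementation details are given in its Appendix C.

```python
# Sketch of RECKONING-style bi-level learning with differentiable inner updates.
import torch
from torch.func import functional_call

def inner_loop(model, params, fact_ids, inner_lr, steps=4):
    """Fast adaptation: memorize the facts with a few differentiable
    language-modeling gradient steps on a copy of the weights."""
    for _ in range(steps):
        out = functional_call(model, params, (fact_ids,), {"labels": fact_ids})
        grads = torch.autograd.grad(out.loss, list(params.values()),
                                    create_graph=True)  # keep graph for the outer loop
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params

def outer_step(model, optimizer, fact_ids, qa_ids, qa_labels, inner_lr):
    """Meta-update: answer the question with the adapted weights, then
    back-propagate through the inner updates into the initial weights."""
    init_params = dict(model.named_parameters())
    adapted = inner_loop(model, init_params, fact_ids, inner_lr)
    qa_out = functional_call(model, adapted, (qa_ids,), {"labels": qa_labels})
    optimizer.zero_grad()
    qa_out.loss.backward()  # gradients reach the initial (meta-)parameters
    optimizer.step()
```

At inference time, only the inner loop runs: the new knowledge is folded in with a few gradient steps, and the adapted weights answer the question directly, with no context in the prompt.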

We evaluate RECKONING on two synthetic multi-hop reasoning datasets: ProofWriter [73] and CLUTRR-Systematic-Generalization (CLUTRR-SG) [28], and one real-world dataset, FOLIO [29], comparing against a fine-tuned ICR (FT-ICR) baseline that uses the same underlying model. Our results show that RECKONING consistently outperforms the FT-ICR baseline on each benchmark, demonstrating that it successfully learns to answer multi-hop reasoning questions as desired. In particular, we find that RECKONING more successfully generalizes to adversarial settings, such as the presence of distractor facts and the introduction of longer reasoning chains at inference time. Finally, while the inference-time gradient updates make RECKONING slower to process new knowledge than a typical ICR forward pass, our run-time analysis shows that RECKONING is more efficient when answering multiple questions about a shared knowledge set. This speedup occurs because RECKONING only needs to encode the knowledge once to answer multiple questions about it. Overall, we demonstrate that RECKONING is an effective algorithm for reasoning through dynamic and controllable knowledge encoding, overcoming an observed weakness in the common reasoning setting and providing multiple additional benefits.
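To make the amortization argument concrete, here is a back-of-the-envelope cost model; the constants are illustrative assumptions, not measurements from the paper.

```python
def icr_cost(n_questions, ctx_cost=1.0, q_cost=0.1):
    """ICR re-processes the full knowledge context with every question."""
    return n_questions * (ctx_cost + q_cost)

def reckoning_cost(n_questions, encode_cost=5.0, q_cost=0.1):
    """RECKONING pays a one-time gradient-update cost to encode the
    knowledge, then answers each question without the context."""
    return encode_cost + n_questions * q_cost

# With these (hypothetical) constants, RECKONING breaks even once the
# same knowledge set is queried more than encode_cost / ctx_cost = 5 times.
for n in (1, 5, 10, 50):
    print(n, icr_cost(n), reckoning_cost(n))
```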

This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Zeming Chen, EPFL ([email protected]);

(2) Gail Weiss, EPFL ([email protected]);

(3) Eric Mitchell, Stanford University ([email protected]);

(4) Asli Celikyilmaz, Meta AI Research ([email protected]);

(5) Antoine Bosselut, EPFL ([email protected]).

