The Efficiency of Dynamic Knowledge Encoding: RECKONING's Advantage in Multi-Question Scenarios

Written by reckoning | Published 2025/10/28
Tech Story Tags: algorithms | run-time-analysis | computational-efficiency | reckoning-algorithm | in-context-reasoning | knowledge-encoding | multi-question-reasoning | transformer-performance

TL;DR: This article presents a run-time analysis comparing RECKONING with in-context reasoning, focusing on computational efficiency when multiple questions are asked about the same knowledge set.

Abstract and 1. Introduction

  2. Background

  3. Method

  4. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World Knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  5. Related Work

  6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

4.4 Run-time Analysis

One of the advantages of RECKONING is its ability to memorize a large set of knowledge K and answer multiple related questions about that knowledge at little extra cost per question. Specifically, in contrast to ICR, RECKONING can encode K once and answer multiple questions without needing to reprocess it for each question asked. To test whether RECKONING is a more efficient inference method in this setting, we measure the wall-clock time (in seconds) of the complete inference pipeline of RECKONING vs. ICR. For this experiment, we use a synthetic reasoning dataset in which K is a sequence of random letters and the question x asks for the most frequent letter in the context. Each example contains 1024 tokens in total: 7 for x, 1 for the answer, and the remaining 1016 for K, broken into 8 “facts”.
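To make this setup concrete, here is a minimal sketch of how one such synthetic example could be generated. Only the token budget (1024 tokens per example: 7 for x, 1 for the answer, and 1016 for K split into 8 facts of 127 letters) comes from the description above; the question wording, the character-level "tokens", and the helper names are illustrative assumptions.

```python
import random
import string

# Hypothetical generator for the "most frequent letter" task described above.
# Token counts follow the paper's budget: 8 facts x 127 letters = 1016
# knowledge tokens, 7 question tokens, and 1 answer token (1024 total).
# Treating each letter as one token is an assumption for illustration.

FACTS_PER_EXAMPLE = 8
FACT_LENGTH = 127

def make_example(rng: random.Random):
    # K: a sequence of random letters, broken into 8 "facts".
    letters = [rng.choice(string.ascii_lowercase)
               for _ in range(FACTS_PER_EXAMPLE * FACT_LENGTH)]
    facts = ["".join(letters[i * FACT_LENGTH:(i + 1) * FACT_LENGTH])
             for i in range(FACTS_PER_EXAMPLE)]
    # x: the question about K (assumed to tokenize to 7 tokens).
    question = "Which letter appears most often in the context ?"
    # The single-token answer: the mode of the letter sequence.
    answer = max(set(letters), key=letters.count)
    return facts, question, answer

facts, question, answer = make_example(random.Random(0))
print(len(facts), len(facts[0]), answer)  # 8 127 <most frequent letter>
```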

The FT-ICR baseline receives a single sequence containing all 8 facts and the question. In contrast, RECKONING receives the 8 facts as a batch of eight 127-token segments and encodes them in parallel in the inner loop; in the outer loop, the model receives only the question or a batch of questions. We focus on two settings: (1) inference time for a single question and (2) inference time when answering multiple questions. In the multiple-question setting, we set the number of questions to 18 (the same as in ProofWriter). For RECKONING, the inference process includes the inner-loop knowledge encoding and the final forward pass that encodes the question. We set the number of inner-loop gradient steps to 1 and 4. In Table 3, we see that when answering a single question, RECKONING does not perform inference faster than in-context reasoning. However, RECKONING shows significant advantages in the multi-question setting: both the 1-step and 4-step inner-loop variants are faster than the baseline. Since RECKONING encodes the knowledge in its model parameters, it does not need to reprocess the knowledge for each related question and is therefore more efficient. We run this experiment on 1 RTX 3090 GPU.[3]
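The structure of the timing comparison can be sketched as follows, using a tiny stand-in model rather than the paper's trained LM. The 8x127 fact layout, the 1024-token ICR input, and the 18-question batch follow the description above; the model architecture, the next-token loss used to write K into the parameters, the SGD inner-loop optimizer, and the learning rate are assumptions for illustration.

```python
import copy
import time
import torch
import torch.nn as nn

VOCAB, DIM, N_QUESTIONS = 128, 256, 18

# Tiny stand-in for the language model (illustrative only).
model = nn.Sequential(
    nn.Embedding(VOCAB, DIM),
    nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True),
    nn.Linear(DIM, VOCAB),
)

facts = torch.randint(0, VOCAB, (8, 127))              # 8 facts x 127 tokens
questions = torch.randint(0, VOCAB, (N_QUESTIONS, 8))  # question + answer slot

def icr_inference():
    # ICR reprocesses all 1,016 knowledge tokens with every question.
    full = torch.cat([facts.reshape(1, -1).expand(N_QUESTIONS, -1),
                      questions], dim=1)                # (18, 1024)
    with torch.no_grad():
        model(full)

def reckoning_inference(inner_steps: int):
    # Inner loop: encode K into a copy of the parameters with a few
    # gradient steps, processing the 8 facts in parallel as one batch.
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=1e-3)
    for _ in range(inner_steps):
        logits = fast(facts[:, :-1])                    # next-token loss on K
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), facts[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Outer loop: a single forward pass over the batch of questions,
    # with no reprocessing of the knowledge.
    with torch.no_grad():
        fast(questions)

for name, fn in [("ICR", icr_inference),
                 ("RECKONING (1 step)", lambda: reckoning_inference(1)),
                 ("RECKONING (4 steps)", lambda: reckoning_inference(4))]:
    t0 = time.perf_counter()
    fn()
    print(f"{name}: {time.perf_counter() - t0:.4f}s")
```

The structural difference driving Table 3 is visible here: `icr_inference` pays for all 1,024 tokens on every question, while `reckoning_inference` pays the inner-loop encoding cost once and then processes only the short questions.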

Authors:

(1) Zeming Chen, EPFL ([email protected]);

(2) Gail Weiss, EPFL ([email protected]);

(3) Eric Mitchell, Stanford University ([email protected]);

(4) Asli Celikyilmaz, Meta AI Research ([email protected]);

(5) Antoine Bosselut, EPFL ([email protected]).


This paper is available on arXiv under a CC BY 4.0 DEED license.

