
CLLMs: Consistency Large Language Models: Acceleration Mechanisms in CLLMs


Authors:

(1) Siqi Kou, Shanghai Jiao Tong University (equal contribution);

(2) Lanxiang Hu, University of California, San Diego (equal contribution);

(3) Zhezhi He, Shanghai Jiao Tong University;

(4) Zhijie Deng, Shanghai Jiao Tong University;

(5) Hao Zhang, University of California, San Diego.

Table of Links

Abstract and 1 Introduction

2. Related Work

3. Methodology and 3.1. Preliminary: Jacobi Decoding

3.2. Consistency Large Language Models (CLLMs)

3.3. Acceleration Mechanisms in CLLMs

4. Experiments

4.1. Evaluations

4.2. Acceleration Mechanisms in CLLMs

4.3. Ablation Studies

4.4. Limitations and Discussion

5. Conclusion, Impact Statement, and References

A. Illustration of Consistency Loss Learning Objectives

B. Comparison with Baseline Algorithms

C. Pseudo Code for Jacobi Decoding with KV Cache

3.3. Acceleration Mechanisms in CLLMs

Next, we compare the Jacobi trajectories of the target LLM and the CLLM in Figure 2 to gain an in-depth understanding of the acceleration mechanisms in CLLMs.


As shown on the left side of Figure 2, target LLMs typically generate only one correct token per iteration. In contrast, we identify a fast-forwarding phenomenon in CLLMs, where multiple consecutive tokens are correctly predicted in a single forward pass. As reported in Table 3, the average fast-forward count per forward pass in CLLMs ranges from 2 to 6 tokens. Moreover, tokens correctly generated in advance (e.g., "country" and "H" at points 5 and 6 on the left side of Figure 2) are often replaced inaccurately in subsequent iterations by target LLMs. Unlike pre-trained models, CLLMs exhibit the capability of predicting correct tokens preemptively, even when preceded by incorrect tokens, and of keeping those tokens unchanged. We term such tokens stationary tokens; their existence allows simultaneous extension of discontinuous correct tokens within the n-token sequence. Both phenomena contribute to the fast convergence of Jacobi decoding in CLLMs, thereby leading to a considerable generation speedup.
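To make the fast-forwarding measurement concrete, below is a minimal sketch of plain Jacobi decoding on an n-token block, instrumented to count how many additional leading tokens agree with the converged output after each iteration. It assumes a Hugging Face-style causal LM whose forward call returns .logits; model, prefix_ids, and n are hypothetical placeholders, and this is not the authors' implementation (their pseudocode with KV cache is given in Appendix C).

import torch

@torch.no_grad()
def jacobi_decode_block(model, prefix_ids, n, max_iters=64):
    """Run Jacobi iterations on an n-token block until it reaches a fixed point."""
    block = torch.zeros(1, n, dtype=torch.long)   # arbitrary initial guesses
    trajectory = [block]
    for _ in range(max_iters):
        inputs = torch.cat([prefix_ids, block], dim=1)
        logits = model(inputs).logits
        # Greedy update of all n positions in parallel: position i is conditioned
        # on the current (possibly wrong) guesses at positions < i, which is
        # exactly one Jacobi iteration.
        new_block = logits[:, prefix_ids.size(1) - 1 : -1, :].argmax(dim=-1)
        trajectory.append(new_block)
        if torch.equal(new_block, block):         # fixed point: matches greedy AR output
            break
        block = new_block
    return block, trajectory

def fast_forward_counts(trajectory):
    """Per-iteration count of newly fixed leading tokens, i.e. tokens that
    already agree with the converged output; counts > 1 indicate fast-forwards."""
    final = trajectory[-1]
    def correct_prefix_len(b):
        i = 0
        while i < b.size(1) and b[0, i] == final[0, i]:
            i += 1
        return i
    lens = [correct_prefix_len(b) for b in trajectory]
    return [b - a for a, b in zip(lens, lens[1:])]

Counting agreement with the converged output at non-contiguous positions in the same way would surface stationary tokens, i.e., correct tokens that appear ahead of still-incorrect ones and persist across iterations.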


We observe that CLLMs acquire a crucial linguistic concept through training: collocations, i.e., series of words or terms that co-occur more frequently than one would expect by chance (Smadja, 1991). Language is not solely composed of isolated words; it also relies heavily on specific word pairings. Examples of collocations are abundant in both natural and coding languages. They include verb + preposition combinations (e.g., "talk to", "remind ... of ..."), verb + noun structures (e.g., "make a decision", "catch a cold"), and many domain-specific syntactic structures (e.g., "SELECT ... FROM ...", "if ... else" in programming). The consistency generation objective allows CLLMs to infer such structures from any point in the Jacobi trajectory, encouraging them to acquire proficiency in numerous collocations and thereby predict multiple words simultaneously, minimizing the number of iteration steps.


Notably, lookahead decoding (Fu et al., 2024) collects n-grams generated from previous Jacobi iterations as candidate tokens and verifies them in the next iteration to accelerate decoding. CLLMs can also be combined with lookahead decoding to achieve extra speedup (see Table 1 and Table 2), because the collocations learned by CLLMs improve the quality of the n-grams and thus increase the acceptance rate.
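As a rough illustration of why better n-grams help, the sketch below (hypothetical names only, not the lookahead decoding implementation of Fu et al., 2024) verifies a candidate n-gram by counting how many of its leading tokens match the model's own greedy continuation of the accepted prefix; a candidate pool populated from a CLLM's Jacobi iterates contains more well-formed collocations and therefore yields longer accepted runs per decoding step.

import torch

@torch.no_grad()
def verify_ngram(model, prefix_ids, ngram):
    """Count how many leading tokens of `ngram` match the model's own greedy
    continuation of `prefix_ids` (the sequence accepted so far)."""
    candidate = torch.tensor([ngram], dtype=torch.long)
    inputs = torch.cat([prefix_ids, candidate], dim=1)
    logits = model(inputs).logits
    # Greedy prediction for each candidate position, conditioned on the prefix
    # plus the preceding candidate tokens.
    preds = logits[:, prefix_ids.size(1) - 1 : -1, :].argmax(dim=-1)
    accepted = 0
    while accepted < candidate.size(1) and preds[0, accepted] == candidate[0, accepted]:
        accepted += 1
    return accepted

def best_candidate(model, prefix_ids, ngram_pool):
    """Pick the pool entry with the longest verified prefix."""
    return max(ngram_pool, key=lambda g: verify_ngram(model, prefix_ids, g))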


This paper is available on arXiv under the CC0 1.0 Universal license.

