Authors:
(1) Siqi Kou, Shanghai Jiao Tong University (equal contribution);
(2) Lanxiang Hu, University of California, San Diego (equal contribution);
(3) Zhezhi He, Shanghai Jiao Tong University;
(4) Zhijie Deng, Shanghai Jiao Tong University;
(5) Hao Zhang, University of California, San Diego.
Table of Links
3. Methodology and 3.1. Preliminary: Jacobi Decoding
3.2. Consistency Large Language Models (CLLMs)
3.3. Acceleration Mechanisms in CLLMs
4. Experiments
4.2. Acceleration Mechanisms in CLLMs
4.4. Limitations and Discussion
5. Conclusion, Impact Statement, and References
A. Illustration of Consistency Loss Learning Objectives
B. Comparison with Baseline Algorithms
C. Pseudo Code for Jacobi Decoding with KV Cache
A. Illustration of Consistency Loss Learning Objectives
In our proposed method described in Section 3.2, we use Jacobi trajectories collected from a target model to train the model with a loss that encourages single-step convergence during Jacobi iterations. This is achieved with either of two consistency losses:
• Global consistency loss: directly minimize the distance D between an arbitrary point y on a Jacobi trajectory and the fixed point y∗, as in Equation 4.
• Local consistency loss: minimize the distance D between adjacent points y(j) and y(j+1) on a Jacobi trajectory, so that each Jacobi iteration step is pulled toward convergence.
Figure 4 and Figure 5 further illustrate the global consistency loss and the local consistency loss, respectively.
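To make the two objectives concrete, the following is a minimal PyTorch sketch of how each loss could be computed, assuming the distance D is instantiated as a forward KL divergence over per-token output distributions and that the model follows a Hugging Face-style interface (a forward pass returning .logits). The function and argument names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def token_kl(student_logits, target_logits):
    # Forward KL between per-token next-token distributions;
    # one possible instantiation of the distance D.
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )

def global_consistency_loss(model, y_j_ids, y_star_ids):
    # Pull an arbitrary trajectory point y(j) directly toward the
    # fixed point y*; the y* branch is detached so it acts as a
    # frozen target.
    student_logits = model(y_j_ids).logits
    with torch.no_grad():
        target_logits = model(y_star_ids).logits
    return token_kl(student_logits, target_logits)

def local_consistency_loss(model, y_j_ids, y_j_plus1_ids):
    # Pull adjacent trajectory points y(j) and y(j+1) toward each
    # other, encouraging each single Jacobi step to converge.
    student_logits = model(y_j_ids).logits
    with torch.no_grad():
        target_logits = model(y_j_plus1_ids).logits
    return token_kl(student_logits, target_logits)
```

In training, either consistency loss is combined with a standard autoregressive loss on the target model's outputs, as described in Section 3.2, to keep the model close to the target distribution.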
This paper is available on arxiv under CC0 1.0 Universal license.