Table of Links
- Proposed Method: Quantized DyLoRA
- Experiments and Evaluation
- On the semi-sorted behavior of QDyLoRA
- Conclusion, Limitations, and References
A. Supplementary Material
A.1 Hyperparameters
Table 4 provides an overview of the hyperparameters and experimental configurations employed in this study. Key parameters common to all experiments include the choice of optimizer, the Adam beta-2 value, the maximum gradient norm, and the warmup ratio, which together govern how the model's weights are updated during training. LoRA-specific parameters, namely the LoRA dropout probability, the maximum LoRA rank, and the alpha value, control the behavior of the LoRA layers. Double quantization and the quantization type determine the precision of the numerical representations within the model and are kept identical to the baselines. Learning-rate scheduling and weight decay support the optimization process, helping to prevent overfitting and stabilize training. Random seeds ensure reproducibility, and the listed GPU specifies the hardware used for training. Each configuration, whether for Web-GLM, GSM8k, or the experiment reported in Table 1, uses parameters tailored to the characteristics of the dataset and the available computational resources; together, these hyperparameters shape the training process and, ultimately, the performance of the resulting models.
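To make the role of each hyperparameter group concrete, the sketch below shows how such a configuration would typically be expressed with the Hugging Face transformers, peft, and bitsandbytes APIs. The numeric values and target modules are illustrative placeholders rather than the values reported in Table 4, and the fixed `r` in `LoraConfig` only stands in for the maximum LoRA rank; the dynamic-rank training of QDyLoRA itself is not part of the standard peft API.

```python
# Illustrative QLoRA-style configuration; all values are placeholders,
# not the hyperparameters reported in Table 4 of the paper.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization settings (quantization type and double quantization,
# kept the same as the baselines in the paper).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # quantization type
    bnb_4bit_use_double_quant=True,     # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA-specific parameters: r stands in for the maximum LoRA rank;
# standard peft does not perform QDyLoRA's dynamic-rank sampling.
lora_config = LoraConfig(
    r=64,                                  # maximum LoRA rank (placeholder)
    lora_alpha=16,                         # alpha value (placeholder)
    lora_dropout=0.05,                     # LoRA dropout probability (placeholder)
    target_modules=["q_proj", "v_proj"],   # assumed target modules
    task_type="CAUSAL_LM",
)

# Optimization and scheduling parameters (all placeholder values).
training_args = TrainingArguments(
    output_dir="qdylora-finetune",
    optim="paged_adamw_32bit",   # optimizer choice
    adam_beta2=0.999,            # Adam beta-2
    max_grad_norm=0.3,           # maximum gradient norm
    warmup_ratio=0.03,           # warmup ratio
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    weight_decay=0.0,
    seed=42,                     # random seed for reproducibility
)
```

These objects would then be passed, together with the quantized base model, to a standard Trainer-style fine-tuning loop; the per-dataset rows of Table 4 correspond to different settings of these same fields.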
Authors:
(1) Hossein Rajabzadeh, University of Waterloo and Huawei Noah’s Ark Lab;
(2) Mojtaba Valipour, University of Waterloo;
(3) Tianshu Zhu, Huawei Noah’s Ark Lab;
(4) Marzieh Tahaei, Huawei Noah’s Ark Lab;
(5) Hyock Ju Kwon;
(6) Ali Ghodsi;
(7) Boxing Chen, Huawei Noah’s Ark Lab;
(8) Mehdi Rezagholizadeh, Huawei Noah’s Ark Lab.