
Hyperparameters


Table of Links

Abstract and 1. Introduction

  2. Proposed Method: Quantized DyLoRA
  3. Experiments and Evaluation
  4. On the semi-sorted behavior of QDyLoRA
  5. Conclusion, Limitations, and References


A. Supplementary Material

A.1. Hyperparameters

A.2. Generated Text Quality

A.1 Hyperparameters

Table 4 provides an overview of the hyperparameters and experimental configurations employed in this study, which determine the training process and model behavior. Key parameters shared across experiments include the choice of optimizer, the Adam-Beta2 value, the maximum gradient norm, and the warmup ratio, which together govern how the model adjusts its weights during training. LoRA-specific parameters, namely the LoRA dropout probability, the maximum LoRA rank, and the alpha value, control the behavior of the LoRA layers. Double quantization and the quantization type determine the precision of numerical representations within the model and are kept identical to the baselines. Learning-rate scheduling and weight decay support the optimization process, helping to prevent overfitting and stabilize training. Random seeds ensure reproducibility, and the specified GPU identifies the hardware used for training. Each model configuration, whether for Web-GLM, GSM8k, or the experiment reported in Table 1, uses parameters tailored to the characteristics of the dataset and the available computational resources. Together, these hyperparameters shape the training process and ultimately the performance and effectiveness of the models in this study.
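As a concrete illustration of how these hyperparameter categories fit together, the sketch below assembles a QLoRA-style fine-tuning configuration using the Hugging Face transformers, peft, and bitsandbytes stack. This is a minimal sketch under stated assumptions, not the authors' implementation: the numeric values are placeholders rather than the entries of Table 4, the base model name and output path are assumptions for illustration, and QDyLoRA's dynamic rank sampling is not expressed here because it requires the authors' custom training loop.

```python
# Minimal sketch of a QLoRA-style configuration surface (assumed stack:
# transformers + peft + bitsandbytes). All numeric values are illustrative
# placeholders, NOT the values reported in Table 4 of the paper.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# Quantization settings: double quantization and quantization type,
# kept the same as the baselines in the paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,   # "double quantization"
    bnb_4bit_quant_type="nf4",        # "quantization type"
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA-specific parameters: maximum rank, alpha value, and dropout probability.
lora_config = LoraConfig(
    r=64,                 # maximum LoRA rank (QDyLoRA trains across ranks up to this)
    lora_alpha=16,        # alpha scaling value
    lora_dropout=0.05,    # LoRA dropout probability
    task_type="CAUSAL_LM",
)

# Optimizer and schedule settings mirroring the hyperparameter categories in Table 4.
training_args = TrainingArguments(
    output_dir="qdylora-sketch",      # hypothetical output path
    optim="paged_adamw_32bit",        # choice of optimizer
    adam_beta2=0.999,                 # Adam-Beta2 value
    max_grad_norm=0.3,                # maximum gradient norm
    warmup_ratio=0.03,                # warmup ratio
    weight_decay=0.0,                 # weight decay
    learning_rate=2e-4,
    lr_scheduler_type="constant",     # learning-rate scheduling
    seed=42,                          # random seed for reproducibility
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # assumed base model, for illustration only
    quantization_config=bnb_config,
    device_map="auto",                # places the model on the available GPU
)
model = get_peft_model(model, lora_config)
```

In the actual QDyLoRA training loop, a rank up to the maximum LoRA rank is sampled at each step, so the single static rank in this sketch stands in for that dynamic behavior.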


Authors:

(1) Hossein Rajabzadeh, University of Waterloo and Huawei Noah’s Ark Lab ([email protected]);

(2) Mojtaba Valipour, University of Waterloo ([email protected]);

(3) Tianshu Zhu, Huawei Noah’s Ark Lab ([email protected]);

(4) Marzieh Tahaei, Huawei Noah’s Ark Lab ([email protected]);

(5) Hyock Ju Kwon ([email protected]);

(6) Ali Ghodsi ([email protected]);

(7) Boxing Chen, Huawei Noah’s Ark Lab ([email protected]);

(8) Mehdi Rezagholizadeh, Huawei Noah’s Ark Lab ([email protected]).


This paper is available on arXiv under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

