
Experiments and Evaluation


Table of Links

Abstract and 1. Introduction

  2. Proposed Method: Quantized DyLoRA
  3. Experiments and Evaluation
  4. On the semi-sorted behavior of QDyLoRA
  5. Conclusion, Limitations, and References


A. Supplementary Material

A.1. Hyperparameters

A.2. Generated Text Quality

3 Experiments and Evaluation

This section evaluates the efficiency and efficacy of QDyLoRA through several instruct-fine-tuning tasks.


Table 3: Comparing the performance of DyLoRA, QLoRA, and QDyLoRA across different evaluation ranks. All models receive the same training settings. The maximum LoRA rank is set to 64. Results are reported in terms of exact match.


The first experiment compares QDyLoRA with QLoRA on the Massive Multitask Language Understanding (MMLU) benchmark (Hendrycks et al., 2020), which consists of more than 50 tasks spanning fundamental mathematics, U.S. history, computer science, and law. As shown in Table 1 [1], we finetune LLaMA-7b, LLaMA-13b, LLaMA2-13b, and Falcon-40b on different datasets, namely Alpaca (Taori et al., 2023), OASST1 (Köpf et al., 2023), Self-Instruct (Wang et al., 2022), and FLANv2 (Chung et al., 2022), using the QLoRA and QDyLoRA techniques. We use the same training budget and maximum LoRA rank[2] for each technique. The results consistently show that QDyLoRA achieves superior performance by finding the optimal rank.
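To make the shared setup more concrete, the sketch below shows what a QLoRA-style 4-bit fine-tuning configuration can look like with the Hugging Face transformers, peft, and bitsandbytes stack. It is a minimal illustration rather than the authors' training script: the checkpoint name, target modules, and all hyperparameters except the maximum LoRA rank of 64 are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with double quantization, as popularized by QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint; the paper fine-tunes LLaMA-7b/13b, LLaMA2-13b, and Falcon-40b.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on top of the frozen, quantized base model.
# r=64 matches the maximum LoRA rank in the paper; the remaining values are assumed.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

QDyLoRA keeps the same quantized base model but, rather than always training the full rank-64 adapter, samples a truncation rank at each step (see footnote [2]), so a single run produces adapters that can be evaluated at any rank up to 64.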


The second experiment provides a more in-depth comparison between QLoRA and QDyLoRA. In particular, we finetuned Falcon-40b on the WebGLM (Liu et al., 2023) and GSM8k (Cobbe et al., 2021) benchmarks under identical settings and compared their test performance across different ranks. As described in Table 2, QDyLoRA attains superior performance, notably when employing its optimal ranks (rank 2 for WebGLM and rank 8 for GSM8k). Furthermore, QDyLoRA exhibits consistent superiority over QLoRA, particularly at lower ranks. These findings emphasize the adaptive nature of QDyLoRA, which dynamically adjusts its focus during fine-tuning and is therefore more efficient and effective than its static counterpart, QLoRA.

The third experiment compares the performance of DyLoRA, QDyLoRA, and QLoRA on GSM8k and TriviaQA (Joshi et al., 2017), adopting LLaMA2-13b and LLaMA-7b as the underlying LLMs. Table 3 reports the results. As the table illustrates, for the smaller model, i.e., LLaMA-7b, both DyLoRA and QDyLoRA outperform QLoRA. For the larger model, i.e., LLaMA2-13b, DyLoRA fails with an out-of-memory (OOM) error, while QDyLoRA performs best in such situations.
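The per-rank evaluation above relies on the nested structure of a dynamically trained adapter: a rank-b update is recovered by keeping only the first b components of the shared up- and down-projection factors. The toy sketch below illustrates that truncation; the dimensions, random weights, and alpha/rank scaling convention are placeholders rather than the paper's actual adapter.

```python
import torch

# Toy dimensions; real attention projections in LLaMA/Falcon are much larger.
d_in, d_out, r_max, lora_alpha = 512, 512, 64, 16

# Stand-ins for dynamically trained LoRA factors (random here for illustration).
A = torch.randn(r_max, d_in) * 0.01   # down-projection, rows ordered by rank index
B = torch.randn(d_out, r_max) * 0.01  # up-projection, columns ordered by rank index

def lora_delta(b: int) -> torch.Tensor:
    """Rank-b weight update built from the first b components of the shared factors.
    The alpha/b scaling follows the usual LoRA convention and is an assumption here."""
    return (B[:, :b] @ A[:b, :]) * (lora_alpha / b)

# One trained adapter can be queried at any rank up to r_max at test time,
# e.g. rank 2 (best on WebGLM in Table 2) or rank 8 (best on GSM8k).
for b in (1, 2, 8, 64):
    print(f"rank {b:>2}: delta W shape = {tuple(lora_delta(b).shape)}")
```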


Authors:

(1) Hossein Rajabzadeh, University of Waterloo and Huawei Noah’s Ark Lab ([email protected]);

(2) Mojtaba Valipour, University of Waterloo ([email protected]);

(3) Tianshu Zhu, Huawei Noah’s Ark Lab ([email protected]);

(4) Marzieh Tahaei, Huawei Noah’s Ark Lab ([email protected]);

(5) Hyock Ju Kwon, ([email protected]);

(6) Ali Ghodsi, ([email protected]);

(7) Boxing Chen, Huawei Noah’s Ark Lab ([email protected]);

(8) Mehdi Rezagholizadeh, Huawei Noah’s Ark Lab ([email protected]).


This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.

[1] The same settings as the original QLoRA work are applied here.


[2] The maximum LoRA rank is fixed to 64. While QLoRA’s rank is always fixed, QDyLoRA can split the training across ranks in the range 1 to 64.
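As a rough sketch of what splitting the training across ranks could look like in code, the loop below samples a truncation rank uniformly at each step and backpropagates only through the corresponding slice of the LoRA factors. The sampling distribution, dimensions, and objective are placeholders, not the authors' exact procedure.

```python
import random
import torch

d_in, d_out, r_max, alpha = 512, 512, 64, 16
A = torch.nn.Parameter(torch.randn(r_max, d_in) * 0.01)  # down-projection
B = torch.nn.Parameter(torch.zeros(d_out, r_max))        # up-projection, zero-init per LoRA
opt = torch.optim.AdamW([A, B], lr=1e-4)

for step in range(100):
    b = random.randint(1, r_max)                # sample a rank for this step
    x = torch.randn(8, d_in)                    # stand-in for frozen-base activations
    y = torch.randn(8, d_out)                   # stand-in target
    delta = x @ A[:b, :].T @ B[:, :b].T * (alpha / b)  # truncated low-rank update
    loss = (delta - y).pow(2).mean()            # placeholder objective
    opt.zero_grad()
    loss.backward()                             # gradients touch only the first b components
    opt.step()
```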
