
UNIPELT: A Unified Framework for Parameter-Efficient Language Model Tuning: Unifying PELT Methods


Table of Links

Abstract and 1. Introduction

  2. Preliminaries

  3. Unifying PELT Methods

  4. Experiments

    4.1 Experiment Setup

    4.2 Analysis of Individual PELT Methods

    4.3 Analysis of UNIPELT

    4.4 Efficiency of PELT Methods

  5. Related Work

  6. Conclusion, Acknowledgements, and References

3 Unifying PELT Methods

3.1 Task Formulation

3.2 Proposed Method

Motivation & Intuition. During the analysis of individual PELT methods, we observe that different PELT methods exhibit diverse characteristics and perform rather differently on the same task. For example, prefix-tuning generally performs well on natural language inference tasks regardless of the size of training data. Also, as can be seen in Fig. 1 and Sec. 2, different PELT methods often involve different parts of the PLM architecture (e.g., before multi-head attention for prefix-tuning and after feedforward layer for adapter), making it feasible to combine multiple PELT methods without (directly) interfering with each other.
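
To make the non-interference point concrete, here is a minimal PyTorch-style sketch (not the authors' code) marking where each method attaches inside a single transformer layer; the module names, dimensions, and prefix length are illustrative assumptions.

```python
# Illustrative sketch only: each PELT method touches a different part of the layer,
# so they can coexist without (directly) interfering with each other.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after the feed-forward sublayer (illustrative sizes)."""
    def __init__(self, d_model=768, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual bottleneck update

class LoRALinear(nn.Module):
    """Frozen dense projection plus a trainable low-rank update."""
    def __init__(self, d_in=768, d_out=768, r=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)        # pretrained weight stays frozen
        self.lora_a = nn.Linear(d_in, r, bias=False)
        self.lora_b = nn.Linear(r, d_out, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

# Placement within one transformer layer:
#   prefix-tuning -> trainable key/value vectors prepended before multi-head attention
#   LoRA          -> low-rank updates on frozen attention projection matrices
#   adapter       -> bottleneck module after the feed-forward sublayer
prefix_len, d_model = 10, 768
prefix_kv = nn.Parameter(torch.randn(2, prefix_len, d_model))  # prefix-tuning parameters
query_proj = LoRALinear(d_model, d_model)                      # LoRA-augmented projection
ffn_adapter = Adapter(d_model)                                  # adapter after the FFN
```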


In light of the two observations above, we propose a unified PELT framework, UNIPELT, which takes a hybrid approach by incorporating multiple PELT methods as submodules. At a high level, UNIPELT improves over single PELT methods due to two factors. First, UNIPELT learns to activate (upweight) the submodules that best suit the current task or specific data sample and deactivate (downweight) the rest. Second, we find that UNIPELT generally performs better than taking the best performance of all its submodules used individually on each task, suggesting that there could be some compounding effects that lead to better model effectiveness when multiple PELT methods (that modify different parts of the PLM) are used.


Next, we will introduce how different PELT methods can be incorporated into UNIPELT via a gating mechanism.
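
As a rough illustration of the gating idea, the sketch below (a hypothetical parameterization, not necessarily the paper's exact formulation) scales a submodule's update by a sigmoid gate computed from that submodule's direct input, shown here for an adapter only:

```python
# Hypothetical gating sketch: a learned gate in (0, 1) upweights or downweights
# the submodule's contribution per example.
import torch
import torch.nn as nn

class GatedAdapter(nn.Module):
    """Bottleneck adapter whose update is scaled by a learned gate in (0, 1)."""
    def __init__(self, d_model=768, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.gate_proj = nn.Linear(d_model, 1)  # estimates the gate from the input

    def forward(self, h):
        # h: (batch, seq_len, d_model)
        update = self.up(torch.relu(self.down(h)))                      # adapter update
        gate = torch.sigmoid(self.gate_proj(h)).mean(1, keepdim=True)   # one gate per example
        return h + gate * update                                        # gate near 0 switches the adapter (almost) off

hidden = torch.randn(2, 16, 768)
print(GatedAdapter()(hidden).shape)  # torch.Size([2, 16, 768])
```

Analogous gates can scale the LoRA update or the prefix vectors, which is how the framework can effectively switch submodules on or off depending on the task or data sample.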




Despite the seeming simplicity of UNIPELT, we note that it is nontrivial for a unified approach to work well under different scenarios. Naively combining different PELT methods as a hybrid approach could lead to mixed or worse performance than using individual methods, as observed in both our experiments and prior studies (Hu et al., 2021).


Authors:

(1) Yuning Mao, University of Illinois Urbana-Champaign and the work was done during internship at Meta AI ([email protected]);

(2) Lambert Mathias, Meta AI ([email protected]);

(3) Rui Hou, Meta AI ([email protected]);

(4) Amjad Almahairi, Meta AI ([email protected]);

(5) Hao Ma, Meta AI ([email protected]);

(6) Jiawei Han, University of Illinois Urbana-Champaign ([email protected]);

(7) Wen-tau Yih, Meta AI ([email protected]);

(8) Madian Khabsa, Meta AI ([email protected]).


This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.

[3] Unlike adapter or LoRA, prefix-tuning cannot be fully eliminated by its gate due to the softmax operation in multi-head attention.
