
UNIPELT: A Unified Framework for Parameter-Efficient Language Model Tuning: Analysis of UNIPELT


Table of Links

Abstract and 1. Introduction

2. Preliminaries

3. Unifying PELT Methods

4. Experiments

  4.1 Experiment Setup

  4.2 Analysis of Individual PELT Methods

  4.3 Analysis of UNIPELT

  4.4 Efficiency of PELT Methods

5. Related Work

6. Conclusion, Acknowledgements, and References

4.3 Analysis of UNIPELT

Next, we turn to our proposed framework UNIPELT, which incorporates multiple existing PELT methods as submodules.


Low-Resource Performance. Overall, UNIPELT (APL) and UNIPELT (AP) consistently achieve the best and second best average performance on both the development and test sets, regardless of the number of training samples. The gains are generally 1-4% over the submodule that performs best when used individually. Such results demonstrate the advantages of our hybrid approach in terms of model effectiveness and generalizability.


At the per-task level, UNIPELT (APL) and UNIPELT (AP) perform the best or second best on 7/6/7 of the 8 tasks when trained with 100/500/1,000 samples, and never perform the worst in any setup. When comparing the two variants, UNIPELT (APL) outperforms UNIPELT (AP) on 4/6/8 of the 8 tasks when trained with 100/500/1,000 samples. Such results indicate that UNIPELT is quite robust and performs reliably under different scenarios. The improvements of UNIPELT over its submodules are generally larger when fewer training samples are available, suggesting that UNIPELT performs especially well in the low-resource regime. In particular, on tasks where other PELT methods fail to learn effectively, such as CoLA and QQP (K = 100), UNIPELT manages to achieve performance better than fine-tuning.
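The core idea of UNIPELT is to scale each submodule's output by a gate computed from the layer input, so that uninformative submodules can be down-weighted per example. The sketch below is a minimal, illustrative version of that gated combination, not the paper's implementation: the actual gates are computed by learned linear projections over hidden states, whereas here each gate is parameterized by a single assumed scalar weight per submodule for readability.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gated_combine(x, submodule_outputs, gate_weights):
    """Combine submodule output vectors, each scaled by a gate in (0, 1).

    x                 -- a scalar stand-in for the layer input features
    submodule_outputs -- one output vector per PELT submodule
                         (e.g. adapter, prefix-tuning, LoRA)
    gate_weights      -- one illustrative scalar gate parameter per submodule
    """
    h = [0.0] * len(submodule_outputs[0])
    for w, out in zip(gate_weights, submodule_outputs):
        g = sigmoid(w * x)  # gate depends on the input, one per submodule
        for i, v in enumerate(out):
            h[i] += g * v
    return h
```

For example, with `x = 0.0` every gate evaluates to 0.5, so each submodule contributes half of its output; as `x` moves, the gates diverge and the mixture shifts toward the submodules the gates favor.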


UNIPELT vs. Upper Bound. In Table 2, we compare UNIPELT with the upper bound that takes the best performance of its submodules on each task. We observe that both UNIPELT (AP) and UNIPELT (APL) perform similarly to, or even better than, their upper bound, which suggests that UNIPELT successfully learns to leverage different submodules and maintains (near) optimal performance across setups. The fact that UNIPELT can outperform the upper bound also hints that a mixture of PELT methods (involving different parts of the PLM) might be inherently more effective than single methods (with a limited scope of the PLM architecture).


Table 2: Comparison of average test performance between UNIPELT and the upper bound that takes the best performance of its submodules on each task.
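The upper bound in Table 2 is a simple oracle: for each task, take the best score among the individual submodules, then average across tasks. A short sketch of that computation (the task names and scores below are made-up placeholders, not numbers from the paper):

```python
def upper_bound(per_task_scores):
    """Oracle baseline: best individual-submodule score on each task,
    averaged over tasks.

    per_task_scores -- dict mapping task name to a dict of
                       {submodule name: score on that task}
    """
    best_per_task = {task: max(scores.values())
                     for task, scores in per_task_scores.items()}
    return sum(best_per_task.values()) / len(best_per_task)

# Hypothetical scores for two tasks and two submodules:
scores = {
    "CoLA":  {"adapter": 30.0, "prefix": 25.0},
    "SST-2": {"adapter": 90.0, "prefix": 91.0},
}
```

Note that no single submodule attains this bound unless it is best on every task, which is why matching or exceeding it indicates that UNIPELT adaptively exploits different submodules on different tasks.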


High-Resource Performance. In Table 3, we list the performance of different methods when all training samples are used. UNIPELT again achieves the best overall performance. The gains are not as significant as in the low-resource setting, which is somewhat expected since existing PELT methods typically perform on par with fine-tuning given abundant training data, leaving less room for improvement. That said, UNIPELT is still the best or second best on all 8 tasks, and generally comparable to the best submodule used individually on each task. Moreover, simply combining multiple PELT methods without gating does not work well in the high-resource setting: although UNIPELT-NoGate never performs the worst on any task, its average performance is unsatisfactory (-0.89 vs. UNIPELT).
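The UNIPELT-NoGate ablation above amounts to summing the submodules' outputs with every gate fixed at 1.0, instead of letting learned gates scale each contribution. A minimal sketch of that difference, assuming (purely for illustration) that gate values are already computed and submodule outputs are plain vectors:

```python
def combine(submodule_outputs, gates=None):
    """Element-wise sum of submodule output vectors.

    With gates=None this is the NoGate variant: every submodule
    contributes with weight 1.0. With gates given, each submodule's
    output is scaled by its gate value in (0, 1) first, which is
    what lets the model suppress an unhelpful submodule.
    """
    if gates is None:
        gates = [1.0] * len(submodule_outputs)
    dim = len(submodule_outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, submodule_outputs))
            for i in range(dim)]
```

Setting a gate to 0.0 removes that submodule's contribution entirely, while the NoGate variant has no way to do this, which is consistent with the gap between UNIPELT-NoGate and UNIPELT reported above.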


Table 3: Results on the GLUE benchmark when all training samples are used.


Table 4: Number of trainable parameters and Training/Inference time relative to fine-tuning.


Authors:

(1) Yuning Mao, University of Illinois Urbana-Champaign (work done during an internship at Meta AI) ([email protected]);

(2) Lambert Mathias, Meta AI ([email protected]);

(3) Rui Hou, Meta AI ([email protected]);

(4) Amjad Almahairi, Meta AI ([email protected]);

(5) Hao Ma, Meta AI ([email protected]);

(6) Jiawei Han, University of Illinois Urbana-Champaign ([email protected]);

(7) Wen-tau Yih, Meta AI ([email protected]);

(8) Madian Khabsa, Meta AI ([email protected]).


This paper is available on arxiv under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

