4.4 Efficiency of PELT Methods
We benchmark the efficiency of PELT methods and list in Table 4 their number of trainable parameters and training/inference time relative to fine-tuning.
Parameter Efficiency. As the number of trainable parameters in each PELT method is almost negligible, combining multiple methods does not lead to a significant loss in parameter efficiency. UNIPELT still has very few trainable parameters compared to fine-tuning (0.99%~1.26%). The parameter count can be reduced further by using more parameter-efficient variants (e.g., Karimi Mahabadi et al. (2021a)), which can easily be swapped in for the vanilla versions used in our current framework. Also, note that more trainable parameters do not always lead to better performance, as shown in our experiments and in prior studies (He et al., 2021; Pfeiffer et al., 2021).
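As a rough illustration of how such a trainable-parameter ratio can be computed, the sketch below counts parameters with `requires_grad=True` against all parameters of a PyTorch model. The tiny frozen backbone and bottleneck module are hypothetical stand-ins for a pretrained model plus a PELT submodule, not the actual UNIPELT implementation.

```python
import torch.nn as nn

def trainable_fraction(model: nn.Module) -> float:
    """Return trainable parameters as a fraction of all parameters."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# Toy stand-in: a frozen "pretrained" layer plus a small tunable bottleneck.
backbone = nn.Linear(768, 768)
for p in backbone.parameters():
    p.requires_grad = False  # freeze the pretrained weights

adapter = nn.Sequential(nn.Linear(768, 48), nn.ReLU(), nn.Linear(48, 768))
model = nn.Sequential(backbone, adapter)

print(f"trainable fraction: {trainable_fraction(model):.2%}")
```

With a full-size pretrained backbone, the same counting yields ratios on the order of the 0.99%~1.26% reported above.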
Training and Inference Efficiency. Thanks to their parameter efficiency, all PELT methods train 30%~50% faster than fine-tuning, and incorporating multiple PELT methods into UNIPELT does not slow training down. On the other hand, the inference time of PELT methods is generally longer since they involve more FLOPs. UNIPELT has a slightly larger inference overhead (4%~11% compared to its slowest submodule), which we argue is insignificant, since larger models that may achieve similar performance gains (e.g., BERT-large) require around 300% of the inference time (Wolf et al., 2020).
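For reference, relative inference time can be measured as a simple wall-clock ratio between two models on the same batch. The sketch below is a minimal, assumed setup using `time.perf_counter` on CPU with toy placeholder models; it is not the benchmarking code used in the paper.

```python
import time
import torch
import torch.nn as nn

@torch.no_grad()
def avg_forward_time(model: nn.Module, batch: torch.Tensor,
                     n_warmup: int = 5, n_runs: int = 20) -> float:
    """Average wall-clock time of one forward pass, in seconds."""
    model.eval()
    for _ in range(n_warmup):      # warm-up runs to stabilize timings
        model(batch)
    start = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    return (time.perf_counter() - start) / n_runs

# Hypothetical stand-ins: a "base" model and a variant with extra FLOPs,
# mimicking the additional submodules introduced by a PELT method.
batch = torch.randn(32, 768)
base = nn.Linear(768, 768)
augmented = nn.Sequential(nn.Linear(768, 768), nn.Linear(768, 768))

ratio = avg_forward_time(augmented, batch) / avg_forward_time(base, batch)
print(f"relative inference time: {ratio:.2f}x")
```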
Authors:
(1) Yuning Mao, University of Illinois Urbana-Champaign; work done during an internship at Meta AI ([email protected]);
(2) Lambert Mathias, Meta AI ([email protected]);
(3) Rui Hou, Meta AI ([email protected]);
(4) Amjad Almahairi, Meta AI ([email protected]);
(5) Hao Ma, Meta AI ([email protected]);
(6) Jiawei Han, University of Illinois Urbana-Champaign ([email protected]);
(7) Wen-tau Yih, Meta AI ([email protected]);
(8) Madian Khabsa, Meta AI ([email protected]).