Table of Links
A. The Connection Between Prefix-tuning and Hypernetwork
B. Number of Tunable Parameters
4. Proposed Method
4.1. Hyper-Embeddings for PELT
4.2. HyperPrefix: Incorporate with Prefix-tuning
Prefix-tuning (Li & Liang, 2021) prepends a number of task-specific trainable prefix vectors to the parameters of multi-head attention (i.e., keys and values) at each transformer layer. In the original implementation, the prefix vectors of each attention block are reparameterized by a two-layer feed-forward network:
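As a rough illustration of this reparameterization, the sketch below builds per-layer key/value prefix vectors from a small trainable embedding passed through a two-layer feed-forward network. All names, shapes, and hyperparameters (prefix length, hidden size, number of layers) are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PrefixReparameterization(nn.Module):
    """Minimal sketch of prefix-tuning's two-layer reparameterization (names assumed)."""

    def __init__(self, prefix_len=30, d_model=768, d_mid=512, n_layers=12):
        super().__init__()
        self.prefix_len, self.n_layers, self.d_model = prefix_len, n_layers, d_model
        # Trainable prefix embedding: one row per prefix position.
        self.prefix_embedding = nn.Embedding(prefix_len, d_model)
        # Two-layer feed-forward network producing key/value prefixes for every layer.
        self.reparam = nn.Sequential(
            nn.Linear(d_model, d_mid),
            nn.Tanh(),
            nn.Linear(d_mid, n_layers * 2 * d_model),  # 2 = keys and values
        )

    def forward(self, batch_size):
        idx = torch.arange(self.prefix_len)                    # (N,)
        prefix = self.reparam(self.prefix_embedding(idx))      # (N, L * 2 * d)
        prefix = prefix.view(self.prefix_len, self.n_layers, 2, self.d_model)
        prefix = prefix.permute(1, 2, 0, 3)                    # (L, 2, N, d)
        # Broadcast over the batch; these vectors are prepended to each
        # layer's attention keys and values.
        return prefix.unsqueeze(2).expand(-1, -1, batch_size, -1, -1)
```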
4.3. HyperPELT: Incorporate with Adapter
Note that in Section 4.2 we use the prefix length N as the dimension of the hyper-embeddings. Since the previous section extends the dimension of the hyper-embedding components, we apply an adaptive pooling operation to the hyper-embeddings to adjust their dimension for the adapter hypernetwork.
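The following is a minimal sketch of such an adaptive pooling step, assuming the hyper-embedding is a (prefix length N, hidden size) tensor and the adapter hypernetwork expects a different first dimension; the function name and shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def pool_hyper_embedding(hyper_emb: torch.Tensor, target_dim: int) -> torch.Tensor:
    """Adaptively pool a (N, d) hyper-embedding along its first axis to (target_dim, d).

    Sketch under assumed shapes, not the authors' exact implementation.
    """
    # adaptive_avg_pool1d pools over the last dimension, so transpose first.
    pooled = F.adaptive_avg_pool1d(hyper_emb.t().unsqueeze(0), target_dim)  # (1, d, target_dim)
    return pooled.squeeze(0).t()                                            # (target_dim, d)

# Example: pool a prefix-length-30 hyper-embedding down to 24 rows for the adapter hypernetwork.
adapter_input = pool_hyper_embedding(torch.randn(30, 768), target_dim=24)
```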
4.4. VL-HyperPELT: Incorporate with Visual Modality
Authors:
(1) Zhengkun Zhang, with equal contribution; work done during an internship at Noah’s Ark Lab, Huawei Technologies;
(2) Wenya Guo, TKLNDST, CS, Nankai University, China ([email protected]);
(3) Xiaojun Meng, with equal contribution, Noah’s Ark Lab, Huawei Technologies;
(4) Yasheng Wang, Noah’s Ark Lab, Huawei Technologies;
(5) Yadao Wang, Noah’s Ark Lab, Huawei Technologies;
(6) Xin Jiang, Noah’s Ark Lab, Huawei Technologies;
(7) Qun Liu, Noah’s Ark Lab, Huawei Technologies;
(8) Zhenglu Yang, TKLNDST, CS, Nankai University, China.