Table of Links
A. The Connection Between Prefix-tuning and Hypernetwork
B. Number of Tunable Parameters
3. Preliminaries
3.1. Pretrained Language Models
All of our models are built on top of T5 (Raffel et al., 2020), a state-of-the-art language model consisting of an encoder-decoder Transformer (Vaswani et al., 2017) with minor modifications. T5 frames every language task as sequence-to-sequence generation and is trained simultaneously on multiple task datasets. This large-scale model achieves state-of-the-art performance across a diverse set of tasks. We use the T5 backbone because it enables training a universal model that interfaces with many downstream language tasks.
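To make this text-to-text interface concrete, the sketch below shows how a T5 backbone can be queried for a downstream task by encoding the task in the input text and reading the label off the generated text. This is an illustrative example only, not the authors' code; it assumes the Hugging Face `transformers` library, and the checkpoint name and task prompt are placeholders.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# a T5 backbone treats a downstream task as sequence-to-sequence generation.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# The task is encoded in the input text (here, a sentiment-style prompt),
# and the label is produced as generated text.
inputs = tokenizer(
    "sst2 sentence: this movie was surprisingly good",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```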
3.2. Multi-task Learning Problem formulation
3.3. Hypernetworks
Authors:
(1) Zhengkun Zhang, equal contribution; work done during an internship at Noah’s Ark Lab, Huawei Technologies;
(2) Wenya Guo, TKLNDST, CS, Nankai University, China ([email protected]);
(3) Xiaojun Meng, equal contribution, Noah’s Ark Lab, Huawei Technologies;
(4) Yasheng Wang, Noah’s Ark Lab, Huawei Technologies;
(5) Yadao Wang, Noah’s Ark Lab, Huawei Technologies;
(6) Xin Jiang, Noah’s Ark Lab, Huawei Technologies;
(7) Qun Liu, Noah’s Ark Lab, Huawei Technologies;
(8) Zhenglu Yang, TKLNDST, CS, Nankai University, China.
This paper is