Authors:
(1) Frank Palma Gomez, Boston University; work done during his internship at Google Research and Google DeepMind;
(2) Ramon Sanabria, The University of Edinburgh; work done during his internship at Google Research and Google DeepMind;
(3) Yun-hsuan Sung, Google Research;
(4) Daniel Cer, Google Research;
(5) Siddharth Dalmia, Google DeepMind (equal advising contribution);
(6) Gustavo Hernandez Abrego, Google Research (equal advising contribution).
4 Model
Figure 1 shows an illustration of our model. We initialize our dual encoder from PaLM 2 XXS (Google et al., 2023) and append a linear projection layer after pooling the encoder outputs along the sequence length dimension. Both the embedding and linear projection layers are initialized randomly. After initializing from PaLM 2, we train the model with a contrastive loss (Hadsell et al., 2006). Appendix A.1 provides more details on our training setup. We refer to our proposed model as PaLM 2 DE.
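As a rough sketch of the head described above, the snippet below mean-pools backbone outputs along the sequence dimension, applies a randomly initialized linear projection, and scores query/document pairs with an in-batch softmax contrastive loss (one common dual-encoder variant; the dimensions, random backbone outputs, and loss details here are illustrative assumptions, not the paper's exact recipe).

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(token_states, proj):
    """Mean-pool along the sequence axis, project, and L2-normalize."""
    pooled = token_states.mean(axis=1)             # (batch, hidden)
    emb = pooled @ proj                            # (batch, out_dim)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def contrastive_loss(q_emb, d_emb, temperature=0.05):
    """In-batch softmax contrastive loss: the i-th document is the
    positive for the i-th query; other rows act as negatives."""
    logits = q_emb @ d_emb.T / temperature         # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Hypothetical sizes; PaLM 2 XXS hidden sizes are much larger.
batch, seq_len, hidden, out_dim = 4, 16, 32, 8
proj = rng.normal(scale=0.02, size=(hidden, out_dim))  # random init
q_states = rng.normal(size=(batch, seq_len, hidden))   # stand-in backbone outputs
d_states = rng.normal(size=(batch, seq_len, hidden))

loss = contrastive_loss(embed(q_states, proj), embed(d_states, proj))
print(float(loss))
```

In training, gradients from this loss would flow into both the projection and the shared backbone, pulling matched pairs together and pushing in-batch negatives apart.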
This paper is available on arxiv under CC BY 4.0 DEED license.