This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Michael Günther, michael.guenther@jina.ai;
(2) Jackmin Ong, jackmin.ong@jina.ai;
(3) Isabelle Mohr, isabelle.mohr@jina.ai;
(4) Alaeddine Abdessalem, alaeddine.abdessalem@jina.ai;
(5) Tanguy Abel, tanguy.abel@jina.ai;
(6) Mohammad Kalim Akram, kalim.akram@jina.ai;
(7) Susana Guzman, susana.guzman@jina.ai;
(8) Georgios Mastrapas, georgios.mastrapas@jina.ai;
(9) Saba Sturua, saba.sturua@jina.ai;
(10) Bo Wang, bo.wang@jina.ai;
(11) Maximilian Werk, maximilian.werk@jina.ai;
(12) Nan Wang, nan.wang@jina.ai;
(13) Han Xiao, han.xiao@jina.ai.
The training process for Jina Embeddings v2 is divided into three stages:
I Pre-training the Backbone: For the backbone pre-training, we design a modified BERT model capable of encoding documents with up to 8192 tokens. This model is trained on a full-text corpus using a masked language modeling objective (see the first sketch after this list).
II First Fine-tuning with Text Pairs: To encode a text passage into a single vector representation, the model is fine-tuned in an unsupervised manner on text pairs (see the second sketch below).
III Second Fine-tuning with Hard Negatives: The model is further fine-tuned using text pairs complemented with hard negatives (sketched after the next paragraph). This stage is crucial for enabling the model to better distinguish between relevant passages and related but irrelevant text passages.
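As a rough illustration of the Stage I objective, the sketch below masks a random subset of tokens and trains the encoder to recover them. This is a minimal PyTorch sketch rather than the actual training code: the masking scheme, the 15% masking probability, and the `encoder` interface (returning per-token vocabulary logits) are assumptions made for illustration.

```python
# Minimal sketch of one masked-language-modeling step (Stage I).
# The encoder interface and masking scheme are illustrative assumptions,
# not the exact Jina Embeddings v2 configuration.
import torch
import torch.nn.functional as F

def mlm_step(token_ids, encoder, vocab_size, mask_token_id, mask_prob=0.15):
    """Mask random tokens, predict the originals, return the MLM loss."""
    labels = token_ids.clone()
    # Choose positions to mask (simple Bernoulli masking for illustration).
    mask = torch.rand(token_ids.shape, device=token_ids.device) < mask_prob
    labels[~mask] = -100                 # ignore unmasked positions in the loss
    corrupted = token_ids.clone()
    corrupted[mask] = mask_token_id      # replace selected tokens with [MASK]
    logits = encoder(corrupted)          # assumed shape: (batch, seq_len, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, vocab_size), labels.reshape(-1), ignore_index=-100
    )
```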
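Stage II can be pictured as follows: the encoder's token embeddings are pooled into a single vector per text, and each (query, passage) pair is trained with an InfoNCE-style contrastive loss in which the other passages in the batch act as negatives. The mean pooling, the temperature value, and the symmetric formulation in this sketch are assumptions for illustration, not necessarily the paper's exact objective.

```python
# Hedged sketch of Stage II pair fine-tuning with in-batch negatives.
import torch
import torch.nn.functional as F

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def pair_contrastive_loss(query_emb, passage_emb, temperature=0.05):
    """InfoNCE over a batch of pairs; other in-batch passages serve as negatives."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature       # (batch, batch) cosine-similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric loss: match queries to passages and passages to queries.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```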
While both stages II and III are geared towards training the models for embedding tasks, the latter is especially critical for improving the model’s performance in retrieval and classification tasks (refer to Section 6.2).
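Stage III extends the same contrastive idea: each query is scored against its positive passage, a set of mined hard negatives, and the remaining in-batch passages, which pushes the model to separate relevant passages from merely related ones. The sketch below is an illustration under assumptions; the number of hard negatives, the temperature, and the exact composition of the candidate pool are not taken from the paper.

```python
# Hedged sketch of a Stage III objective with explicit hard negatives.
import torch
import torch.nn.functional as F

def hard_negative_loss(query_emb, pos_emb, neg_emb, temperature=0.05):
    """
    query_emb: (B, D) query embeddings
    pos_emb:   (B, D) matching passage embeddings
    neg_emb:   (B, N, D) mined hard negatives per query
    """
    q = F.normalize(query_emb, dim=-1)
    pos = F.normalize(pos_emb, dim=-1)
    neg = F.normalize(neg_emb, dim=-1)
    pos_scores = (q * pos).sum(dim=-1, keepdim=True) / temperature     # (B, 1)
    neg_scores = torch.einsum("bd,bnd->bn", q, neg) / temperature      # (B, N) hard negatives
    in_batch = q @ pos.T / temperature                                 # (B, B) other positives as negatives
    # Column 0 holds the true positive; everything else is treated as a negative.
    logits = torch.cat([pos_scores, neg_scores, in_batch], dim=1)
    # Mask the diagonal of the in-batch block so the positive is not counted twice.
    B, N = q.size(0), neg_emb.size(1)
    diag_mask = torch.zeros_like(logits, dtype=torch.bool)
    diag_mask[:, 1 + N:] = torch.eye(B, dtype=torch.bool, device=q.device)
    logits = logits.masked_fill(diag_mask, float("-inf"))
    targets = torch.zeros(B, dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, targets)
```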