Vector search does a good job of capturing semantic similarity for retrieval-augmented generation, but it does poorly with short keyword searches or out-of-domain search terms. Supplementing vector search with a keyword search such as BM25 and combining the results with a reranker is becoming the standard way to get the best of both worlds.
Rerankers are ML models that take a set of search results and reorder them to improve relevance. They examine the query paired with each candidate result in detail, which is computationally expensive but produces more accurate results than simple retrieval methods alone. This can be done either as a second stage on top of a single search (pull 100 results from vector search, then ask the reranker to identify the top 10) or, more often, to combine results from different kinds of search; in this case, vector search and keyword search.
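The second usage pattern above (merge candidates from two searches, then rerank) can be sketched as follows. This is a minimal illustration, not the pipeline used in these tests: the function names are hypothetical, and toy_overlap_score is a stand-in word-overlap scorer that takes the place of a real reranker model so the example runs without any external service.

```python
from typing import Callable, Dict, List


def merge_candidates(*result_lists: List[str]) -> List[str]:
    """Union of several ranked result lists, first-seen order, no duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def rerank(
    query: str,
    candidates: List[str],
    docs: Dict[str, str],
    score: Callable[[str, str], float],
    top_k: int = 10,
) -> List[str]:
    """Score every (query, document) pair and keep the top_k highest."""
    scored = [(score(query, docs[doc_id]), doc_id) for doc_id in candidates]
    # Python's sort is stable, so ties keep their merged order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


def toy_overlap_score(query: str, text: str) -> float:
    """Stand-in scorer (assumption): fraction of query words found in the text.
    A real reranker would run a model over the pair instead."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


# Hypothetical toy corpus and search results, for illustration only.
docs = {
    "a": "rerankers reorder search results",
    "b": "vector search uses embeddings",
    "c": "keyword search with BM25",
    "d": "cooking pasta at home",
}
vector_hits = ["b", "a", "d"]
keyword_hits = ["c", "b"]

candidates = merge_candidates(vector_hits, keyword_hits)
top = rerank("keyword search", candidates, docs, toy_overlap_score, top_k=2)
print(candidates, top)
```

Because the scorer sees the query and the document text together, it can promote a document that only one of the two searches found, which is what a rank-only merge cannot do.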
But how good are off-the-shelf rerankers? To find out, I tested six rerankers on the text from
We tested the following rerankers:
Each reranker was fed the top 20 results from both DPR and BM25, and the reranked results were evaluated with NDCG@5.
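For readers unfamiliar with the metric, here is a minimal sketch of NDCG@5. It assumes the linear-gain form of DCG; the article does not say which variant or evaluation library was used, and some tools use an exponential gain instead (the two agree when relevance labels are binary). The function names and the toy relevance judgments are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 5) -> float:
    """DCG of the ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments: documents "a" and "b" are relevant, everything else is not.
relevance = {"a": 1.0, "b": 1.0}

perfect = ndcg_at_k(["a", "b", "x", "y", "z"], relevance)  # relevant docs ranked first
shifted = ndcg_at_k(["x", "a", "b", "y", "z"], relevance)  # relevant docs pushed down one slot
print(perfect, shifted)
```

A ranking that puts every relevant document first scores 1.0, and the score drops as relevant documents slide down within the top 5; anything below position 5 contributes nothing.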
In the results, raw vector search (with embeddings from the bge-m3 model) is labeled dpr (dense passage retrieval). BGE-m3 was chosen to compute the embeddings because that is what the ColPali authors used as a baseline.
Here is the data on relevance (NDCG@5):
And here is how fast they are at reranking searches in the arxiv dataset; latency is proportional to document length. The chart shows latency, so lower is better. The self-hosted bge model was run on an NVIDIA 3090 using the simplest possible code, lifted straight from
Finally, here is how much it cost each model to rerank about 3000 searches across all six datasets. Cohere prices per search (with additional charges for long documents), while the others price per token.
RRF adds little to no value in hybrid search scenarios; on half of the datasets, it performed worse than either BM25 or DPR alone. In contrast, all of the ML-based rerankers tested delivered meaningful improvements over pure vector or keyword search, with Voyage rerank-2 setting the bar for relevance.
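For context, RRF (reciprocal rank fusion) looks only at rank positions and never at the query or the document text. A minimal sketch of the standard formula follows; the constant k = 60 is the commonly used default and is an assumption here, since the article does not state which value was used. The function name and the toy result lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorting is stable.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical ranked results from the two retrievers.
dpr_hits = ["a", "b", "c"]
bm25_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([dpr_hits, bm25_hits])
print(fused)
```

Documents that both retrievers return rise to the top, but nothing in the formula can tell whether a highly ranked document actually answers the query.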
Tradeoffs remain: superior accuracy from Voyage rerank-2, faster processing from Cohere, or solid middle-ground performance from Jina or Voyage's lite model. Even the open-source BGE reranker, while it trails the commercial options, adds significant value for teams that choose to self-host.
As foundation models continue to advance, we can expect even better performance. But today's ML rerankers are already mature enough to deploy with confidence across multilingual content.
By Jonathan Ellis, DataStax