Benchmarking Large Audio-Language Models via Generative Comprehension
by @benchmarking

Too Long; Didn't Read

AIR-Bench is a new benchmark that evaluates LALMs on audio signal understanding and interaction using foundation and chat benchmarks, offering insights for future development.

Authors:

(1) Qian Yang, Zhejiang University, equal contribution. This work was conducted during Qian Yang's internship at Alibaba Group;

(2) Jin Xu, Alibaba Group, equal contribution;

(3) Wenrui Liu, Zhejiang University;

(4) Yunfei Chu, Alibaba Group;

(5) Xiaohuan Zhou, Alibaba Group;

(6) Yichong Leng, Alibaba Group;

(7) Yuanjun Lv, Alibaba Group;

(8) Zhou Zhao, Alibaba Group, corresponding author ([email protected]);

(9) Yichong Leng, Zhejiang University;

(10) Chang Zhou, Alibaba Group, corresponding author ([email protected]);

(11) Jingren Zhou, Alibaba Group.

Table of Links

Abstract and 1. Introduction

2 Related Work

3 AIR-Bench and 3.1 Overview

3.2 Foundation Benchmark

3.3 Chat Benchmark

3.4 Evaluation Strategy

4 Experiments

4.1 Models

4.2 Main Results

4.3 Human Evaluation and 4.4 Ablation Study of Positional Bias

5 Conclusion and References

Detailed Results of Foundation Benchmark

Abstract

Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded progress in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of open-ended generative capabilities centered on audio. It is therefore challenging to track progress in the domain of large audio-language models (LALMs) and to provide guidance for future improvement. In this paper, we introduce AIR-Bench (Audio InstRuction Benchmark), the first benchmark designed to evaluate the ability of LALMs to understand various types of audio signals (including human speech, natural sounds, and music) and, furthermore, to interact with humans in textual format. AIR-Bench encompasses two dimensions: foundation and chat benchmarks. The former consists of 19 tasks with approximately 19k single-choice questions, intended to inspect the basic single-task ability of LALMs. The latter contains 2k instances of open-ended question-and-answer data, directly assessing the model's comprehension of complex audio and its capacity to follow instructions. Both benchmarks require the model to generate hypotheses directly. We design a unified framework that leverages advanced language models, such as GPT-4, to score the generated hypotheses given the meta-information of the audio. Experimental results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation. By revealing the limitations of existing LALMs through evaluation results, AIR-Bench can provide insights into the direction of future research.

1 Introduction

Recent advancements in artificial general intelligence have been significantly driven by the emergence of large language models (LLMs) (Brown et al., 2020; OpenAI, 2022, 2023; Chowdhery et al., 2022; Anil et al., 2023; Touvron et al., 2023a,b; Bai et al., 2023a). These models exhibit remarkable abilities in retaining knowledge, engaging in intricate reasoning, and solving problems in line with human intent. Motivated by the striking progress in LLMs, the domain of large audio-language models (LALMs) has undergone a major transformation. To perceive and comprehend rich audio signals and then generate textual responses following human instructions, many works have been proposed, such as SALMONN (Tang et al., 2023a), BLSP (Wang et al., 2023a), Speech-LLaMA (Wu et al., 2023a), and Qwen-Audio (Chu et al., 2023), showcasing promising capabilities for audio-centric dialogue.


However, previous LALMs (Tang et al., 2023a; Wang et al., 2023a; Wu et al., 2023a; Chu et al., 2023; Huang et al., 2023b; Shen et al., 2023; Gong et al., 2023; Wang et al., 2023b) have predominantly concentrated on evaluation in specific fundamental tasks. The absence of a standardized benchmark for assessing the generative instruction-following abilities of these models has resulted in a reliance on showcasing examples or releasing chat models for public experimentation to demonstrate their conversational skills. This approach poses significant challenges for conducting fair and objective comparisons across different research efforts. Moreover, it tends to obscure the models' existing limitations, impeding the ability to monitor progress within the domain of LALMs.


For evaluation in audio domains, the majority of research efforts have concentrated on creating benchmarks tailored to individual tasks, such as LibriSpeech (Panayotov et al., 2015) and the Common Voice benchmark (Ardila et al., 2019) for ASR. Beyond task-specific ones, benchmarks such as SUPERB (Yang et al., 2021a) and HEAR (Turian et al., 2021) have been designed to test the versatility of self-supervised learning models across a wide variety of tasks. Regarding the assessment of LALMs' ability to follow instructions, to the best of our knowledge, Dynamic-SUPERB (Huang et al., 2023a) is the only benchmark devoted to this aspect. Nevertheless, Dynamic-SUPERB focuses only on human speech processing and does not extend to assessing models' capabilities in producing open-ended generations such as dialogues.


In this paper, we present AIR-Bench (Audio InstRuction Benchmark), a novel benchmark designed to evaluate the ability of LALMs to comprehend various audio signals and to interact following instructions. AIR-Bench is characterized by three primary features:

1) Comprehensive audio signal coverage. AIR-Bench covers human speech, natural sounds, and music, ensuring a comprehensive evaluation of LALMs' capabilities.

2) Hierarchical benchmark structure. The benchmark consists of foundation and chat benchmarks. The foundation benchmark comprises 19 distinct audio tasks with over 19,000 single-choice questions, with each question focusing only on a specific foundational ability. GPT-4 (OpenAI, 2023) extends the questions and candidate choices using dedicated, purpose-designed prompts. The chat component consists of over 2,000 audio-prompted open-ended questions. To enhance the complexity of the audio and achieve a closer resemblance to the intricate audio encountered in real-life situations, we propose a novel audio mixing strategy that incorporates loudness control and temporal dislocation. Specifically, we adjust the relative loudness and introduce different temporal offsets while mixing two audio clips. The resulting variations in relative loudness and temporal location are then recorded as additional meta-information, contributing to a more comprehensive textual representation of the audio. Data quality is upheld through automated filtering by GPT-4, followed by manual verification.

3) Unified, objective, and reproducible evaluation framework. Models are required to generate hypothesis sequences directly on both benchmarks, to align more accurately with practical scenarios. We then employ GPT-4 to generate reference answers given the meta-information, through carefully constructed prompts. Given references and hypotheses, following Liu et al. (2023b) and Bai et al. (2023b), we use GPT-4 (OpenAI, 2023) to judge whether the choice is correct for the foundation benchmark, or to score hypotheses for the chat benchmark. We further perform a second scoring by swapping their positions to eliminate position bias. Based on comprehensive experiments on 9 LALMs, we observe that existing LALMs have either limited audio understanding or limited instruction-following capabilities, leaving significant room for improvement in this field.
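To make the mixing strategy described above more concrete, here is a minimal sketch of combining two clips with loudness control and a temporal offset, while recording both as meta-information. It is an illustration under assumptions, not the authors' implementation: the function names `rms` and `mix_clips`, the use of RMS level in decibels as the loudness measure, the plain-list audio representation, and the meta field names are all hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a mono clip given as a list of floats."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_clips(primary, secondary, sample_rate, relative_db, offset_seconds):
    """Mix two mono clips (hypothetical helper, not from the paper).

    The secondary clip is rescaled so its RMS sits `relative_db` decibels
    relative to the primary clip (assumed loudness measure), then delayed
    by `offset_seconds` before the two are summed.
    """
    if offset_seconds < 0:
        raise ValueError("offset_seconds must be non-negative")

    # Loudness control: gain that moves the secondary clip to the target level.
    rms_p, rms_s = rms(primary), rms(secondary)
    gain = 1.0
    if rms_p > 0 and rms_s > 0:
        gain = rms_p * 10 ** (relative_db / 20) / rms_s
    scaled = [x * gain for x in secondary]

    # Temporal dislocation: start the secondary clip after a fixed offset.
    offset = round(offset_seconds * sample_rate)
    mixed = [0.0] * max(len(primary), offset + len(scaled))
    for i, x in enumerate(primary):
        mixed[i] += x
    for i, x in enumerate(scaled):
        mixed[offset + i] += x

    # Meta-information that could later be turned into a textual description.
    meta = {
        "relative_loudness_db": relative_db,
        "secondary_start_s": offset / sample_rate,
        "secondary_end_s": (offset + len(scaled)) / sample_rate,
    }
    return mixed, meta


mixed, meta = mix_clips([1.0] * 8, [2.0] * 4, sample_rate=4,
                        relative_db=0.0, offset_seconds=1.0)
```

In this toy call the secondary clip is brought to the same level as the primary one and starts one second in, so `meta` records a relative loudness of 0 dB and a one-second start time. A real pipeline would likely also guard against clipping after summation.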


Our contributions are summarized below:


• AIR-Bench is the first generative evaluation benchmark for large audio-language models, encompassing a wide range of audio such as speech, natural sounds, and music. AIR-Bench is a large and hierarchical benchmark, consisting of a foundation benchmark with 19 audio tasks and over 19k single-choice questions, alongside a chat benchmark with over 2k meticulously curated open-ended audio questions for comprehensive evaluation.


• We propose a novel audio mixing strategy with loudness control and temporal dislocation to enhance the complexity of the audio.


• A unified, objective, and reproducible evaluation framework has been developed to assess the quality of generated hypotheses.


• We conducted a thorough evaluation of 9 models for the purpose of benchmarking. The evaluation code, datasets, and an open leaderboard will be made publicly available soon.
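The position-swap step of the evaluation framework can be sketched as follows. This is a hedged illustration only: the `judge` callable stands in for a GPT-4 call and is hypothetical, as are the names `score_with_position_swap` and `biased_stub_judge`. Averaging the two rounds is also an assumption here, since the text above states only that a second scoring is performed with positions swapped.

```python
from typing import Callable, Dict, Tuple

# A judge takes (first_answer, second_answer) and returns a score for each.
# In practice this would wrap an LLM call; here it is a hypothetical interface.
Judge = Callable[[str, str], Tuple[float, float]]


def score_with_position_swap(judge: Judge, reference: str,
                             hypothesis: str) -> Dict[str, float]:
    """Score twice with the answer order swapped, then average (assumed)."""
    ref_first, hyp_second = judge(reference, hypothesis)
    hyp_first, ref_second = judge(hypothesis, reference)
    return {
        "reference": (ref_first + ref_second) / 2,
        "hypothesis": (hyp_first + hyp_second) / 2,
    }


def word_count_score(text: str) -> float:
    """Toy quality proxy used only by the stub judge below."""
    return float(min(len(text.split()), 9))


def biased_stub_judge(first: str, second: str) -> Tuple[float, float]:
    """Toy judge that unfairly adds one point to whichever answer is first."""
    return word_count_score(first) + 1.0, word_count_score(second)


scores = score_with_position_swap(
    biased_stub_judge,
    reference="a dog barks twice",
    hypothesis="a dog barks",
)
```

With the stub judge, each answer receives the first-position bonus in exactly one of the two rounds, so the gap between the averaged scores matches the gap in the underlying toy scores rather than reflecting answer order.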


This paper is available on arxiv under CC BY 4.0 DEED license.