Date: 2025-04

Authors: (1) Ben Athiwaratkun, AWS AI Labs; (2) Sujan Kumar Gonugondla, AWS AI Labs; (3) Sanjay Krishna Gouda, AWS AI Labs; (4) Haifeng Qian, AWS AI Labs; (5) Hantian Ding, AWS AI Labs; (6) Qing Sun, AWS AI Labs; (7) Jun Wang, AWS AI Labs; (8) Jiacheng Guo, AWS AI Labs; (9) Liangfu Chen, AWS AI Labs; (10) Parminder Bhatia, GE HealthCare (work done at AWS); (11) Ramesh Nallapati, Amazon AGI (work done at AWS); (12) Sudipta Sengupta, AWS AI Labs; (13) Bing Xiang, Goldman Sachs (work done at AWS).

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Background

3.1. Notation and 3.2. Language Model Inference

3.3. Multi-Query, Multi-Head and the Generalized Multi-Query Attention

4. Context-Aware Bifurcated Attention and 4.1. Motivation

4.2. Formulation and 4.3. Memory IO Complexity

5. Experiments

5.1. Comparing Capabilities of Multi-Head, Multi-Query, and Multi-Group Attention

5.2. Latencies of Capabilities-Equivalent Models

5.3. Applications

6. Conclusion and References

A. FAQs

B. Related Work

C. Setup

D. Multi-Group Attention Family

E. Context-Aware Bifurcated Attention

F. Applications: Additional Results

G. Compatibility with Speculative Decoding and Fast Decoding Techniques

Abstract

In this study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling. The approach aims to reduce redundant memory IO costs, a significant contributor to latency at high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, one over the KV cache from the prefill and one over the KV cache produced during decoding. The method computes exactly the same result and keeps the usual computational load (FLOPs) of standard attention mechanisms, but with reduced memory IO. Bifurcated attention is also compatible with the multi-query attention mechanism, which is known for reducing the memory IO of the KV cache, and so enables even higher batch sizes and context lengths. The resulting efficiency leads to lower latency and improves suitability for real-time applications: for example, it enables massively parallel answer generation without substantially increasing latency, which improves performance when combined with post-processing techniques such as reranking.

1. Introduction

The advent of large language models (LLMs) has ushered in a new era of machine learning, with remarkable performance on a wide array of tasks (Brown et al., 2020; OpenAI, 2023; Chowdhery et al., 2022; Touvron et al., 2023; Chen et al., 2021; Hoffmann et al., 2022; Li et al., 2022; Microsoft, 2022). Despite their impressive capabilities, deploying these large-scale models in practical applications poses significant challenges, particularly in terms of inference latency and efficiency. Improving these aspects is critical, because they directly determine the computational resources required to generate predictions and make it practical to deploy these advanced models across industries.





A particularly demanding inference scenario is single-context batch sampling, where the goal is to generate multiple completions from a single context. This task arises in many applications, such as code-editing IDE tools that offer multiple suggestions, or settings where ranking among many generations is needed for the best performance (using ranking metrics such as mean log probability, majority voting, etc.). Incremental decoding in this sampling scenario is memory IO intensive, and memory IO becomes the latency bottleneck at high batch sizes and long context lengths.
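To make the memory IO pressure concrete, the following is a rough back-of-the-envelope sketch, not taken from the paper. It assumes a naive decoder in which every sample in the batch holds its own copy of the shared context in the KV cache, and it counts key (or value) rows read per attention head in one decoding step. The function name and the example sizes are hypothetical.

```python
def kv_rows_read_per_step(batch, context_len, decoded_len):
    """Rows of K (or V) read per head in one decoding step, assuming each
    sample carries its own copy of the shared context (hypothetical model)."""
    total = batch * (context_len + decoded_len)
    # Every sample after the first re-reads context rows that are identical
    # to the ones already read for the first sample.
    redundant = (batch - 1) * context_len
    return total, redundant


total, redundant = kv_rows_read_per_step(batch=128, context_len=2048, decoded_len=64)
print(f"rows read: {total}, redundant: {redundant} ({redundant / total:.1%})")
```

Under these assumptions almost all of the rows read are repeated copies of the same context, which is why the cost grows with both batch size and context length.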





In this study, we investigate two compatible strategies for addressing the memory IO challenges of transformer inference: (1) an investigation of multi-query attention and its trade-offs, and (2) a novel technique called context-aware bifurcated attention.





Our investigation begins with an analysis of generalized multi-query attention (Ainslie et al., 2023), which includes multi-query attention (Shazeer, 2019), alongside the established multi-head attention mechanism (Vaswani et al., 2017), in terms of the trade-off between performance and latency. Our findings show smooth performance scaling with increasing model size for a fixed number of groups g in generalized multi-query attention [1]. Lowering g shifts the validation-loss versus model-size scaling curves upward. The consistent relationship between cache compression, model size, and validation loss lets us trade inference efficiency against model size: we can select higher compression for use cases that require high efficiency, while still matching the performance of multi-head attention by compensating with a larger model.
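The effect of the group count g on cache size can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the KV cache stores one key and one value vector per group, per layer, per token, and per sample, and the model dimensions used below (32 query heads, 32 layers, head dimension 128, 2-byte elements) are hypothetical values chosen only for illustration.

```python
def kv_cache_bytes(batch, seq_len, n_layers, n_groups, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a generalized multi-query model.

    n_groups is g: g = 1 is multi-query, g = number of query heads is multi-head.
    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * batch * seq_len * n_layers * n_groups * head_dim * bytes_per_elem


N_QUERY_HEADS = 32  # hypothetical h

for g in (N_QUERY_HEADS, 8, 1):
    size = kv_cache_bytes(batch=16, seq_len=2048, n_layers=32, n_groups=g, head_dim=128)
    print(f"g = {g:>2}: {size / 2**30:.2f} GiB, compression vs multi-head = {N_QUERY_HEADS // g}x")
```

In this simple model the cache shrinks linearly with g, so going from multi-head (g = h) to multi-query (g = 1) compresses it by a factor of h.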





Secondly, we introduce context-aware bifurcated attention, a technique that bifurcates any attention in the generalized multi-query family into context and decoding components during incremental decoding. This bifurcation involves the same number of FLOPs and yields results identical to the original attention, but can significantly reduce memory IO cost, and therefore latency, in high-batch and long-context scenarios. The approach allows multiple completions to be generated in real time without incurring much additional latency, or enables much larger batch sizes that lead to better ranking performance. For instance, for the CodeGen 16B multi-head model (Nijkamp et al., 2022) with a 2k context length, we are able to increase the batch size to 128 with bifurcated attention, compared to a batch size of only 5 without it, resulting in pass@k (Chen et al., 2021) increasing from 59.0% to 84.0%, or pass@top increasing from 55.2% to 58.1%.
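The idea of splitting attention into a context part and a decoding part can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation: it assumes the multi-head case (g = h), a single decoding step, and tensor layouts and function names chosen here for readability. The context keys and values are stored once and shared by all samples, while each sample has its own decoded keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def standard_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Baseline: replicate the shared context per sample, then attend once.

    q: (b, h, d); k_ctx, v_ctx: (h, m_c, d); k_dec, v_dec: (b, h, m_d, d).
    """
    b = q.shape[0]
    k = np.concatenate([np.broadcast_to(k_ctx, (b,) + k_ctx.shape), k_dec], axis=2)
    v = np.concatenate([np.broadcast_to(v_ctx, (b,) + v_ctx.shape), v_dec], axis=2)
    logits = np.einsum("bhd,bhmd->bhm", q, k) / np.sqrt(q.shape[-1])
    return np.einsum("bhm,bhmd->bhd", softmax(logits), v)


def bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Same math, but the context K/V keep no batch axis and are used as-is."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    m_c = k_ctx.shape[1]
    # Two separate products for the logits: shared context, per-sample decode.
    logits_ctx = np.einsum("bhd,hmd->bhm", q, k_ctx) * scale
    logits_dec = np.einsum("bhd,bhmd->bhm", q, k_dec) * scale
    # The softmax is still taken jointly over all context and decoded positions.
    w = softmax(np.concatenate([logits_ctx, logits_dec], axis=-1))
    w_ctx, w_dec = w[..., :m_c], w[..., m_c:]
    # Two separate products for the weighted values, summed at the end.
    return (np.einsum("bhm,hmd->bhd", w_ctx, v_ctx)
            + np.einsum("bhm,bhmd->bhd", w_dec, v_dec))


rng = np.random.default_rng(0)
b, h, d, m_c, m_d = 4, 2, 8, 16, 3  # small illustrative sizes
q = rng.standard_normal((b, h, d))
k_ctx, v_ctx = rng.standard_normal((2, h, m_c, d))
k_dec, v_dec = rng.standard_normal((2, b, h, m_d, d))

out_std = standard_attention(q, k_ctx, v_ctx, k_dec, v_dec)
out_bif = bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec)
print(np.allclose(out_std, out_bif))
```

In this sketch the two functions agree up to floating-point error. The bifurcated version never materializes a per-sample copy of the context, which is where the memory IO saving would come from on real hardware.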





This paper is available on arxiv under the CC BY 4.0 DEED license.

[1] Lower values of the number of attention groups g lead to higher compression of the key-value tensors, as in the multi-query case where g = 1, and therefore improve inference efficiency and latency through a smaller KV cache compared to the multi-head case where g = h, the number of query attention heads.