This is my third set of benchmarks for empathetic AI. Since the last round of benchmarks, DeepSeek, Gemini Flash 2.0, Claude Sonnet 3.7, and OpenAI ChatGPT o3-mini have arrived on the scene. The new empathy value leader is a DeepSeek derivative, Groq deepseek-r1-distill-llama-70b-specdec. DeepSeek itself was not included in the benchmarks because it had erratic response times that regularly exceeded 10s and sometimes errored.

In this round of benchmarks I have added response time and cost. The academic research I have reviewed, along with intuition, seems to indicate that slow responses have a negative impact on perceived empathy. In fact, anything over 3 or 4 seconds is probably bad from a conversational perspective. Beyond that, LLM costs are now all over the map and are definitely worth factoring into production decisions. As the table below shows, if anything, the most expensive models are not the most empathetic!

For those unfamiliar with my previous benchmarks, they are driven by a well-established psychological test combined with an AI application, Emy, designed specifically to be empathetic without being fine-tuned, prompted, or RAG-assisted with questions from the tests.

As I mentioned in previous articles, the empathy score is not the only measure of success. The overall quality of the user interaction has to be considered. That said, Claude Sonnet 3.5 and ChatGPT 4o, with applied empathy scores of 0.98, appear to have the greatest capacity for generating empathetic content; however, their 7s+ response times fall short, while Groq deepseek-r1-distill-llama-70b-specdec, with an empathy score of 0.90, responds in a blazing 1.6s at less than 50% of the cost! Even running Claude at the higher speeds available from a provider other than Anthropic, for example Amazon, will not get you close to a 2s response time.

My review of actual chat dialogues, combined with independent user testing, showed the Claude Sonnet and Groq distilled DeepSeek responses to be nearly indistinguishable, with Claude feeling slightly warmer and softer. The ChatGPT 4o responses frequently read as a little colder or artificial and were ranked lower by users. Gemini Pro 1.5 could be a reasonable choice, with a score of 0.85 and a far lower cost. Gemini 2.0 Pro (experimental) has dropped in empathy. However, I found the dialogue responses from all of the Gemini models machine-like. I did not test Gemini with end users.

I continue to find that simply telling an LLM to be empathetic has little or no positive impact on its empathy scores. My research shows that aggressive prompting will work in some cases, but for many models it is the nature of the user's engagement in the current conversation that seems to scale the empathy. In these cases, the need for empathy has to be explicit and must not have gone "stale" in the conversation, or the LLMs fall back into problem-solving/solution-finding mode (a small illustration follows below). And, working with a number of open-source models, it has become clear that the guardrails required in commercial models can get in the way of empathy.
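To make the "stale" point concrete, here is a hypothetical pair of conversation openings (invented for illustration, not items from the actual test set). The first pattern tends to push models into task mode despite the blanket instruction; the second is the kind of framing that actually seems to elicit empathy.

```python
# Hypothetical conversations (illustration only, not from the benchmark test set).

# 1. A blanket "be empathetic" instruction, with the emotional need buried and stale:
#    models tend to latch onto the CSV task and go straight into problem-solving mode.
stale_need = [
    {"role": "system", "content": "Be empathetic."},
    {"role": "user", "content": "My dog died last week. Anyway, can you reformat this CSV for me?"},
]

# 2. The need for empathy is explicit and current in the conversation itself:
#    this is the kind of engagement that appears to scale empathy.
explicit_need = [
    {"role": "user", "content": "My dog died last week and I am really struggling today."},
]
```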
Working with lower-end open-source models, there also appears to be a correlation between an LLM's "belief" that it exists as some distinct form of "being" and its ability to align its output with what users perceive as empathy. The guardrails on commercial models discourage LLMs from regarding themselves as distinct, "real" entities.

LLM | Raw AEM | Be Empathetic | Emy AEM | Response Time | Tokens In | Tokens Out | $/M In | $/M Out | Cost
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Groq deepseek-r1-distill-llama-70b-specdec | 0.49 | 0.59 | 0.90 | 1.6s | 2,483 | 4,402 | $0.75* | $0.99* | $0.00622
Groq llama-3.3-70b-versatile | 0.60 | 0.63 | 0.74 | 1.6s | 2,547 | 771 | $0.59 | $0.79 | $0.00211
Gemini Flash 1.5 | 0.34 | 0.34 | 0.34 | 2.8s | 2,716 | 704 | $0.075* | $0.30* | $0.00041
Gemini Pro 1.5 | 0.43 | 0.53 | 0.85 | 2.8s | 2,716 | 704 | $0.10 | $0.40 | $0.00055
Gemini Flash 2.0 | 0.09 | -0.25 | 0.39 | 2.8s | 2,716 | 704 | $0.10 | $0.40 | $0.00055
Claude Haiku 3.5 | 0.00 | -0.09 | 0.09 | 6.5s | 2,737 | 1,069 | $0.80 | $4.00 | $0.00647
Claude Sonnet 3.5 | -0.38 | -0.09 | 0.98 | 7.1s | 2,733 | 877 | $3.00 | $15.00 | $0.02135
Claude Sonnet 3.7 | -0.01 | 0.09 | 0.91 | 7.9s | 2,733 | 892 | $3.00 | $15.00 | $0.02158
ChatGPT 4o-mini | -0.01 | 0.03 | 0.35 | 6.3s | 2,636 | 764 | $0.15 | $0.075 | $0.00045
ChatGPT 4o | -0.01 | 0.20 | 0.98 | 7.5s | 2,636 | 760 | $2.50 | $10.00 | $0.01419
ChatGPT o3-mini (low) | -0.02 | -0.25 | 0.00 | 10.5s | 2,716 | 1,790 | $1.10 | $4.40 | $0.01086

Response Time is the average response time for any single test when the Emy AI is used. Tokens In and Tokens Out are totals across all of the tests when the Emy AI is used.

* Groq deepseek-r1-distill-llama-70b-specdec: prices were not yet available when this article was published; variable model pricing was used.
* Gemini Flash 1.5: prices are for smaller prompts; larger prompts cost double.
Gemini Pro 2.5 (experimental): prices had not been published at the time this article was written.

The large reasoning models missing from the analysis, e.g. Gemini 2.5 Pro, are far too slow for any kind of real-time empathetic interaction, and some basic testing indicates they are no better, and often worse, from a formal testing perspective. That does not mean they could not be used to generate empathetic content for other purposes ... perhaps Dear John letters ;-).

I will be back with more benchmarks in Q3. Thanks for reading!
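P.S. For anyone who wants to sanity-check the table, the Cost column appears to follow directly from the token totals and the per-million-token prices. A minimal sketch (the helper function below is purely for illustration):

```python
# Reproduce the Cost column from the token totals and per-million-token prices.
def run_cost(tokens_in: int, tokens_out: int, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Total cost in USD for one benchmark run."""
    return (tokens_in * usd_per_m_in + tokens_out * usd_per_m_out) / 1_000_000

# Groq deepseek-r1-distill-llama-70b-specdec row: 2,483 in / 4,402 out at $0.75 / $0.99 per M.
print(round(run_cost(2_483, 4_402, 0.75, 0.99), 5))   # 0.00622
# ChatGPT 4o row: 2,636 in / 760 out at $2.50 / $10.00 per M.
print(round(run_cost(2_636, 760, 2.50, 10.00), 5))    # 0.01419
```

Both values match the Cost column above.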