Last year, I showed that Large Language Models (LLMs) can solve algorithmic coding problems from Leetcode. However, their ability was only measured on a subset of well-known, "popular" problems, which were almost certainly part of their training data. On newer problems, which could not have been part of that data, the picture was different: while the models still handled many easy problems, they struggled with the harder ones.

Since then, OpenAI, Anthropic, and Google have released updated versions of their models, and new players such as DeepSeek and xAI have entered the field. Many models are now marketed explicitly as coding tools, which was not the case before. I decided to re-test these newer LLMs to find out whether their ability to solve novel algorithmic problems has improved or not.

Motivation

There are existing benchmarks for measuring the coding performance of LLMs. SWE-bench is built around realistic software engineering tasks, based on GitHub issues from real open-source projects. It is a great idea, but it covers much more than the pure algorithmic problem solving I am interested in here. OpenAI tested its o1 and o3 models on Codeforces problems and reported impressive results (1, 2), while other vendors have not done the same, which makes a direct comparison impossible. Leetcode, on the other hand, keeps publishing new problems, which allows a direct and up-to-date evaluation of LLMs. And, after all, isn't it also just fun?
Benchmark design

The idea is to simulate human behavior when solving algorithmic problems, but use an LLM to generate the code:

1. Download the problem description.
2. Generate a prompt from the description.
3. Generate code with the LLM.
4. Submit the code for evaluation.
5. Save the results.

These steps are repeated for every problem in the test set and for every LLM. For the sake of purity, each LLM gets exactly one attempt to generate code for each problem, with no follow-up attempts to fix the solution. All problems are submitted independently; no context is shared between them.

Why Leetcode?

Leetcode is a good choice for benchmarking for several reasons:

* Leetcode problems are widely used in real job interviews for software engineering positions.
* Computer Science students solve similar problems during their education.
* It has an online judge that can check whether a solution is correct in seconds.
* Many popular programming languages are supported.
* Data on human performance on these problems is also available.

How Leetcode works

If you are unfamiliar with competitive programming or algorithmic problem solving, here is a short introduction. Consider this sample problem:

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.
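The steps above can be sketched as a simple loop. This is a minimal illustration only: the helper names (`build_prompt`, the `generate` and `submit` callables) and the field names are hypothetical placeholders, not the actual API of the benchmarking tool.

```python
# Minimal sketch of the benchmark loop. All helper names and dict fields
# are hypothetical placeholders, not the real tool's API.
PROMPT_TEMPLATE = "Solve this problem in {language}:\n{question}\n{snippet}"

def build_prompt(problem: dict, language: str = "Python") -> str:
    """Turn a downloaded problem description into a single prompt string."""
    return PROMPT_TEMPLATE.format(
        language=language,
        question=problem["description"],
        snippet=problem["starter_code"],
    )

def run_benchmark(problems, models, submit, generate):
    """One attempt per (model, problem) pair; no retries, no shared context."""
    results = []
    for model in models:
        for problem in problems:
            prompt = build_prompt(problem)
            code = generate(model, prompt)         # single LLM call
            verdict = submit(problem["id"], code)  # online judge verdict
            results.append({
                "model": model,
                "problem": problem["id"],
                "accepted": verdict == "Accepted",
            })
    return results
```

Each result row is stored independently, which matches the "no context is shared" rule above: a failure on one problem never influences another attempt.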
You may assume that each input has exactly one solution, and you may not use the same element twice.

The participant should submit a solution that expands the provided code snippet:

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # your code here

Usually, several sample inputs and outputs (test cases) are shown in the description:

Input: nums = [2,7,11,15], target = 9
Output: [0,1]

The full problem can have many more test cases, most of them hidden, and a solution is accepted only if it passes all of them. Each problem also has an "acceptance rate": the ratio of accepted submissions among all submissions made by Leetcode users. Note that each user can submit code for the same problem an unlimited number of times, and every submission counts toward the acceptance rate. Such problems are not exclusive to Leetcode; they have been widely used in technical interviews for a long time.
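For illustration, here is the standard one-pass hash-map solution to the Two Sum problem above. This is the textbook approach, not the output of any particular model in the benchmark.

```python
from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # Map each value seen so far to its index: O(n) time, O(n) space.
        seen = {}
        for i, x in enumerate(nums):
            if target - x in seen:
                return [seen[target - x], i]
            seen[x] = i
        return []  # unreachable: the problem guarantees exactly one solution
```

On the sample test case, `Solution().twoSum([2, 7, 11, 15], 9)` returns `[0, 1]`, matching the expected output shown in the problem statement.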
The data set

As in the previous benchmark, I wanted to test the LLMs on two types of problems: well-known and unseen. "Well-known" problems are not only old, but are also commonly used in software interviews, so their solutions are widely available. "Unseen" problems were published within the last year, and their solutions should not be present in the training data of the models tested.

Some problems had to be excluded. Some explicitly depend on other problems or require extending earlier code; others combine several subtasks into one problem. Some problems include images, which could put LLMs at a disadvantage, since not all models can access images or browse the internet. I chose to exclude problems with images, external links, and dependencies on other problems.

For the well-known set, I used Leetcode's curated lists: "Leetcode 75", "Top 100 Liked", and "Top Interview 150".

To evaluate the models on "unseen" problems, I selected the 99 most recent ones: 33 easy, 33 medium, and 33 hard, by sorting problem IDs in descending order. Although Leetcode does not show publication dates, they can usually be inferred from comments and discussions. The earliest "unseen" problem appears to date from November 2024. Difficulty levels are subjective and assigned at the editors' discretion. The table below shows the number of problems per difficulty in each dataset.
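The "unseen" selection described above can be sketched as a simple filter-and-sort. The field names (`difficulty`, `has_images`, `id`) are assumptions about the scraped problem records, not Leetcode's actual API.

```python
def pick_unseen(problems, per_difficulty=33):
    """Pick the newest problems per difficulty level, skipping problems
    with images, as described in the text. Field names are hypothetical."""
    picked = []
    for level in ("easy", "medium", "hard"):
        candidates = [p for p in problems
                      if p["difficulty"] == level and not p["has_images"]]
        # Higher problem IDs were assigned more recently, so sort descending.
        candidates.sort(key=lambda p: p["id"], reverse=True)
        picked.extend(candidates[:per_difficulty])
    return picked
```

With `per_difficulty=33` this yields the 99-problem "unseen" set described above.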
Problem set

|                                 | Well-known | Unseen (as of 23 Mar 2025) |
|---------------------------------|-----------:|---------------------------:|
| Total                           |        133 |                         99 |
| Easy                            |         41 |                         33 |
| Medium                          |         78 |                         33 |
| Hard                            |         14 |                         33 |
| Leetcode users' acceptance rate |     53.44% |                     37.05% |

Problem statements and code snippets were downloaded with my benchmarking tool, published on Github: https://github.com/whisk/leetgptsolver

Prompt, language choice, and code structure

The benchmark is intentionally simple: the LLM gets a single attempt to generate code, without compiler feedback (or any other context) and without seeing the judge's test results, apart from the sample cases included in the problem description. I used the same prompt for every LLM and every problem:

Hello! This is a coding challenge. You will be given:

* A problem statement (with sample test cases if available).
* A starter code snippet (with fixed function signatures).

Please write your solution in the {language} programming language.

Your code must:

* Solve the problem fully and correctly.
* Pass all provided sample test cases.
* Run within acceptable time and memory limits (consider algorithmic complexity where necessary).
* Follow good coding practices (clear logic, readable structure, appropriate use of language features).
Here is the problem statement:

{question}

Here is the code snippet, which you should expand with your solution:

{snippet}

Important Requirements:

* Do not change any provided function signatures, class names, or method names.
* Output only valid source code that can be submitted as-is.

The prompt was "polished" with ChatGPT-4 from my initial draft, but no special prompt-engineering techniques were applied. Problem statements were stripped of HTML tags before being inserted into the prompt. All solutions were requested in Python (version 3). The LLM was asked to output only valid code with no additional text, but this instruction was not always followed, so anything outside the code was stripped before submission. Only one generation attempt was made per problem.

Models and parameters

The models used in the benchmark are listed in the table below, together with any non-default parameters. Knowledge cutoff dates were taken from the vendors' documentation and are listed where available.
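Stripping HTML tags from the statement before filling the prompt template can be done with a small stdlib helper. This is a sketch using Python's `html.parser.HTMLParser`; the actual tool may implement this step differently.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only text nodes, dropping all tags and attributes."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html: str) -> str:
    """Return the plain text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts)
```

For example, `strip_html("<p>Given <code>nums</code></p>")` returns `"Given nums"`, which is what gets substituted for `{question}` in the prompt.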
| Vendor    | Model                                         | Knowledge cutoff date | "Reasoning" | Parameters |
|-----------|-----------------------------------------------|-----------------------|-------------|------------|
| Anthropic | claude-3-7-sonnet-20250219                    | Nov 2024              | No          | temperature = 0.0, max_tokens = 4096 |
| Anthropic | claude-3-7-sonnet-20250219 (thinking enabled) | Nov 2024              | Yes         | temperature = 0.0, max_tokens = 16384, budget_tokens = 8192 |
| DeepSeek  | deepseek-chat (DeepSeek-V3)                   | unknown               | No          | temperature = 0.0 |
| DeepSeek  | deepseek-reasoner (DeepSeek-R1)               | unknown               | Yes         | temperature = 0.0 |
| Google    | gemini-2.0-flash-001                          | unknown               | No          | temperature = 0.0 |
| Google    | gemini-2.0-pro-exp-02-05                      | unknown               | No          | temperature = 0.0 |
| Google    | gemini-2.5-pro-exp-03-25                      | unknown               | Yes         | temperature = 0.0 |
| xAI       | grok-2-1212                                   | Jul 17, 2024          | No          | seed = 42 |
| OpenAI    | o1-2024-12-17                                 | Oct 01, 2023          | Yes         | seed = 42 |
| OpenAI    | o3-mini-2025-01-31                            | Oct 01, 2023          | Yes         | seed = 42 |

The benchmark was designed to be as deterministic and reproducible as possible; therefore, parameters such as temperature or seed were fixed. However, none of the tested models guarantee fully deterministic output. This should be kept in mind when reproducing the results.

All known knowledge cutoff dates precede the earliest problem in the "unseen" dataset (November 2024). Unfortunately, I could not find knowledge cutoff dates for the Gemini and DeepSeek models.

Some models provide a "reasoning" or "thinking" mode by default, while for Claude 3.7 Sonnet it is enabled via a parameter; its usage is noted in the table. Other model features (or "tools") such as web search were not used, even where supported.

Results

All models showed a very high acceptance rate on well-known problems, just as in the previous benchmark. I skipped the most expensive models and modifications (namely: Claude 3.7 Sonnet with reasoning enabled, DeepSeek R1, Gemini 2.5 Pro, and o1) on this set to save time and credits, since their results were highly predictable.
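Pinning the sampling parameters from the table can be sketched as a small request builder for an OpenAI-style chat endpoint. This is an illustration of the parameter choices only: the prefix check and the assumption that reasoning models take a `seed` while the others take `temperature` simply mirror the table above, not any vendor's documented requirements.

```python
def request_params(model: str, prompt: str) -> dict:
    """Build request kwargs that mirror the parameter table above:
    seed = 42 for the o1/o3 reasoning models, temperature = 0.0 otherwise.
    The routing rule here is an assumption for illustration."""
    params = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if model.startswith(("o1", "o3")):
        params["seed"] = 42          # o-series: fix the seed
    else:
        params["temperature"] = 0.0  # others: greedy-ish decoding
    return params
```

As the text notes, even with a fixed seed or zero temperature the tested models do not guarantee byte-identical output across runs.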
The results differ significantly on the "unseen" problems, in two ways:

* For all models, the acceptance rate is lower on "unseen" problems across all difficulty levels, though the exact numbers varied from model to model. This is especially notable for medium and hard problems.
* Models with a "reasoning" or "thinking" mode enabled produced better results.

The very high acceptance rate on well-known problems can most likely be explained by the fact that these problems and their solutions were part of the training sets, and the models largely reproduce a known solution rather than derive one from scratch. However, the acceptance rate of human users on the new medium and hard problems is also lower than on the "popular" set. These observations correlate, so it does not necessarily mean that the new problems are "harder". Difficulty assignment, as noted before, is quite subjective. Also, just like LLMs, human users may simply be reproducing solutions they have seen before when solving well-known problems.

All models with a "reasoning" mode showed better results than all conventional models, a result that was not observed a year ago. o3-mini showed the best results among the "reasoning" models, performing even better than o1 while being significantly cheaper and faster.
Most importantly, some models were able to solve a significant number of the unseen medium and hard problems. o3-mini performed notably well here; it is possible it was specifically trained on competitive programming tasks.

Further work

There is no guarantee that the "unseen" problems were not part of the models' training data. To address this, one could generate entirely new problems, independent of existing benchmarks, for example, by using LLMs themselves. Additionally, another step would be to use less popular programming languages. This approach could reveal whether LLMs actually construct solutions, rather than "copy-pasting" well-known, suitable Python code. These ideas are left for future work, and I would be glad if someone explores them, or I may do so myself.

References

Raw results, problem sets, and source code are available on my GitHub: https://github.com/whisk/leetgptsolver

The first benchmark results (2024): https://hackernoon.com/testing-llms-on-solving-leetcode-problems

Cover image generated by DALL·E.