Authors: Yichen Zhang, Gan He, Lei Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjiang Zhang, Jeanette Hellgren Kotaleski, Yonghong Tian, Sten Grillner, Kai Du, Tiejun Huang

Abstract
Biophysically detailed multi-compartment models are powerful tools for exploring the computational principles of the brain and also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their high computational cost severely limits applications in both neuroscience and AI. The major bottleneck when simulating detailed compartment models is a simulator's ability to solve large systems of linear equations. Here, we present the Dendritic Hierarchical Scheduling (DHS) method to markedly accelerate this process. We theoretically prove that the DHS implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method on a conventional CPU platform. We build the DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines. We also provide a brief discussion of the potential of DeepDendrite for AI.

Introduction
Describing the coding and computational principles of neurons is fundamental to neuroscience. Mammalian brains consist of thousands of different types of neurons with unique morphological and biophysical properties. The point-neuron model, in which neurons are treated as simple summation units, is still widely applied in neural computation, particularly in the analysis of neural networks. In recent years, modern artificial intelligence (AI) has exploited this principle and developed powerful tools such as artificial neural networks (ANNs). However, beyond extensive computation at the single-neuron level, subcellular compartments such as neuronal dendrites can also perform nonlinear operations as independent computational units. Moreover, dendritic spines, the small protrusions that densely cover the dendrites of spiny neurons, can compartmentalize synaptic signals so that they are separated from their parent dendrites, both ex vivo and in vivo.

Simulations using biologically detailed neuron models provide a theoretical framework for linking biological details to computational principles. Detailed compartmental modeling allows us to model neurons with realistic dendritic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. It builds on cable theory, which models the biophysical membrane properties of dendrites as passive cables and thereby provides a mathematical description of how electrotonic signals invade and propagate through complex neuronal processes. By combining cable theory with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can capture cellular and subcellular neuronal computations beyond experimental limitations.
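For orientation, the passive cable equation underlying such models can be written in its standard textbook form (a generic statement of cable theory, not an equation reproduced from this paper); here c_m is the specific membrane capacitance, r_m the specific membrane resistance, r_a the axial resistivity, d the cable diameter, E_l the leak reversal potential, and I_syn any synaptic current density:

\[
  c_m \frac{\partial V}{\partial t}
  \;=\; \frac{d}{4 r_a}\,\frac{\partial^2 V}{\partial x^2}
  \;-\; \frac{V - E_l}{r_m} \;+\; I_{\mathrm{syn}}(x,t)
\]

Spatially discretizing this equation over the dendritic tree is what produces the large linear systems discussed below.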
Beyond its profound impact on neuroscience, biologically detailed neuron modeling has recently been used to bridge the gap between neuronal structural and biophysical details and AI. The prevailing technique in the modern AI field is ANNs composed of point neurons, an analog of biological neural networks. Although ANNs trained with the "backpropagation-of-error" (backprop) algorithm have achieved remarkable performance in specialized applications, even beating top human professional players in games such as Go and chess, the human brain still outperforms ANNs in domains involving more dynamic and noisy environments. Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that potentially exceed backprop in parallel information processing. Moreover, a single detailed multi-compartment model can learn network-level nonlinear computations of point neurons by adjusting synaptic strength alone. It is therefore a high priority to extend the paradigms of brain-inspired AI from single detailed neuron models to large-scale, biologically detailed networks.

A long-standing challenge of the detailed simulation approach lies in its extremely high computational cost, which has severely limited its application in neuroscience and AI. To improve efficiency, the classic Hines method reduces the time complexity of solving the equations from O(n3) to O(n) and has been widely applied as the core algorithm in popular simulators such as NEURON and GENESIS. However, this method processes each compartment sequentially in a serial manner. When a simulation involves multiple biophysically detailed dendrites with dendritic spines, the matrix of the linear equations (the "Hines matrix") scales accordingly with the increasing number of dendrites or spines (Fig. 1e), making the Hines method impractical, since it places a very heavy burden on the entire simulation.

Fig. 1: a A reconstructed layer-5 pyramidal neuron model and the mathematical formalism used with detailed neuron models. b Workflow when numerically simulating detailed neuron models. The equation-solving phase is the bottleneck of the simulation. c Example of the linear equations in a simulation. d Data dependency of the Hines method when solving the linear equations in c. e The number of linear equations to be solved increases substantially as models become more detailed. f Computational cost (steps taken in the equation-solving phase) of the Hines method on different types of neuron models. g Illustration of different solving methods. In the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors; in the serial method (left), all compartments are computed by one unit. h Computational cost of the three methods in g when solving the equations of a pyramidal neuron model with spines. i Run time of different methods when solving the equations of 500 pyramidal neuron models with spines. Run time indicates the time consumed by a 1 s simulation (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: the cell-level parallel method in CoreNEURON (on GPU); Branch-based: the branch-based parallel method (on GPU); DHS: the Dendritic Hierarchical Scheduling method (on GPU).
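To make the structure of these linear systems concrete, one row of the system A v = b for a branched morphology can be sketched as follows; the notation is ours, chosen to match the tree picture of Fig. 1d, and the coefficients a_i, b_i, d_i and right-hand side r_i come from the spatial discretization of the cable equation above:

\[
  a_i\, V_{p(i)} \;+\; d_i\, V_i \;+\; \sum_{j \,:\, p(j)=i} b_j\, V_j \;=\; r_i ,
\]

where p(i) denotes the parent compartment of compartment i. Because every compartment has exactly one parent, each column of A has at most one off-diagonal entry above the diagonal; this sparsity pattern is what the Hines elimination order exploits to solve the system in O(n).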
Over the past decades, great progress has been made in accelerating the Hines method with cellular-level parallel methods, which allow different parts of each cell to be computed in parallel. However, current cellular-level parallel methods often lack an efficient parallelization strategy, or lack sufficient numerical accuracy compared with the original Hines method.

Here, we develop a fully automatic, numerically accurate, and optimized simulation tool that can significantly accelerate computation and reduce computational cost. In addition, this simulation tool can be seamlessly adopted for establishing and testing neural networks with biological details for machine learning and AI applications. Critically, we formulate the parallel computation of the Hines method as a mathematical scheduling problem and derive a Dendritic Hierarchical Scheduling (DHS) method based on combinatorial optimization and parallel computing theory. We show that our algorithm yields optimal scheduling without loss of precision. Furthermore, we optimize DHS for current state-of-the-art GPU chips by exploiting the GPU memory hierarchy and memory-access mechanisms. Together, DHS can accelerate the computation by 60-1,500-fold (Supplementary Table 1) compared with the classic NEURON simulator while preserving identical accuracy.

To make detailed dendritic simulations usable for AI, we then establish the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized computing engine for NEURON) as the simulation engine with two auxiliary modules (an I/O module and a learning module) that support dendritic learning algorithms during simulations. DeepDendrite runs on the GPU hardware platform, supporting both regular simulation tasks in neuroscience and learning tasks in AI.

Last but not least, we present several applications that use DeepDendrite, targeting critical challenges in neuroscience and AI: (1) we demonstrate how spatial patterns of dendritic spine inputs shape neuronal activity in neurons containing spines across the entire dendritic tree (full-spine models). DeepDendrite enables us to explore neuronal computation in a simulated human pyramidal neuron model with ~25,000 dendritic spines. (2) In the Discussion, we also consider the potential of DeepDendrite in the context of AI, specifically in creating ANNs built from morphologically detailed human pyramidal neurons. All source code for DeepDendrite, the full-spine models, and the detailed dendritic network model is publicly available online (see Code Availability). Our open-source learning framework can readily be integrated with other dendritic learning rules, such as learning rules for nonlinear (fully active) dendrites, burst-dependent synaptic plasticity, and learning with spike prediction. Overall, our study provides a complete set of tools with the potential to transform the current ecosystem of computational neuroscience. By harnessing the power of GPU computing, we anticipate that these tools will facilitate system-level explorations of the computational principles of the brain's fine structures and foster the interaction between neuroscience and modern AI.
Results

Dendritic Hierarchical Scheduling (DHS)
Computing the ionic currents and solving the linear equations are the two critical phases when simulating biophysically detailed neurons; both are time-consuming and impose a serious computational burden. Because the ionic-current computation can be fully parallelized across compartments, solving the linear equations becomes the remaining bottleneck of the parallelization process (Fig. 1a and f).

To address this bottleneck, cellular-level parallel methods have been developed that accelerate single-cell computation by "splitting" a single cell into several blocks that can be computed in parallel. However, such methods rely heavily on prior knowledge to generate practical strategies for how to split a single neuron into compartments (Fig. 1g, i and Supplementary Fig. 1). Hence, they become less efficient for neurons with asymmetrical morphologies, e.g., pyramidal neurons and Purkinje neurons.

We aim to develop a more efficient and accurate parallel method for simulating biologically detailed neural networks. First, we establish the criteria for the accuracy of a cellular-level parallel method. Following the data dependency of the Hines method, we propose three conditions that ensure a parallel method will produce solutions identical to those of the serial Hines method (see Methods).

Based on simulation accuracy and computational cost, we then formulate the parallelization problem as a mathematical scheduling problem (see Methods): given k parallel threads, we can compute at most k nodes at each step, but we must ensure that a node is computed only after all of its child nodes have been processed; our goal is to find a strategy with the smallest number of steps for the whole procedure.

To generate an optimal partition, we propose a method termed Dendritic Hierarchical Scheduling (DHS) (the theoretical proof is presented in Methods). The DHS method comprises two steps: analyzing the dendritic topology and finding the best partition. (1) Given a detailed model, we first obtain its corresponding dependency tree and compute the depth of each node (the depth of a node is the number of its ancestor nodes) on the tree (Fig. 2a-c). (2) After the topology analysis, we search for candidate nodes and select at most the k deepest candidates (a node is a candidate only if all of its child nodes have been processed) (Fig. 2d).

Fig. 2: DHS workflow. a DHS processes the k deepest candidate nodes in each iteration. b Illustration of how the node depth of a compartmental model is computed: the model is first converted to a tree structure, and the depth of each node is then calculated. c Topology analysis of different neuron models. Six neurons with distinct morphologies are shown. For each model, the soma is chosen as the root of the tree, so node depth increases from the soma (0) toward the distal dendrites. d Illustration of applying DHS to the model in b with four threads. Candidates: nodes that can be processed. Selected candidates: nodes selected by DHS, i.e., the k deepest candidates. Processed nodes: nodes that have been processed before. e Parallelization strategy obtained by DHS after the process in d. Each node is assigned to one of four parallel threads. DHS reduces the serial node-processing steps from 14 to 5 by distributing nodes over multiple threads. f Relative cost, i.e., the ratio of the computational cost of DHS to that of the serial Hines method, when DHS is applied with different numbers of threads to different types of models.
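For concreteness, a minimal Python sketch of this two-step procedure is given below. The function name, the parent-array input format, and the greedy loop structure are ours for illustration and are not taken from the released DeepDendrite code:

from collections import defaultdict

def dhs_partition(parent, k):
    """Greedy DHS schedule: repeatedly take the (at most) k deepest candidates.
    `parent[i]` is the parent of node i, or None for the root; returns a list of
    steps, each a list of node indices that can be processed in parallel."""
    n = len(parent)
    children = defaultdict(set)
    for i, p in enumerate(parent):
        if p is not None:
            children[p].add(i)
    # depth = number of ancestors of each node
    depth = [0] * n
    for i in range(n):              # assumes parents are numbered before children
        if parent[i] is not None:
            depth[i] = depth[parent[i]] + 1
    remaining = {i: set(children[i]) for i in range(n)}   # unprocessed children
    unassigned = set(range(n))
    steps = []
    while unassigned:
        # candidates: unassigned nodes whose children have all been processed
        candidates = [i for i in unassigned if not remaining[i]]
        chosen = sorted(candidates, key=lambda i: depth[i], reverse=True)[:k]
        for i in chosen:
            unassigned.remove(i)
            if parent[i] is not None:
                remaining[parent[i]].discard(i)
        steps.append(chosen)
    return steps

The relative cost of a schedule produced this way is simply len(steps) divided by the number of steps the serial method needs for the same tree.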
Take a simplified model with 15 compartments as an example: using the serial Hines method, it takes 14 steps to process all nodes, whereas DHS with four parallel units can split the nodes into five subsets (Fig. 2d): {{9,10,12,14}, {1,7,11,13}, {2,3,4,8}, {6}, {5}}. Because nodes in the same subset can be processed in parallel, it takes only five steps to process all nodes using DHS (Fig. 2e).

Next, we apply the DHS method with different numbers of threads to six representative detailed neuron models (selected from ModelDB) (Fig. 2f), including cortical and hippocampal pyramidal neurons, a cerebellar Purkinje neuron, a striatal projection neuron (SPN), and an olfactory bulb mitral cell, covering the main principal neurons of sensory, cortical, and subcortical areas. We then measured the computational cost. The relative computational cost is defined as the ratio of the computational cost of DHS to that of the serial Hines method. The computational cost, i.e., the number of steps taken to solve the equations, drops dramatically with increasing thread number. For example, with 16 threads the computational cost of DHS is 7%-10% of that of the serial Hines method. Interestingly, the DHS method reaches the lower bound of its computational cost for the presented neurons when 16 or even 8 parallel threads are available (Fig. 2f), suggesting that adding more threads does not further improve performance because of the dependencies between compartments.

Together, we obtain a DHS method that enables automated analysis of the dendritic topology and an optimal partition for parallel computing. It is worth noting that DHS finds the optimal partition before the simulation starts, so no extra computation is needed when solving the equations.

Boosting DHS with GPU memory optimization
DHS computes each neuron with multiple threads, which consumes a large number of threads when running neural network simulations. We therefore use GPUs for parallel computing. In theory, the many streaming processors (SPs) on a GPU should support efficient simulation of large-scale neural networks (Fig. 3a, b). However, we consistently observed that the efficiency of DHS decreased significantly as the network size grew, which may be due to scattered data storage or to the extra memory accesses caused by loading and writing intermediate results (Fig. 3c, d, left).

Fig. 3: a GPU architecture and its memory hierarchy. Each GPU contains massive numbers of processing units (streaming processors). b Architecture of streaming multiprocessors (SMs). Each SM contains several streaming processors, registers, and an L1 cache. c Applying DHS to two neurons, each with four threads. d Memory optimization strategy on GPU. Top, thread assignment and data storage of DHS before (left) and after (right) memory boosting. Bottom, an example of a single triangularization step when simulating the two neurons in c. Processors send a data request to load data for each thread from global memory. Without memory boosting (left), it takes seven transactions to load all requested data, plus some extra transactions for intermediate results. With memory boosting (right), it takes only two transactions to load all requested data, and registers are used for intermediate results, which further improves memory throughput. e Run time of DHS (32 threads per cell) with and without memory boosting for multiple layer-5 pyramidal neuron models with spines. f Speedup from memory boosting for multiple layer-5 pyramidal models with spines; memory boosting yields a 1.6-2-fold acceleration.
We solve this problem with GPU memory boosting, a method that increases memory throughput by exploiting the GPU's memory hierarchy and access mechanism. Owing to the GPU's memory-loading mechanism, successive threads loading aligned and successively stored data achieve much higher memory throughput than threads accessing scattered data, which reduces throughput. To achieve high throughput, we first align the computation order of the nodes and rearrange the threads according to the number of nodes assigned to them. We then permute the data storage in global memory to be consistent with the computation order, i.e., nodes processed in the same step are stored contiguously in global memory. In addition, we use GPU registers to store intermediate results, further increasing memory throughput. The example shows that with memory boosting only two memory transactions are needed to load eight requested data items (Fig. 3d). Moreover, experiments on different numbers of pyramidal neurons with spines and on the typical neuron models (Fig. 3e, f and Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8-fold acceleration over naive DHS.

To comprehensively test the performance of DHS with GPU memory boosting, we select six typical neuron models and evaluate the run time of solving the cable equations for massive numbers of each model (Fig. 4). We examined DHS with four threads (DHS-4) and with sixteen threads (DHS-16) per neuron. Compared with the GPU method in CoreNEURON, DHS-4 and DHS-16 accelerate the computation about 5-fold and 15-fold, respectively (Fig. 4a). Furthermore, compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS accelerates the simulation by 2-3 orders of magnitude (Supplementary Fig. 3), while retaining identical numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites (Supplementary Fig. 7), and different segmentation strategies (Supplementary Fig. 7).

Fig. 4: a Run time of solving the equations for a 1 s simulation on GPU (dt = 0.025 ms, 40,000 iterations in total). CoreNEURON: the parallel method used in CoreNEURON; DHS-4: DHS with four threads per neuron; DHS-16: DHS with 16 threads per neuron. b, c Visualization of the partitions produced by DHS-4 and DHS-16; each color indicates a single thread.

DHS creates cell-type-specific optimal partitioning
To gain insight into the working mechanism of DHS, we visualized the partitioning process by mapping compartments to each thread (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread often switches between different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell (Fig. 4b, c). In contrast, it generates fragmented partitions for morphologically asymmetric neurons such as the pyramidal neurons and the Purkinje cell (Fig. 4b, c), indicating that DHS splits the neural tree at the level of individual tree nodes rather than whole branches. This cell-type-specific, fine-grained partitioning allows DHS to make full use of all available threads.

In summary, DHS together with memory boosting provides a theoretically proven optimal solution for solving the linear equations in parallel with unprecedented efficiency. Building on this principle, we have constructed the open DeepDendrite platform, which neuroscientists can use to implement models without any specific GPU-programming knowledge.
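A schematic recap of the data-layout idea behind memory boosting is sketched below. This is our own illustration, not CoreNEURON code: once the DHS schedule is known, the per-node arrays are permuted so that nodes computed in the same step by consecutive threads are stored contiguously, letting one coalesced transaction serve a whole warp.

import numpy as np

def step_major_layout(steps, parent):
    """Permute node storage so nodes processed in the same DHS step are contiguous.
    `steps` is a schedule covering every node (e.g., from dhs_partition above);
    returns the new node order and the parent array re-indexed to that order.
    Illustrative sketch only; names are ours, not CoreNEURON's."""
    order = [node for step in steps for node in step]      # step-major ordering
    new_index = {old: new for new, old in enumerate(order)}
    new_parent = np.array(
        [new_index[parent[old]] if parent[old] is not None else -1 for old in order],
        dtype=np.int64,
    )
    return np.array(order, dtype=np.int64), new_parent

# usage sketch: permute the per-node coefficient arrays once, before the simulation
# order, parent_new = step_major_layout(steps, parent)
# d, a, rhs = d[order], a[order], rhs[order]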
Below, we show how DeepDendrite can be used in neuroscience tasks.

DHS enables spine-level modeling
Because dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, and other spiny cells, their morphology and plasticity are critical for regulating neuronal excitability. However, spines are too small (~1 μm in length) for their voltage-dependent processes to be measured directly in experiments.

A single spine can be modeled with two compartments: the spine head, where the synapses are located, and the spine neck, which connects the spine head to the dendrite. Theory predicts that the very thin spine neck (0.1-0.5 μm in diameter) electrically isolates the spine head from its parent dendrite and thereby compartmentalizes the signals generated at the spine head. However, a detailed model with fully distributed spines on the dendrites (a "full-spine model") is computationally very expensive. A common compromise is to modify the membrane capacitance and resistance by a spine factor F instead of modeling all spines explicitly; here, the spine factor F approximates the effect of spines on the biophysical properties of the cell membrane.

Inspired by the previous work of Eyal et al., we investigated how different spatial patterns of excitatory inputs onto dendritic spines shape neuronal activity in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Notably, Eyal et al. used the spine factor F to incorporate spines into the dendrites, and only a few activated spines were explicitly attached to the dendrites (the "few-spine model" in Fig. 5a). The value of F in their model was computed from the dendritic area and spine area in the reconstructed data. Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model consistent with Eyal's few-spine model. With the spine density set to 1.3 μm-1, the pyramidal neuron model contained about 25,000 spines without altering the model's original morphological and biophysical properties. We then repeated the previous experimental protocols with both the full-spine and the few-spine model. We used the same synaptic input as in Eyal's work but added extra background noise to each sample. By comparing the somatic traces (Fig. 5b, c) and spike probabilities (Fig. 5d) of the full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by the activation of clustered spines appeared more nonlinear in the full-spine model (solid blue line in Fig. 5d) than in the few-spine model (dashed blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on dendritic excitability and nonlinearity.

Fig. 5: a Experimental setup. We examine two major types of models: few-spine models and full-spine models. Few-spine models (two on the left) incorporate the spine area globally into the dendrites and only attach individual spines together with activated synapses. In full-spine models (two on the right), all spines are explicitly attached over the whole dendritic tree. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine models and the full-spine models, respectively. b Somatic voltages recorded for the cases in a.
Colors of the voltage curves correspond to a; scale bar: 20 ms, 20 mV. c Color-coded voltages during the simulation in b at specific time points. Colors indicate the magnitude of the voltage. d Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases in a. Background noise is attached. e Run time of the experiments in d with different simulation methods. NEURON: conventional NEURON simulator running on a single CPU core. CoreNEURON: CoreNEURON simulator on a single GPU. DeepDendrite: DeepDendrite on a single GPU.

On the DeepDendrite platform, both the full-spine and the few-spine models ran 8 times faster than CoreNEURON on the GPU platform and 100 times faster than serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1), while the simulation results remained identical (Supplementary Figs. 4 and 8). Therefore, the DHS method enables explorations of dendritic excitability under more realistic anatomical conditions.

Discussion
In this work, we propose the DHS method to parallelize the computation of the Hines method, and we mathematically demonstrate that DHS provides an optimal solution without any loss of precision. Next, we implement DHS on the GPU hardware platform and use GPU memory boosting to refine it (Fig. 3). When simulating a large number of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup over the GPU method used in CoreNEURON (Supplementary Table 1) and up to a 1,500-fold speedup over the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3 and Supplementary Table 1). Furthermore, we develop the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, as a demonstration of the capacity of DeepDendrite, we present a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Further in this section, we elaborate on how we have expanded the DeepDendrite framework to enable efficient training of biophysically detailed neural networks. To explore the hypothesis that dendrites improve robustness against adversarial attacks, we train our network on typical image classification tasks. We show that DeepDendrite can support both neuroscience simulations and AI-related detailed neural network tasks with unprecedented speed, thereby significantly promoting detailed neuroscience simulations and, potentially, future AI explorations.

Decades of effort have been invested in speeding up the Hines method with parallel methods. Early work mainly focused on network-level parallelization. In network simulations, each cell independently solves its corresponding linear equations with the Hines method. Network-level parallel methods distribute a network over multiple threads and parallelize the computation across cell groups, one cell group per thread. With network-level methods, detailed networks can be simulated on clusters or supercomputers. In recent years, GPUs have been used for detailed network simulation. Because the GPU contains massive numbers of computing units, one thread is usually assigned one cell rather than a cell group. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, the computation inside each cell is still serial in network-level methods, so they still cannot cope when the "Hines matrix" of each cell scales large.
Cellular-level parallel methods further parallelize the computation inside each cell. The main idea of cellular-level parallel methods is to split each cell into several sub-blocks and parallelize the computation of those sub-blocks. However, typical cellular-level methods (e.g., the "multi-split" method) pay less attention to the parallelization strategy. The lack of a fine parallelization strategy results in unsatisfactory performance. To achieve higher efficiency, some studies try to obtain finer-grained parallelization by introducing extra computation operations, or by making approximations on some crucial compartments while solving the linear equations. These finer-grained parallelization strategies can achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the problem of "how to parallelize" as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy. Moreover, DHS does not introduce any extra operations or value approximations, so it achieves the lowest computational cost while retaining the same numerical accuracy as the original Hines method.

Dendritic spines are the most abundant microstructures in the brain, present on projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. As spines receive most of the excitatory inputs in the central nervous system, the electrical signals generated by spines are the main driving force for large-scale neuronal activities in the forebrain and cerebellum. The structure of the spine, with an enlarged spine head and a very thin spine neck, leads to surprisingly high input impedance at the spine head, which could be up to 500 MΩ according to combined experimental data and detailed compartmental modeling. Due to such high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level, thereby boosting NMDA currents and ion-channel currents in the spine. However, in classic detailed compartment models, all spines are replaced by the coefficient F modifying the dendritic cable geometry. This approach may compensate for the leak and capacitance currents of spines, but it cannot reproduce the high input impedance at the spine head, which may weaken excitatory synaptic inputs, particularly NMDA currents, thereby reducing the nonlinearity of the neuron's input-output curve. Our modeling results are in line with this interpretation.

On the other hand, the spine's electrical compartmentalization is always accompanied by biochemical compartmentalization, resulting in a drastic increase of internal [Ca2+] within the spine and a cascade of molecular processes involving synaptic plasticity of importance for learning and memory. Intriguingly, the biochemical processes triggered by learning in turn remodel the spine's morphology, enlarging (or shrinking) the spine head or elongating (or shortening) the spine neck, which significantly alters the spine's electrical capacity. Such experience-dependent changes in spine morphology, also referred to as "structural plasticity", have been widely observed in vivo in the visual cortex, somatosensory cortex, motor cortex, hippocampus, and basal ganglia.
They play a critical role in motor and spatial learning as well as in memory formation. However, due to the computational costs, nearly all detailed network models exploit the "F-factor" approach to replace actual spines and are thus unable to explore spine functions at the system level. By taking advantage of our framework and the GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU, while remaining ~100 times faster than the traditional serial method on a single CPU (Fig. 5e). This enables us to explore structural plasticity in large-scale circuit models across diverse brain regions.

Another critical issue is how to link dendrites to brain functions at the systems/network level. It is well established that dendrites can perform comprehensive computations on synaptic inputs owing to their enriched ion channels and local biophysical membrane properties. For example, cortical pyramidal neurons can carry out sublinear synaptic integration at the proximal dendrite but progressively shift to supralinear integration at the distal dendrite. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials. Such dendritic events are widely observed in mouse and even human cortical neurons in vitro, and may implement various logical operations or gating functions. Recently, in vivo recordings in awake or behaving mice have provided strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex, sensorimotor integration in the whisker system, and spatial navigation in the hippocampal CA1 region.

To establish a causal link between dendrites and animal (including human) behavior, large-scale biophysically detailed neural circuit models are a powerful computational tool. However, running a large-scale detailed circuit model of 10,000-100,000 neurons generally requires the computing power of supercomputers. It is even more challenging to optimize such models against in vivo data, as this requires iterative simulations of the models. The DeepDendrite framework can directly support many state-of-the-art large-scale circuit models, which were initially developed in NEURON. Moreover, using our framework, a single GPU card such as a Tesla A100 can easily support detailed circuit models of up to 10,000 neurons, thereby providing carbon-efficient and affordable options for ordinary labs to develop and optimize their own large-scale detailed models.

Recent work on unraveling the roles of dendrites in task-specific learning has achieved remarkable results in two directions: solving challenging tasks, such as the image classification dataset ImageNet, with simplified dendritic networks, and exploring the full learning potential of more realistic neuron models. However, there is a trade-off between model size and biological detail, as increasing the network scale is often sacrificed for neuron-level complexity. Moreover, more detailed neuron models are less mathematically tractable and more computationally expensive.

There has also been progress on the role of active dendrites in ANNs for computer vision tasks. Iyer et al.
proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording used a binary tree to approximate dendritic branching and provided valuable insights into the influence of tree structure on single neurons' computational capacity. Bird et al. proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they primarily rely on abstractions of spatially extended neurons and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of leveraging more realistic neuron models for understanding the shared mechanisms underlying brain computation and deep learning.

In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to markedly reduce computational costs and that includes an I/O module and a learning module for handling large datasets. With DeepDendrite we built a network of detailed human pyramidal neuron models, the HPC-Net (Fig. 6a, b). This network demonstrated efficient training capabilities in image classification tasks, achieving an approximately 25-fold speedup compared with training on a traditional CPU-based platform (Fig. 6f; Supplementary Table 1).

Fig. 6: a Illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed into spike trains and fed into the network model. Learning is triggered by error signals propagated from the soma to the dendrites. b Training with mini-batches. Multiple networks are simulated simultaneously with different images as inputs. The total weight update ΔW is computed as the average of the ΔWi from each network. c Comparison of the HPC-Net before and after training. Left, visualization of hidden-neuron responses to a specific input before (top) and after (bottom) training. Right, distribution of hidden-layer weights (from input to hidden layer) before (top) and after (bottom) training. d Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained on clean images. e Prediction accuracy of each model on adversarial samples after training for 30 epochs on the MNIST (left) and Fashion-MNIST (right) datasets. f Run time of training and testing for the HPC-Net. The batch size is set to 16. Left, run time of training one epoch. Right, run time of testing. Parallel NEURON + Python: training and testing on a single CPU with multiple cores, using 40-process-parallel NEURON to simulate the HPC-Net and extra Python code to support mini-batch training. DeepDendrite: training and testing the HPC-Net on a single GPU with DeepDendrite.

Additionally, it is widely recognized that the performance of artificial neural networks (ANNs) can be undermined by adversarial attacks, i.e., intentionally engineered perturbations devised to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks.
Our experimental results with the HPC-Net lend support to this hypothesis, as we observed that networks endowed with detailed dendritic structures demonstrated some increased resilience to transfer adversarial attacks compared with standard ANNs, as evidenced on the MNIST and Fashion-MNIST datasets (Fig. 6d, e). This evidence implies that the inherent biophysical properties of dendrites could be pivotal in augmenting the robustness of ANNs against adversarial interference. Nonetheless, further studies are needed to validate these findings on more challenging datasets such as ImageNet.

In conclusion, DeepDendrite has shown remarkable potential in image classification tasks, opening up exciting future directions and possibilities. To further advance DeepDendrite and the application of biologically detailed dendritic models in AI tasks, we may focus on developing multi-GPU systems and on exploring applications in other domains, such as natural language processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language. Challenges include testing scalability on larger-scale problems, understanding performance across various tasks and domains, and addressing the computational complexity introduced by novel biological principles, such as active dendrites. By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI.

Methods
Simulation with DHS
The CoreNEURON simulator (https://github.com/BlueBrain/CoreNeuron) uses the NEURON architecture and is optimized for both memory usage and computational speed. We implement our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on GPU with CoreNEURON can also be simulated with DHS by executing the following command:

coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu

The usage options are as in Table 1.

Accuracy of the simulation using cellular-level parallel computation
To ensure the accuracy of the simulation, we first need to define the correctness of a cellular-level parallel algorithm, i.e., to judge whether it will generate solutions identical to those of proven correct serial methods, such as the Hines method used in the NEURON simulation platform. Based on the theory of parallel computing, a parallel algorithm yields a result identical to its corresponding serial algorithm if and only if the data-processing order in the parallel algorithm is consistent with the data dependency of the serial method. The Hines method has two symmetrical phases: triangularization and back-substitution. By analyzing the serial Hines method, we find that its data dependency can be formulated as a tree structure, where the nodes of the tree represent the compartments of the detailed neuron model. In the triangularization phase, the value of each node depends on its child nodes. In contrast, during the back-substitution phase, the value of each node depends on its parent node (Fig. 1d). Thus, nodes on different branches can be computed in parallel, as their values do not depend on each other.
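For intuition, a minimal Python sketch of this serial elimination pattern on a tree-structured system is given below. It is a generic sketch of the dependency structure, not the NEURON source code; the array names and the parent-array convention are ours:

import numpy as np

def hines_solve(parent, a, b, d, rhs):
    """Serial Hines-style solve of a tree-structured linear system.
    Row i holds d[i] on the diagonal and b[i] in the parent column, while a[i]
    appears in the parent's row (column i). Nodes are assumed numbered so that
    parent[i] < i, with node 0 as the root (soma)."""
    n = len(d)
    d = d.astype(float).copy()
    rhs = rhs.astype(float).copy()
    # triangularization: children are eliminated before their parents
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = a[i] / d[i]
        d[p] -= factor * b[i]
        rhs[p] -= factor * rhs[i]
    # back-substitution: parents are resolved before their children
    v = np.empty(n)
    v[0] = rhs[0] / d[0]
    for i in range(1, n):
        v[i] = (rhs[i] - b[i] * v[parent[i]]) / d[i]
    return v

The loop bodies make the dependencies explicit: triangularization of node i touches only its parent's row, and back-substitution of node i uses only its parent's already computed value, which is exactly why nodes on disjoint branches can be processed concurrently.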
Based on the data dependency of the Hines method, we propose three conditions that ensure a parallel method yields solutions identical to those of the serial Hines method: (1) the tree morphology and the initial values of all nodes are identical to those in the serial Hines method; (2) in the triangularization phase, a node can be processed if and only if all its child nodes have already been processed; (3) in the back-substitution phase, a node can be processed only if its parent node has already been processed.

Computational cost of cellular-level parallel computing methods
To theoretically evaluate the run time, i.e., the efficiency, of serial and parallel computing methods, we introduce the concept of computational cost as follows: given a tree T and k threads (basic computing units) to perform triangularization, parallel triangularization is equivalent to dividing the node set V of T into n subsets, i.e., V = {V1, V2, ..., Vn}, where the size of each subset |Vi| ≤ k, i.e., at most k nodes can be processed in each step since there are only k threads. The triangularization phase follows the order V1 → V2 → ... → Vn, and nodes in the same subset can be processed in parallel. We therefore define |V| (the number of subsets, i.e., n here) as the computational cost of the parallel computing method. In short, the computational cost of a parallel method is the number of steps it takes in the triangularization phase. Because back-substitution is symmetrical with triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase.

Mathematical scheduling problem
Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem: given a tree T = {V, E}, where V is the node set and E is the edge set, and a positive integer k, define a partition P(V) = {V1, V2, ..., Vn} with |Vi| ≤ k, 1 ≤ i ≤ n, where |Vi| denotes the cardinality of subset Vi, i.e., the number of nodes in Vi, such that for each node v ∈ Vi, all its child nodes {c | c ∈ children(v)} lie in previous subsets Vj with 1 ≤ j < i. Our goal is to find an optimal partition P*(V) whose computational cost |P*(V)| is minimal.

Here, subset Vi consists of all nodes that will be computed at the i-th step (Fig. 2e), so |Vi| ≤ k indicates that at most k nodes can be computed in each step because the number of available threads is k. The restriction "for each node v ∈ Vi, all its child nodes {c | c ∈ children(v)} must be in a previous subset Vj, where 1 ≤ j < i" indicates that a node v can be processed only after all its child nodes have been processed.

DHS implementation
We aim to find an optimal way to parallelize the solution of the linear equations for each neuron model by solving the mathematical scheduling problem above. To obtain the optimal partition, DHS first analyzes the topology and calculates the depth d(v) of every node v ∈ V. The following two steps are then performed iteratively until every node v ∈ V has been assigned to a subset: (1) find all candidate nodes and put them into the candidate set Q; a node is a candidate only if all its child nodes have been processed or it has no child nodes. (2) If |Q| ≤ k, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from Q and put them into Vi; otherwise, remove the k deepest nodes from Q and add them to subset Vi. Label these nodes as processed (Fig. 2d).
After filling subset Vi, go to step (1) to fill the next subset Vi+1.

Correctness proof for DHS
After applying DHS to a neural tree T = {V, E}, we get a partition P(V) = {V1, V2, ..., Vn}, |Vi| ≤ k, 1 ≤ i ≤ n. Nodes in the same subset are computed in parallel, so it takes n steps to perform triangularization and back-substitution, respectively. We then demonstrate that the reordering of the computation in DHS yields results identical to those of the serial Hines method.

The partition P(V) obtained from DHS determines the computation order of all nodes in a neural tree. Below we demonstrate that the computation order determined by P(V) satisfies the correctness conditions. P(V) is obtained from the given neural tree T. The operations in DHS do not change the tree topology or the values of the tree nodes (the corresponding values in the linear equations), so the tree morphology and the initial values of all nodes are unchanged, satisfying condition 1: the tree morphology and the initial values of all nodes are identical to those in the serial Hines method. In the triangularization phase, the computation order is from V1 to Vn. As shown in the implementation of DHS, all nodes in subset Vi are selected from the candidate set Q, and a node can be put into Q only if all its child nodes have been processed. Thus, the child nodes of all nodes in Vi are in {V1, V2, ..., Vi-1}, meaning that a node is computed only after all its children have been processed, which satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes have already been processed. In back-substitution, the computation order is the opposite of that in triangularization, i.e., from Vn to V1. As shown before, the child nodes of all nodes in Vi are in {V1, V2, ..., Vi-1}, so the parent nodes of nodes in Vi are in {Vi+1, Vi+2, ..., Vn}, which satisfies condition 3: in back-substitution, a node can be processed only if its parent node has already been processed.

Optimality proof for DHS
The idea of the proof is that if there is another optimal solution, it can be transformed into our DHS solution without increasing the number of steps the algorithm requires, thus indicating that the DHS solution is optimal. For each subset Vi in P(V), DHS moves the k (thread number) deepest nodes from the corresponding candidate set Qi to Vi. If the number of nodes in Qi is smaller than k, all nodes are moved from Qi to Vi. To simplify, we introduce Di, denoting the depth sum of the k deepest nodes in Qi. All subsets in P(V) satisfy this max-depth criterion (Supplementary Fig. 6a). We then show that selecting the k deepest nodes in each iteration makes P(V) an optimal partition. If there exists an optimal partition P*(V) = {V*1, V*2, ..., V*s} containing subsets that do not satisfy the max-depth criterion, we can modify the subsets in P*(V) so that all subsets consist of the deepest nodes from Q while the number of subsets (|P*(V)|) remains the same after the modification.

Without loss of generality, we start from the first subset not satisfying the criterion, V*i. There are two possible cases in which V*i does not satisfy the max-depth criterion: (1) |V*i| < k and some valid candidate nodes in Qi are not put into V*i; (2) |V*i| = k but the nodes in V*i are not the k deepest nodes in Qi.

For case (1), because some candidate nodes are not put into V*i, these nodes must be in subsequent subsets.
As |V*i| < k, we can move the corresponding nodes from the subsequent subsets to V*i, which does not increase the number of subsets and makes V*i satisfy the criterion (Supplementary Fig. 6b, top). For case (2), |V*i| = k; the deeper nodes that were not moved from the candidate set Qi into V*i must have been added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from the subsequent subsets into V*i through the following method. Assume that after filling V*i, a node v is picked while one of the k deepest nodes, v', is still in Qi; thus v' will be put into a subsequent subset V*j (j > i). We first move v from V*i to V*i+1, then modify subset V*i+1 as follows: if |V*i+1| ≤ k and none of the nodes in V*i+1 is the parent of node v, stop modifying the later subsets. Otherwise, modify V*i+1 as follows (Supplementary Fig. 6c): if the parent node of v is in V*i+1, move this parent node to V*i+2; else move the node with minimum depth from V*i+1 to V*i+2. After adjusting V*i, modify the subsequent subsets V*i+1, V*i+2, ..., V*j-1 with the same strategy. Finally, move v' from V*j to V*i.

With the modification strategy described above, we can replace all shallower nodes in V*i with the k deepest nodes in Qi and keep the number of subsets, i.e., |P*(V)|, the same after modification. We can modify the nodes with the same strategy for all subsets in P*(V) that do not contain the deepest nodes. Finally, all subsets V*i ∈ P*(V) satisfy the max-depth criterion, and |P*(V)| does not change after the modification.

In conclusion, DHS generates a partition P(V), and all subsets Vi ∈ P(V) satisfy the max-depth condition. For any other optimal partition P*(V), we can modify its subsets to make its structure the same as that of P(V), i.e., each subset consists of the deepest nodes in the candidate set, while keeping |P*(V)| the same after modification. So the partition P(V) obtained from DHS is one of the optimal partitions.

GPU implementation and memory boosting
To achieve high memory throughput, the GPU utilizes a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have small capacity but high throughput. We aim to boost memory throughput by leveraging this memory hierarchy. The GPU employs a SIMT (Single-Instruction, Multiple-Thread) architecture. Warps are the basic scheduling units on the GPU (a warp is a group of 32 parallel threads). A warp executes the same instruction with different data for different threads. Correctly ordering the nodes is essential for this batching of computation into warps, to make sure DHS obtains results identical to the serial Hines method. When implementing DHS on the GPU, we first group all cells into multiple warps based on their morphologies. Cells with similar morphologies are grouped in the same warp. We then apply DHS to all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads for the same neuron are in the same warp. Therefore, the intrinsic synchronization within warps keeps the computation order consistent with the data dependency of the serial Hines method. Finally, the threads in each warp are aligned and rearranged according to the number of compartments.
When a warp loads pre-aligned and successively stored data from global memory, it can make full use of the cache, which leads to high memory throughput, whereas accessing scattered data reduces memory throughput. After compartment assignment and thread rearrangement, we permute the data in global memory to make it consistent with the computation order, so that warps load successively stored data when executing the program. Moreover, we place the necessary temporary variables in registers rather than in global memory. Registers have the highest memory throughput, so the use of registers further accelerates DHS.

Full-spine and few-spine biophysical models
We used the published human pyramidal neuron model. The membrane capacitance cm = 0.44 μF cm-2, membrane resistance rm = 48,300 Ω cm2, and axial resistivity ra = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables, while the soma was active. The leak reversal potential El = -83.1 mV. Ion channels such as Na+ and K+ channels were inserted in the soma and the initial axon, and their reversal potentials were ENa = 67.6 mV and EK = -102 mV, respectively. All these parameters were set as in the model of Eyal et al.; for more details, please refer to the published model (ModelDB, accession No. 238347).

In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables more than 60 μm away from the soma were multiplied by a spine factor F to approximate the effect of dendritic spines. In this model, Fspine was set to 1.9. Only the spines that received synaptic inputs were explicitly attached to the dendrites.

In the full-spine model, all spines were explicitly attached to the dendrites. We calculated the spine density from the reconstructed neuron in Eyal et al. The spine density was set to 1.3 μm-1, and each cell contained 24,994 spines on the dendrites more than 60 μm away from the soma.

The morphologies and biophysical mechanisms of the spines were the same in the few-spine and full-spine models. The length of the spine neck was Lneck = 1.35 μm and its diameter Dneck = 0.25 μm, while the length and diameter of the spine head were both 0.944 μm, i.e., the spine head area was set to 2.8 μm2. The leak reversal potential El = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those of the dendrites.

Synaptic inputs
We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the terminal of the spine head. For distributed inputs, the activated synapses were randomly distributed over all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses uniformly distributed on a single, randomly selected compartment. All synapses were activated simultaneously during the simulation. AMPA- and NMDA-based synaptic currents were simulated as in Eyal et al.'s work: the AMPA conductance was modeled as a double-exponential function and the NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, the rise and decay time constants τrise and τdecay were set to 0.3 and 1.8 ms; for the NMDA model, τrise and τdecay were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS.

Background noise
We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at tstart = 10 ms and lasted until the end of the simulation. We generated 400 noise spike trains for each cell and attached them to randomly selected synapses. The model and specific parameters of the synaptic currents were the same as described above (Synaptic inputs), except that the maximum conductance of NMDA was uniformly distributed from 1.57 to 3.275, resulting in a higher AMPA to NMDA ratio.
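As an illustration of what "explicitly attached" means in practice, a minimal NEURON (Python) sketch of adding one two-compartment spine to a dendritic section is given below. The helper name, the reuse of the dendrite's passive parameters, and the Exp2Syn placeholder are our assumptions for illustration, not code from the released model:

from neuron import h

def attach_spine(dend, x, e_pas=-86.0, rm=48300.0):
    """Attach an explicit two-compartment spine (neck + head) at dend(x).
    Geometry follows the Methods: neck 1.35 x 0.25 um, head 0.944 x 0.944 um
    (head area ~2.8 um^2). Illustrative sketch, not the published model code."""
    neck = h.Section(name='spine_neck')
    head = h.Section(name='spine_head')
    neck.L, neck.diam = 1.35, 0.25
    head.L, head.diam = 0.944, 0.944
    neck.connect(dend(x))        # neck origin onto the dendrite at location x
    head.connect(neck(1))        # head onto the distal end of the neck
    for sec in (neck, head):
        sec.Ra = dend.Ra         # same axial resistivity as the parent dendrite
        sec.cm = dend.cm         # same specific membrane capacitance
        sec.insert('pas')
        for seg in sec:
            seg.pas.g = 1.0 / rm   # passive conductance from r_m = 48,300 Ohm cm^2
            seg.pas.e = e_pas      # leak reversal of the spine, -86 mV
    return neck, head

# usage sketch: place an excitatory synapse at the spine head terminal
# neck, head = attach_spine(cell.dend[10], 0.5)
# syn = h.Exp2Syn(head(1))   # stands in for the AMPA/NMDA mechanisms of the paper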
Exploring neuronal excitability
We investigated the spike probability when multiple synapses were activated simultaneously. For distributed inputs, we tested 14 cases, ranging from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters, with each cluster consisting of 20 synapses. For each case of both distributed and clustered inputs, we calculated the spike probability from 50 random samples. Spike probability was defined as the ratio of the number of neurons that fired to the total number of samples. All 1,150 samples were simulated simultaneously on our DeepDendrite platform, reducing the simulation time from days to minutes.

Performing AI tasks with the DeepDendrite platform
Conventional detailed-neuron simulators lack two functionalities important for modern AI tasks: (1) alternately performing simulations and weight updates without heavy reinitialization, and (2) simultaneously processing multiple stimulus samples in a batch-like manner. Here we present the DeepDendrite platform, which supports both biophysical simulation and deep learning tasks with detailed dendritic models. DeepDendrite consists of three modules (Supplementary Fig. 5): (1) an I/O module; (2) a DHS-based simulation module; (3) a learning module. When training a biophysically detailed model to perform learning tasks, users first define the learning rule and then feed all training samples to the detailed model for learning. In each step during training, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from the training samples and attaches the stimulus to the network model. Then, the DHS-based simulation module initializes the model and starts the simulation. After the simulation, the learning module updates all synaptic weights according to the difference between the model responses and the teacher signals. After training, the learned model can achieve performance comparable to an ANN. The testing phase is similar to training, except that all synaptic weights are fixed.

HPC-Net model
Image classification is a typical task in the field of AI. In this task, a model should learn to recognize the content of a given image and output the corresponding label. Here we present the HPC-Net, a network consisting of detailed human pyramidal neuron models that can learn to perform image classification tasks using the DeepDendrite platform. The HPC-Net has three layers: an input layer, a hidden layer, and an output layer. The neurons in the input layer receive spike trains converted from images as their input. Hidden-layer neurons receive the output of the input-layer neurons and deliver their responses to the neurons in the output layer. The responses of the output-layer neurons are taken as the final output of the HPC-Net. Neurons in adjacent layers are fully connected. For each image stimulus, we first convert each normalized pixel to a homogeneous spike train. For the pixel with coordinates (x, y) in the image, the corresponding spike train has a constant interspike interval ISI(x, y) (in ms), which is determined by the pixel value p(x, y) as shown in Eq. (1).
In our experiment, the simulation of each stimulus lasted 50 ms. All spike trains started at 9 + τ_ISI ms and lasted until the end of the simulation. We then attached the spike trains to the input-layer neurons in a one-to-one manner. The synaptic current triggered by a spike arriving at time t_0 follows a conductance-based model in which v is the postsynaptic voltage, the reversal potential E_syn = 1 mV, the maximum synaptic conductance g_max = 0.05 μS, and the time constant τ = 0.5 ms.

Neurons in the input layer were modeled as passive single compartments with the following parameters: membrane capacitance c_m = 1.0 μF cm-2, membrane resistance r_m = 10^4 Ω cm2, axial resistivity r_a = 100 Ω cm, and reversal potential of the passive compartment E_l = 0 mV.

The hidden layer contains a group of human pyramidal neuron models that receive the somatic voltages of the input-layer neurons. The morphology was taken from Eyal et al. 51, and all neurons were modeled with passive cables: specific membrane capacitance c_m = 1.5 μF cm-2, membrane resistance r_m = 48,300 Ω cm2, axial resistivity r_a = 261.97 Ω cm, and reversal potential of all passive cables E_l = 0 mV. Input neurons can make multiple connections to randomly selected locations on the dendrites of hidden neurons. The k-th synapse of the i-th input neuron on neuron j's dendrite is defined as in Eq. (4), where g_ijk is the synaptic conductance, W_ijk is the synaptic weight, and a ReLU-like somatic activation function is applied to the somatic voltage of input neuron i at time t.

Neurons in the output layer were also modeled as passive single compartments, and each hidden neuron made only one synaptic connection to each output neuron. All parameters were set the same as those of the input neurons. Synaptic currents activated by hidden neurons also take the form of Eq. (4).

Image classification with HPC-Net

For each input image stimulus, we first normalized all pixel values to the range 0.0-1.0, converted the normalized pixels to spike trains, and attached them to the input neurons. The somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where p_i is the probability of the i-th class predicted by HPC-Net, computed from the average somatic voltage of the i-th output neuron between 20 ms and 50 ms, and C indicates the number of classes, which equals the number of output neurons. The class with the maximum predicted probability is the final classification result. In this paper, we built HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons.

Synaptic plasticity rules for HPC-Net

Inspired by previous work 36, we use a gradient-based learning rule to train HPC-Net on the image classification task. The loss function is the cross-entropy given in Eq. (7), where p_i is the predicted probability for class i and y_i indicates the actual class of the stimulus image: y_i = 1 if the input image belongs to class i and y_i = 0 otherwise.

When training HPC-Net, we compute the update for the weight W_ijk (the synaptic weight of the k-th synapse connecting neuron i to neuron j) at each time step. After the simulation of each image stimulus, W_ijk is updated as shown in Eq. (8). Here, the learning rate scales the weight change and the update value is computed at each time t; v_i and v_j are the somatic voltages of neurons i and j, respectively; I_ijk is the k-th synaptic current activated by neuron i on neuron j and g_ijk its synaptic conductance; r_ijk is the transfer resistance from the k-th compartment connected by neuron i on neuron j's dendrite to neuron j's soma; and t_s = 30 ms and t_e = 50 ms are the start and end times of learning, respectively. For output neurons, the error term can be computed as shown in Eq. (10). For hidden neurons, the error term is calculated from the error terms of the output layer, as given in Eq. (11). Since all output neurons are single compartments, the transfer resistance equals the input resistance of the corresponding compartment. Transfer and input resistances are computed by NEURON.
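To make the readout and loss concrete, the sketch below computes class probabilities and the cross-entropy from output-neuron voltage traces. A softmax over the 20-50 ms mean somatic voltage is assumed here as the form of Eq. (6); the array shapes and function names are illustrative and not DeepDendrite's API.

```python
import numpy as np

def predict_proba(v_out, t, t0=20.0, t1=50.0):
    """Class probabilities from somatic voltage traces of the C output neurons.

    v_out: (C, T) voltage traces, t: (T,) time points in ms. A softmax over the
    20-50 ms mean voltage is assumed here as the concrete form of Eq. (6).
    """
    window = (t >= t0) & (t <= t1)
    v_mean = v_out[:, window].mean(axis=1)   # average somatic voltage per output neuron
    e = np.exp(v_mean - v_mean.max())        # numerically stable softmax
    return e / e.sum()

def cross_entropy(p, label):
    """Cross-entropy of Eq. (7): -log p_i for the true class i."""
    return -np.log(p[label] + 1e-12)

# Example with 10 output neurons, a 0.025 ms time step, and a 50 ms stimulus
t = np.arange(0.0, 50.0, 0.025)
v_out = np.random.randn(10, t.size)          # placeholder voltage traces
loss = cross_entropy(predict_proba(v_out, t), label=3)
```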
Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and faster convergence, and DeepDendrite supports it as well. When training HPC-Net with mini-batch size N_batch, we create N_batch copies of HPC-Net. During training, each copy is fed a different training sample from the batch. DeepDendrite first computes the weight update for each copy separately; after all copies in the current batch are done, the average weight update is computed and the weights of all copies are updated by the same amount.

Robustness against adversarial attack with HPC-Net

To demonstrate the robustness of HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation; for a fair comparison, in this HPC-Net each input neuron made only one synaptic connection to each hidden neuron). We first trained HPC-Net and the ANN on the original training set (clean images), then added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. We used Foolbox 98,99 to generate adversarial noise with the FGSM method 93. The ANN was trained with PyTorch 100, and HPC-Net was trained with our DeepDendrite. For fairness, the adversarial noise was generated on a substantially different network model, a 20-layer ResNet 101. The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST 95 and Fashion-MNIST 96. The results show that the prediction accuracy of HPC-Net is 19% and 16.72% higher than that of the analogous ANN, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available within the paper, the Supplementary Information, and the Source Data files provided with this paper. The source code and data used to reproduce the results in Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist and the Fashion-MNIST dataset at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite, as well as the models and code used to reproduce Figs. 3–6 in this study, are available at https://github.com/pkuzyc/DeepDendrite.

References

1. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
2. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
3. Poirazi, P., Brannon, T. & Mel, B. W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron 37, 977–987 (2003).
4. London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
5. Branco, T. & Häusser, M. The single dendritic branch as a fundamental functional unit in the nervous system. Curr. Opin. Neurobiol. 20, 494–502 (2010).
6. Stuart, G. J. & Spruston, N. Dendritic integration: 60 years of progress. Nat. Neurosci. 18, 1713–1721 (2015).
7. Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
8. Yuste, R. & Denk, W. Dendritic spines as basic functional units of neuronal integration.
9. Engert, F. & Bonhoeffer, T. Dendritic spine changes associated with hippocampal long-term synaptic plasticity. Nature 399, 66–70 (1999).
10. Yuste, R. Dendritic spines and distributed circuits. Neuron 71, 772–781 (2011).
11. Yuste, R. Electrical compartmentalization in dendritic spines. Annu. Rev. Neurosci. 36, 429–449 (2013).
12. Rall, W. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491–527 (1959).
13. Segev, I. & Rall, W. Computational study of an excitable dendritic spine. J. Neurophysiol. 60, 499–523 (1988).
14. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
15. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
16. McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem.
17. French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
18. Naud, R. & Sprekeler, H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA 115, E6329–E6338 (2018).
19. Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS, 2018).
20. Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits.
21. Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
22. Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent.
23. Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve.
24. Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
25. Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
26. Bower, J. M. & Beeman, D. in The Book of GENESIS: Exploring Realistic Neural Models with the General Neural Simulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
27. Hines, M. L., Eichner, H. & Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors. J. Comput. Neurosci. 25, 203–210 (2008).
28. Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
29. Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphics processing unit.
30. Tsuyuki, T., Yamamoto, Y. & Yamazaki, T. Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, A. et al.) 279–285 (Springer International Publishing, 2016).
31. Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines matrix solver in neuron simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
32. Huber, F. Efficient tree solver for Hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
33. Korte, B. & Vygen, J. Combinatorial Optimization: Theory and Algorithms 6th edn (Springer, 2018).
34. Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
35. Kumbhar, P. et al. CoreNEURON: an optimized compute engine for the NEURON simulator. Front. Neuroinform. 13, 63 (2019).
36. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
37. Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphics processing units.
38. Mascagni, M. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models. J. Neurosci. Methods.
39. McDougal, R. A. et al. Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience. J. Comput. Neurosci. 42, 1–10 (2017).
40. Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons. J. Comput. Neurosci. 16, 5–13 (2004).
41. Hemond, P. et al. Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b.
42. Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput. Biol. 7, e1002107 (2011).
43. Masoli, S., Solinas, S. & D'Angelo, E. Action potential processing in a detailed Purkinje cell model reveals a critical role for axonal compartmentalization.
44. Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales—simulations of direct pathway MSNs investigate the fast onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
45. Migliore, M. et al. Synaptic clusters function as odor operators in the olfactory bulb. Proc. Natl Acad. Sci. USA 112, 8499–8504 (2015).
46. NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
47. NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
48. Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
49. Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
50. Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
51. Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models. Front. Cell. Neurosci. 12, 181 (2018).
52. Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization. J. Neurosci. 13, 413–422 (1993).
53. Koch, C. Dendritic spines. In Biophysics of Computation (Oxford University Press, 1999).
54. Rapp, M., Yarom, Y. & Segev, I. The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells.
55. Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
56. Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint at https://arxiv.org/abs/1703.09202 (2017).
57. Goddard, N. H. & Hood, G. Large-scale simulation using parallel GENESIS. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 349–379 (Springer New York, 1998).
58. Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON. J. Comput. Neurosci. 21, 119 (2006).
59. Lytton, W. W. et al. Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON. Neural Comput. 28, 2063–2090 (2016).
60. Valero-Lara, P. et al. cuHinesBatch: solving multiple Hines systems on GPUs for the Human Brain Project. In Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
61. Akar, N. A. et al. Arbor — a morphologically-detailed neural network simulation library for contemporary high-performance computing architectures. In Proc. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 274–282 (IEEE, 2019).
62. Ben-Shalom, R. et al. NeuroGPU: accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. J. Neurosci. Methods 366, 109400 (2022).
63. Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
64. Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform. 5, 15 (2011).
65. Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
66. Palmer, L. M. & Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input. J. Neurosci. 29, 6897–6903 (2009).
67. Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
68. Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity.
69. Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).
70. Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
71. Keck, T. et al. Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
72. Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits.
73. Trachtenberg, J. T. et al. Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature 420, 788–794 (2002).
74. Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
75. Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
76. Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex. Neuron 109, 3298–3311 (2021).
77. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
78. Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
79. Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
80. Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons. Cell Rep. 21, 1550–1561 (2017).
81. Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci. USA 114, E7612–E7621 (2017).
82. Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
83. Xu, N.-L. et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task. Nature 492, 247–251 (2012).
84. Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
85. Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature 517, 200–204 (2015).
86. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
87. Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
88. Hjorth, J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
89. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
90. Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments. Front. Neurorobot. 16, 846219 (2022).
91. Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree? Neural Comput. 33, 1554–1571 (2021).
92. Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks. PLoS Comput. Biol. 17, e1009202 (2021).
93. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR, 2015).
94. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).
95. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
96. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
97. Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS, 2018).
98. Rauber, J., Brendel, W. & Bethge, M. Foolbox: a Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017).
99. Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox Native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5, 2607 (2020).
100. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS, 2019).
101. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

The authors sincerely thank Dr. Rita Zhang, Daochen Shi, and members of NVIDIA for their valuable technical support on GPU computing. This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., the National Natural Science Foundation of China (No. 61088102) to T.H., the National Key R&D Program of China (No. 2022ZD01163005) to L.M., the Key Area R&D Program of Guangdong Province (No. 2018B030338001) to T.H., the National Natural Science Foundation of China (No. 61825101) to Y.T., the Swedish Research Council (VR-M-2020-01652), the Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 No. 945539 (HBP SGA3), and KTH Digital Futures to J.H.K., J.H., and A.K., and the Swedish Research Council (VR-M-2021-01995) and EU/Horizon 2020 No. 945539 (HBP SGA3) to S.G. and A.K. Part of the simulations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC KTH, partially funded by the Swedish Research Council through grant agreement No. 2018-05973.

This article is available in Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.