Authors: Yichen Zhang, Gan He, Lei Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjian Zhang, Jeanette Hellgren Kotaleski, Yonghong Tian, Sten Grillner, Tiejun Huang

Abstract
Biophysically detailed multi-compartment models are powerful tools for exploring computational principles of the brain and also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their expensive computational cost severely limits applications in both the neuroscience and AI fields. The major bottleneck when simulating detailed compartment models is a simulator's ability to solve large systems of linear equations. Here, we present a Dendritic Hierarchical Scheduling (DHS) method to markedly accelerate this process. We theoretically prove that the DHS implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method on the conventional CPU platform. We build a DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines. Moreover, we provide a brief discussion of the potential of DeepDendrite for AI, specifically highlighting DHS.

Introduction
Describing the coding and computational principles of neurons is essential for neuroscience. Mammalian brains consist of many thousands of different types of neurons with unique morphological and biophysical properties.
In recent years, modern artificial intelligence (AI) has exploited this principle and developed powerful tools such as artificial neural networks (ANNs). However, beyond extensive computations at the single-neuron level, subcellular compartments such as neuronal dendrites can also perform nonlinear operations as independent computational units. Moreover, dendritic spines, the small protrusions that densely cover the dendrites of spiny neurons, can compartmentalize synaptic signals so that they are separated from their parent dendrites ex vivo and in vivo.

Simulations using biologically detailed neuron models provide a theoretical framework for linking biological details to computational principles. They allow us to model neurons with realistic dendritic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. Cable theory models the biophysical membrane properties of dendrites as passive cables, providing a mathematical description of how electrical signals invade and spread through complex neuronal processes. By incorporating cable theory together with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can capture cellular and subcellular neuronal computations beyond experimental limitations.

Beyond its profound influence on neuroscience, biologically detailed neuron models have recently been used to bridge the gap between neuronal structural and biophysical details and AI. The dominant technique in the modern AI field is ANNs composed of point neurons, an analog of biological neural networks.
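As a point of reference, the passive cable formalism referred to above can be written in its standard textbook form (the notation here is the conventional one, not taken from this paper):

```latex
% Passive cable equation (standard textbook form; symbols are conventional)
\lambda^{2}\,\frac{\partial^{2} V}{\partial x^{2}}
  \;=\; \tau_{m}\,\frac{\partial V}{\partial t} \;+\; V,
\qquad
\lambda = \sqrt{\frac{d\,R_{m}}{4\,R_{a}}},
\qquad
\tau_{m} = R_{m}\,C_{m}
```

where V is the membrane potential relative to rest, d the cable diameter, R_m the specific membrane resistance, R_a the axial resistivity, and C_m the specific membrane capacitance; spatial discretization of this equation over a branched morphology is what produces the tree-structured linear systems discussed below.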
Although ANNs with the backpropagation-of-error (backprop) algorithm have achieved remarkable performance in specialized applications, even beating top human professional players in the games of Go and chess, the human brain still outperforms ANNs in domains involving more dynamic and noisy environments. Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that potentially exceed backprop in parallel information processing. Moreover, a single detailed multi-compartment model can learn network-level nonlinear computations of point neurons by adjusting only its synaptic strengths. It is therefore a high priority to extend paradigms in brain-like AI from single detailed neuron models to large biologically detailed networks.

A long-standing challenge of the detailed simulation approach lies in its extremely high computational cost, which has limited its application in neuroscience and AI. The major bottleneck in the simulation is solving the linear equations that arise from the fundamental theories of detailed modeling. To improve efficiency, the classic Hines method reduces the time complexity of solving the equations from O(n^3) to O(n) and has been adopted as the core algorithm in popular simulators such as NEURON and GENESIS. However, this method processes each compartment sequentially in a serial manner. When a simulation involves multiple biophysically detailed dendrites with dendritic spines, the linear equation matrix ("Hines matrix") scales accordingly with the increasing number of dendrites or spines (Fig. 1e), making the Hines method no longer practical, as it imposes a very heavy burden on the whole simulation.

Fig. 1: A reconstructed layer-5 pyramidal neuron model and the mathematical formalism used for detailed neuron models. Workflow of numerically simulating detailed neuron models.
The equation-solving phase is the bottleneck of the simulation. An example of the linear equations in the simulation. Data dependency of the Hines method when solving the linear equations. The size of the Hines matrix scales with model complexity: the number of linear equations to solve increases substantially as models grow more detailed. Computational cost (steps taken in the equation-solving phase) of the serial Hines method on different types of neuron models. Illustration of different solving methods; in the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors. Computational cost of three methods solving a pyramidal model with spines. Run time of different methods solving the equations for 500 pyramidal models with spines; run time indicates the time consumed by a 1 s simulation (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: parallel method in CoreNEURON (on GPU); Branch-based: branch-based parallel method (on GPU); DHS: Dendritic Hierarchical Scheduling method (on GPU).

Over the past decades, tremendous advances have been made in accelerating the Hines method by using cellular-level parallel methods, which parallelize the computation of different parts within each cell. However, current cellular-level parallel methods often lack an efficient parallelization strategy or lack sufficient numerical accuracy relative to the original Hines method.

Here we develop a fully automated, numerically accurate, and optimized simulation tool that significantly accelerates computation and reduces computational cost. In addition, this simulation tool can be seamlessly applied to building and testing neural networks with biological details for machine learning and AI applications.
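To make the serial dependency concrete, the Hines elimination on a tree-structured system can be sketched as follows. This is a minimal illustrative reimplementation in Python, not the NEURON code: each node is eliminated into its parent from the leaves toward the root, then solved back down, which is why each step depends on the previous one.

```python
# Minimal sketch of a Hines-style O(n) solve on a tree-structured system.
# Node 0 is the root (soma); parent[i] < i for all i > 0, as in Hines ordering.
# d: diagonal entries, b[i]: off-diagonal entry A[i][parent[i]],
# a[i]: off-diagonal entry A[parent[i]][i], rhs: right-hand side.

def hines_solve(parent, d, a, b, rhs):
    n = len(d)
    d = d[:]; rhs = rhs[:]                # work on copies
    # Triangularization: eliminate each node into its parent, leaves first.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        f = a[i] / d[i]
        d[p] -= f * b[i]
        rhs[p] -= f * rhs[i]
    # Back-substitution from the root down the tree.
    v = [0.0] * n
    v[0] = rhs[0] / d[0]
    for i in range(1, n):
        v[i] = (rhs[i] - b[i] * v[parent[i]]) / d[i]
    return v

# A tiny Y-shaped "neuron": soma 0 with two children, each with one child.
parent = [-1, 0, 0, 1, 2]
d   = [4.0, 3.0, 3.0, 2.0, 2.0]
a   = [0.0, -1.0, -1.0, -1.0, -1.0]   # A[parent[i]][i]
b   = [0.0, -1.0, -1.0, -1.0, -1.0]   # A[i][parent[i]]
rhs = [1.0, 2.0, 0.5, 1.5, -1.0]
v = hines_solve(parent, d, a, b, rhs)
```

The elimination loop is inherently serial when run on one thread; DHS, introduced below, exploits the fact that nodes on disjoint subtrees can be eliminated simultaneously.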
Using parallel computing theory, we demonstrate that our algorithm yields optimal scheduling without loss of precision. Moreover, we have optimized DHS for current state-of-the-art GPU chips by exploiting the GPU memory hierarchy and memory access mechanisms, achieving a 2-3 orders of magnitude speedup compared to the classic NEURON simulator with identical accuracy.

To enable detailed dendritic simulations for use in AI, we then establish the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized computing engine for NEURON) as the simulation engine with two auxiliary modules (an I/O module and a learning module) that support dendritic learning algorithms during simulations. DeepDendrite runs on the GPU hardware platform and supports both regular simulation tasks in neuroscience and learning tasks in AI.

Last but not least, we also present several applications of DeepDendrite that address critical challenges in neuroscience and AI: (1) We demonstrate how spatial patterns of dendritic spine inputs affect neuronal activity in neurons containing spines over their entire dendritic trees (full-spine models). DeepDendrite allows us to explore neuronal computation in a simulated human pyramidal neuron model with ~25,000 dendritic spines. (2) In the Discussion, we also consider the potential of DeepDendrite in the context of AI, specifically in building ANNs with morphologically detailed human pyramidal neurons.
All source code for DeepDendrite, the full-spine models, and the detailed dendritic network model is publicly available online (see Code Availability). Our open-source learning framework can easily be integrated with other dendritic learning rules, such as learning rules for nonlinear (fully active) dendrites, burst-dependent synaptic plasticity, and learning with spike prediction. Overall, our study provides a complete set of tools with the potential to change the current ecosystem of computational neuroscience. By harnessing the power of GPU computing, we anticipate that these tools will facilitate system-level explorations of the computational principles of the brain's fine structures, as well as promote the interplay between neuroscience and modern AI.

Results
Dendritic Hierarchical Scheduling (DHS)
Computing ionic currents and solving linear equations are the two critical phases when simulating biophysically detailed neurons; both are time-consuming and pose severe computational burdens. Fortunately, computing the ionic currents of each compartment is a fully independent process, so it can be naturally parallelized on devices with massive parallel-computing units such as GPUs. Consequently, solving the linear equations becomes the remaining bottleneck for the parallelization process (Fig. 1a).

To address this bottleneck, cellular-level parallel methods have been developed that accelerate single-cell computation by splitting a single cell into several blocks that can be computed in parallel. However, such methods rely heavily on prior knowledge to generate practical strategies for how to divide a single neuron into blocks (Fig. 1g, i; Supplementary Fig. 1). They therefore become less efficient for neurons with asymmetric morphologies, e.g., pyramidal neurons and Purkinje neurons.

We aim to develop a more efficient and accurate parallel method for simulating biologically detailed neural networks.
First, we establish the criteria for the accuracy of a cellular-level parallel method. Based on the data dependency in the Hines method, we propose three conditions that ensure a parallel method will give solutions identical to those of the serial Hines method (see Methods).

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem (see Methods). Given k parallel threads, we can process at most k nodes at each step; we must ensure that a node is computed only after all of its child nodes have been processed, and our goal is to find a strategy with the smallest number of steps for the whole procedure.

To generate an optimal partition, we propose a method called Dendritic Hierarchical Scheduling (DHS) (the theoretical proof is presented in Methods). The DHS method comprises two steps: analyzing the dendritic topology and finding the best partition. (1) Given a detailed model, we first obtain its corresponding dependency tree and compute the depth of each node (the depth of a node is the number of its ancestor nodes) on the tree (Fig. 2a). (2) After topology analysis, we search for the candidate nodes and select at most k of the deepest candidates at each iteration (a node is a candidate only if all of its child nodes have been processed) (Fig. 2b-d).

Fig. 2: DHS workflow; DHS selects the deepest candidate nodes at each iteration. Illustration of computing node depth for a compartment model: the model is first converted to a tree structure, and then the depth of each node is computed. Topology analysis on different neuron models; six neurons with different morphologies are shown. For each model, the soma is chosen as the root of the tree, so node depth increases from the soma (0) toward the distal dendrites. Illustration of performing DHS on the model with four threads. Candidates: nodes that can be processed. Selected candidates: nodes selected by DHS. Processed nodes: nodes that have already been processed.
Parallelization strategy obtained by DHS after this process: each node is assigned to one of the four parallel threads. DHS reduces the number of serial node-processing steps from 14 to 5 by distributing nodes across multiple threads. Relative cost, i.e., the proportion of the computational cost of DHS to that of the serial Hines method, when applying DHS with different numbers of threads on different types of models.

Take a simplified model with 15 compartments as an example: using the serial Hines method, it takes 14 steps to process all nodes, whereas DHS with four parallel units can divide the nodes into five subsets (Fig. 2d). Because nodes in the same subset can be processed in parallel, it takes only five steps to process all nodes using DHS (Fig. 2e).

Next, we applied the DHS method with different numbers of threads (Fig. 2f) to six representative detailed neuron models (selected from ModelDB): cortical and hippocampal pyramidal neurons, a cerebellar Purkinje neuron, a striatal projection neuron (SPN), and an olfactory bulb mitral cell, covering the main principal neurons in sensory, cortical, and subcortical areas. We then measured the computational cost. The relative computational cost is defined as the ratio of the computational cost of DHS to that of the serial Hines method. The computational cost, i.e., the number of steps taken to solve the equations, drops dramatically with increasing thread numbers. For example, with 16 threads, the computational cost of DHS is 7%-10% of that of the serial Hines method. Interestingly, the DHS method reaches the lower bound of its computational cost for the presented neurons when given 16 or even 8 parallel threads (Fig. 2f), suggesting that adding more threads does not further improve performance, owing to the dependencies between compartments.
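The selection rule described above can be sketched in a few lines. The following is an illustrative reimplementation of the DHS scheduling idea (not the paper's GPU code): at each step it picks up to k of the deepest ready nodes, i.e., nodes whose children have all been processed.

```python
def dhs_schedule(parent, k):
    """Greedy deepest-candidate scheduling (sketch of the DHS rule).
    parent[i] is the parent of node i (parent[root] == -1, parent[i] < i).
    Returns a list of steps; each step lists the nodes processed in parallel."""
    n = len(parent)
    depth = [0] * n
    children_left = [0] * n
    for i in range(n):
        if parent[i] >= 0:
            children_left[parent[i]] += 1
            depth[i] = depth[parent[i]] + 1   # valid since parent[i] < i
    done = [False] * n
    steps = []
    while not all(done):
        # Candidates: unprocessed nodes whose children are all processed.
        cand = [i for i in range(n) if not done[i] and children_left[i] == 0]
        cand.sort(key=lambda i: -depth[i])     # deepest first
        chosen = cand[:k]                      # at most k nodes per step
        for i in chosen:
            done[i] = True
            if parent[i] >= 0:
                children_left[parent[i]] -= 1
        steps.append(chosen)
    return steps
```

On a full binary tree of 15 nodes with k = 4 threads, this schedule needs only 5 steps, matching the speedup reported for the 15-compartment example, while the serial method visits nodes one at a time.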
Together, DHS enables automated analysis of the dendritic topology and optimal partitioning for parallel computation. It is worth noting that DHS finds the optimal partition before the simulation begins, so no extra computation is needed when solving the equations.

Speeding up DHS by boosting GPU memory
DHS computes each neuron with multiple threads, which consumes a large number of threads when running neural network simulations. In theory, the many stream processors (SPs) on a GPU should support efficient simulation of large neural networks (Fig. 3a, b). However, we consistently observed that the efficiency of DHS dropped significantly as network size grew, which may result from scattered data storage or from extra memory accesses caused by loading and writing intermediate results (Fig. 3c, d left).

Fig. 3: GPU architecture and its memory hierarchy. Each GPU contains massive numbers of processing units (stream processors); different types of memory have different throughput. Each streaming multiprocessor (SM) contains multiple stream processors, registers, and an L1 cache. Applying DHS to two neurons, each with four threads. Memory optimization strategy on the GPU. Top panels: thread assignment and data storage of DHS, before (left) and after (right) memory boosting. Bottom: an example of a single step of triangularization when simulating the two neurons. Processors send a data request to load the data for each thread from global memory. Without memory boosting (left), it takes seven transactions to load all requested data, plus extra transactions for intermediate results. With memory boosting (right), it takes only two transactions to load all requested data, and registers are used for intermediate results, which further improves memory throughput. Run time of DHS (32 threads per cell) with and without memory boosting on multiple layer-5 pyramidal models with spines.
Speedup of memory boosting on multiple layer-5 pyramidal models with spines.

We address this problem with GPU memory boosting, a method that increases memory throughput by exploiting the GPU's memory hierarchy and access mechanisms. Owing to the GPU's memory-loading mechanism, consecutive threads loading aligned and successively stored data achieve much higher memory throughput than threads accessing scattered data. To achieve high throughput, we first adjust the computation order of the nodes and rearrange the threads according to the number of nodes assigned to them. We then permute the data storage in global memory to be consistent with the computation order, i.e., nodes processed at the same step are stored successively in global memory. Furthermore, we use GPU registers to store intermediate results, which further improves memory throughput. In the example shown, memory boosting takes only two memory transactions to load eight requested data items (Fig. 3d). Moreover, experiments on different numbers of pyramidal neurons with spines and on the typical neuron models (Fig. 3e, f; Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8 times speedup as compared to the naïve DHS.

To comprehensively test the performance of DHS with GPU memory boosting, we selected six typical neuron models and evaluated the run time of solving the cable equations on massive numbers of each model (Fig. 4a). We examined DHS with four threads (DHS-4) and with sixteen threads (DHS-16) for each neuron. Compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS accelerates the simulation by 2-3 orders of magnitude (Supplementary Fig. 3), while maintaining identical numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites (Supplementary Fig. 7), and different segmentation strategies (Supplementary Fig. 7).

Fig. 4: Run time of solving equations for a 1 s simulation on the GPU (dt = 0.025 ms, 40,000 iterations in total). CoreNEURON: the parallel method used in CoreNEURON; DHS-4: DHS with four threads for each neuron; DHS-16: DHS with 16 threads for each neuron. Visualization of the partitions from DHS-4 and DHS-16; each color indicates a single thread.

DHS creates cell-type-specific optimal partitioning
To gain insight into the working mechanism of the DHS method, we visualized the partitioning process by mapping compartments to each thread (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread often jumps between different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell, whereas it generates fragmented partitions for morphologically asymmetric neurons such as the pyramidal neurons and the Purkinje cell (Fig. 4b, c), indicating that DHS divides the neural tree at the scale of individual compartments (i.e., tree nodes) rather than whole branches.

In summary, DHS with memory boosting yields a theoretically proven optimal solution for solving linear equations in parallel, with unprecedented efficiency. Building on this, we constructed the open-access platform DeepDendrite, which neuroscientists can use to implement models without any GPU-programming knowledge. Below, we show how DeepDendrite can be used in neuroscience tasks.

DHS enables spine-level modeling
Since dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, etc., their morphologies and plasticity are crucial in regulating neuronal excitability.
However, spines are too small (~1 μm in length) to be measured directly in experiments with respect to voltage-dependent processes. We can model a single spine with two compartments: the spine head, where synapses are located, and the spine neck, which links the spine head to the dendrite. Theory predicts that the very thin spine neck (0.1-0.5 μm in diameter) electrically isolates the spine head from its parent dendrite, thereby compartmentalizing the signals generated at the spine head. However, a detailed model with fully distributed spines on the dendrites ("full-spine model") is computationally very expensive. A common alternative is the spine factor F, used instead of modeling all spines explicitly; the spine factor approximates the effect of spines on the biophysical properties of the cell membrane.

Inspired by the previous work of Eyal et al., we investigated how different spatial patterns of excitatory inputs formed on dendritic spines shape neuronal activities in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Noticeably, Eyal et al. employed the spine factor F to incorporate spines into dendrites while only a few activated spines were explicitly attached to dendrites ("few-spine model" in Fig. 5a). The value of F in their model was computed from the dendritic area and spine area in the reconstructed data. Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model more consistent with Eyal's few-spine model. With the spine density set to 1.3 μm⁻¹, the pyramidal neuron model contained about 25,000 spines without altering the model's original morphological and biophysical properties. Further, we repeated the previous experimental protocols with both full-spine and few-spine models. We used the same synaptic input as in Eyal's work but attached extra background noise to each sample. By comparing the somatic traces (Fig. 5b, c) and spike probability (Fig. 5d) in the full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by the activation of clustered spines appeared to be more nonlinear in the full-spine model (the solid blue line in Fig. 5d) than in the few-spine model (the dashed blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on dendritic excitability and nonlinearity.

Fig. 5: Experiment setup. We examine two major types of models: few-spine models and full-spine models. Few-spine models (two on the left) incorporate the spine area globally into the dendrites and only attach individual spines together with activated synapses. In full-spine models (two on the right), all spines are explicitly attached over the whole dendritic tree. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine and full-spine models, respectively. Somatic voltages recorded for these cases; colors of the voltage curves correspond to the setup panel; scale bar: 20 ms, 20 mV. Color-coded voltages during the simulation at specific times; colors indicate the magnitude of voltage. Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases, with background noise attached. Run time of the experiments with different simulation methods. NEURON: conventional NEURON simulator running on a single CPU core. CoreNEURON: CoreNEURON simulator on a single GPU. DeepDendrite: DeepDendrite on a single GPU.

On the DeepDendrite platform, both full-spine and few-spine models achieved an 8 times speedup compared to CoreNEURON on the GPU platform and a 100 times speedup compared to serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1) while producing identical simulation results (Supplementary Figs. 4 and 8).
Therefore, the DHS method enables exploration of dendritic excitability under more realistic anatomical conditions.

Discussion
In this work, we propose the DHS method to parallelize the computation of the Hines method, and we mathematically demonstrate that DHS provides an optimal solution without any loss of precision. Next, we implement DHS on the GPU hardware platform and use GPU memory-boosting techniques to refine DHS (Fig. 3). When simulating a large number of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup (Supplementary Table 1) compared to the GPU method used in CoreNEURON, and up to a 1,500-fold speedup compared to the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3 and Supplementary Table 1). Furthermore, we develop the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, as a demonstration of the capacity of DeepDendrite, we present a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Further in this section, we elaborate on how we have expanded the DeepDendrite framework to enable efficient training of biophysically detailed neural networks. To explore the hypothesis that dendrites improve robustness against adversarial attacks, we train our network on typical image classification tasks. We show that DeepDendrite can support both neuroscience simulations and AI-related detailed neural network tasks with unprecedented speed, thereby significantly promoting detailed neuroscience simulations and, potentially, future AI explorations.

Decades of effort have been invested in speeding up the Hines method with parallel methods. Early work mainly focused on network-level parallelization. In network simulations, each cell independently solves its corresponding linear equations with the Hines method.
Network-level parallel methods distribute a network over multiple threads and parallelize the computation across cell groups, one group per thread. With network-level methods, we can simulate detailed networks on clusters or supercomputers. In recent years, GPUs have been used for detailed network simulation. Because a GPU contains massive numbers of computing units, a thread is usually assigned to a single cell rather than to a cell group. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, the computation inside each cell is still serial in network-level methods, so they still cannot cope when the Hines matrix of each cell scales large.

Cellular-level parallel methods further parallelize the computation inside each cell. The main idea is to split each cell into several sub-blocks and parallelize the computation of those sub-blocks. However, typical cellular-level methods (e.g., the "multi-split" method) pay little attention to the parallelization strategy, and the lack of a fine-grained strategy results in unsatisfactory performance. To achieve higher efficiency, some studies obtain finer-grained parallelization by introducing extra computational operations or by making approximations at some crucial compartments while solving the linear equations. These finer-grained strategies achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the problem of "how to parallelize" as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy.
Moreover, DHS introduces no extra operations or value approximations, so it achieves the lowest computational cost while retaining the full numerical accuracy of the original Hines method.

Dendritic spines are the most abundant microstructures in the brain, found on projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. As spines receive most of the excitatory inputs in the central nervous system, electrical signals generated by spines are the main driving force of large-scale neuronal activities in the forebrain and cerebellum. The structure of the spine, with an enlarged spine head and a very thin spine neck, leads to a surprisingly high input impedance at the spine head, which can be up to 500 MΩ, as shown by combining experimental data with the detailed compartmental modeling approach. Due to such high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level, thereby boosting NMDA currents and ion channel currents in the spine. However, in classic detailed compartment models, all spines are replaced by the F coefficient modifying the dendritic cable geometry. This approach may compensate for the leak currents and capacitance currents of spines, but it cannot reproduce the high input impedance at the spine head, which may weaken excitatory synaptic inputs, particularly NMDA currents, thereby reducing the nonlinearity of the neuron's input-output curve. Our modeling results are in line with this interpretation.

On the other hand, the spine's electrical compartmentalization is always accompanied by biochemical compartmentalization, resulting in a drastic increase of internal [Ca2+] within the spine and a cascade of molecular processes involving synaptic plasticity of importance for learning and memory.
Intriguingly, the biochemical processes triggered by learning in turn remodel the spine's morphology, enlarging (or shrinking) the spine head or elongating (or shortening) the spine neck, which significantly alters the spine's electrical properties. Such experience-dependent changes in spine morphology, also referred to as "structural plasticity", have been widely observed in vivo in the visual cortex, somatosensory cortex, motor cortex, hippocampus, and basal ganglia. They play a critical role in motor and spatial learning as well as in memory formation. However, due to the computational costs, nearly all detailed network models use the F-factor approach to replace actual spines and are thus unable to explore spine functions at the system level. By taking advantage of our framework and the GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU, while remaining ~100 times faster than the traditional serial method on a single CPU (Fig. 5e). This enables exploration of structural plasticity in large-scale circuit models across diverse brain regions.

Another critical issue is how to link dendrites to brain functions at the systems/network level. It is well established that dendrites can perform comprehensive computations on synaptic inputs owing to their enriched ion channels and local biophysical membrane properties. For example, cortical pyramidal neurons can carry out sublinear synaptic integration at the proximal dendrite but progressively shift to supralinear integration at the distal dendrite. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials. Such dendritic events are widely observed in vitro in mouse and even human cortical neurons, where they may implement various logical operations or gating functions.
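The contrast between sublinear and supralinear integration can be illustrated with a deliberately simple toy model (ours, not from this paper): a sigmoidal dendritic transfer function is supralinear near its threshold, mimicking NMDA-spike-like regenerative events, and saturates (sublinear) at high input counts.

```python
import math

def dendritic_output(n_inputs, unit_epsp=1.0, half=8.0, slope=1.5):
    """Toy sigmoidal dendritic nonlinearity (illustrative only; all
    parameters are arbitrary). Input: number of coincident synaptic
    inputs. Output: peak local depolarization in mV, capped near 20 mV."""
    drive = n_inputs * unit_epsp
    return 20.0 / (1.0 + math.exp(-(drive - half) / slope))
```

With these arbitrary parameters, doubling the input count from 4 to 8 more than doubles the response (supralinear regime), whereas doubling it from 8 to 16 less than doubles it (saturating regime), qualitatively reproducing the two integration modes described above.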
Recently, in vivo recordings in awake or behaving mice have provided strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex , sensorimotor integration in the whisker system , , and spatial navigation in the hippocampal CA1 region . 5 6 7 77 6 78 6 79 6 79 80 81 82 83 84 85 To establish the causal link between dendrites and animal (including human) patterns of behavior, large-scale biophysically detailed neural circuit models are a powerful computational tool. However, running a large-scale detailed circuit model of 10,000-100,000 neurons generally requires the computing power of supercomputers. It is even more challenging to optimize such models against in vivo data, as this requires iterative simulations of the models. The DeepDendrite framework can directly support many state-of-the-art large-scale circuit models , , , which were initially developed based on NEURON. Moreover, using our framework, a single GPU card such as a Tesla A100 could easily support the operation of detailed circuit models of up to 10,000 neurons, thereby providing a carbon-efficient and affordable way for ordinary labs to develop and optimize their own large-scale detailed models. 86 87 88 Recent work on unraveling the dendritic roles in task-specific learning has achieved remarkable results in two directions, i.e., solving challenging tasks such as the image classification dataset ImageNet with simplified dendritic networks , and exploring the full learning potential of more realistic neuron models , . However, there is a trade-off between model size and biological detail, as an increase in network scale often comes at the cost of neuron-level complexity , , . Moreover, more detailed neuron models are less mathematically tractable and computationally more expensive . 20 21 22 19 20 89 21 There has also been progress on the role of active dendrites in ANNs for computer vision tasks. Iyer et al.
proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording used a binary tree to approximate dendrite branching and provided valuable insights into the influence of tree structure on a single neuron’s computational capacity. Bird et al. proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they primarily rely on abstractions derived from spatially extended neurons, and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of leveraging more realistic neuron models for understanding the shared mechanisms underlying brain computation and deep learning. 90 91 92 In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to significantly reduce computational costs and incorporates an I/O module and a learning module to handle large datasets. With DeepDendrite, we successfully implemented a three-layer hybrid neural network, the Human Pyramidal Cell Network (HPC-Net) (Fig. 6a, b). This network demonstrated efficient training capabilities in image classification tasks, achieving an approximately 25-fold speedup compared to training on a traditional CPU-based platform (Fig. 6f and Supplementary Table 1). The illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed to spike trains and fed into the network model. Learning is triggered by error signals propagated from soma to dendrites. Training with mini-batch: multiple networks are simulated simultaneously with different images as inputs. The total weight update ΔW is computed as the average of the ΔWi from each network.
Comparison of the HPC-Net before and after training. Left, visualization of hidden neuron responses to a specific input before (top) and after (bottom) training. Right, distribution of hidden layer weights (from input to hidden layer) before (top) and after (bottom) training. Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained with clean images. Prediction accuracy of each model on adversarial samples after training 30 epochs on the MNIST (left) and Fashion-MNIST (right) datasets. Run time of training and testing for the HPC-Net. The batch size is set to 16. Left, run time for training one epoch. Right, run time for testing. Parallel NEURON + Python: training and testing on a single CPU with multiple cores, using 40-process parallel NEURON to simulate the HPC-Net and extra Python code to support mini-batch training. DeepDendrite: training and testing the HPC-Net on a single GPU with DeepDendrite. Additionally, it is widely recognized that the performance of Artificial Neural Networks (ANNs) can be undermined by adversarial attacks: intentionally engineered perturbations devised to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks . Our experimental results utilizing the HPC-Net lend support to this hypothesis, as we observed that networks endowed with detailed dendritic structures demonstrated increased resilience to transfer adversarial attacks compared to standard ANNs, as evident on the MNIST and Fashion-MNIST datasets (Fig. 6d, e). This evidence implies that the inherent biophysical properties of dendrites could be pivotal in augmenting the robustness of ANNs against adversarial interference.
Nonetheless, it is essential to conduct further studies to validate these findings using more challenging datasets such as ImageNet . 93 56 94 95 96 97 In conclusion, DeepDendrite has shown remarkable potential in image classification tasks, opening up exciting future directions and possibilities. To further advance DeepDendrite and the application of biologically detailed dendritic models in AI tasks, we may focus on developing multi-GPU systems and exploring applications in other domains, such as Natural Language Processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language. Challenges include testing scalability on larger problems, understanding performance across various tasks and domains, and addressing the computational complexity introduced by novel biological principles, such as active dendrites. By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI. Methods Simulation with DHS The CoreNEURON simulator ( https://github.com/BlueBrain/CoreNeuron ) uses the NEURON architecture and is optimized for both memory usage and computational speed. We implemented our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on GPU with CoreNEURON can also be simulated with DHS by executing the following command: 35 25 coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu The usage options are listed in Table 1.
Accuracy of the simulation using cellular-level parallel computation To ensure the accuracy of the simulation, we first need to define the correctness of a cellular-level parallel algorithm, i.e., to judge whether it generates solutions identical to those of proven correct serial methods, such as the Hines method used in the NEURON simulation platform. Based on the theory of parallel computing , a parallel algorithm will yield a result identical to its corresponding serial algorithm if and only if the data processing order in the parallel algorithm is consistent with the data dependency in the serial method. The Hines method has two symmetrical phases: triangularization and back-substitution. By analyzing the serial computing Hines method , we find that its data dependency can be formulated as a tree structure, where the nodes of the tree represent the compartments of the detailed neuron model. In the triangularization process, the value of each node depends on its child nodes (Fig. 1D). Thus, we can compute nodes on different branches in parallel, as their values do not depend on each other. 34 55 Based on the data dependency of the serial computing Hines method, we propose three conditions to ensure that a parallel method yields solutions identical to the serial computing Hines method: (1) the tree morphology and initial values of all nodes are identical to those in the serial computing Hines method; (2) in the triangularization phase, a node can be processed if and only if all its child nodes have already been processed; (3) in the back-substitution phase, a node can be processed only if its parent node has already been processed. Once a parallel computing method satisfies these three conditions, it will produce solutions identical to those of the serial computing method.
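As a toy illustration of this data dependency (not the NEURON/CoreNEURON implementation), the two traversal orders and a checker for conditions (2)-(3) can be sketched in Python, assuming the tree is stored as a parent array with node 0 as the root (soma):

```python
# Toy sketch: a compartmental tree stored as a parent array (parent[0] = -1
# marks the root). The serial Hines method triangularizes from the leaves
# toward the root and back-substitutes from the root toward the leaves.
def hines_orders(parent):
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    # iterative post-order traversal: every child appears before its parent
    tri_order, stack = [], [(0, False)]
    while stack:
        v, expanded = stack.pop()
        if expanded:
            tri_order.append(v)
        else:
            stack.append((v, True))
            for c in children[v]:
                stack.append((c, False))
    back_order = tri_order[::-1]  # parent before children
    return tri_order, back_order

def schedule_is_valid(parent, subsets):
    """Check conditions (2) and (3): every child must sit in an earlier
    subset than its parent for triangularization; traversing the same
    subsets in reverse then automatically satisfies back-substitution."""
    step = {v: i for i, s in enumerate(subsets) for v in s}
    return all(step[parent[v]] > step[v] for v in range(1, len(parent)))
```

For a balanced 7-node tree, for example, `[[3, 4, 5, 6], [1, 2], [0]]` is a valid parallel schedule, while processing the root first is not.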
Computational cost of cellular-level parallel computing methods To theoretically evaluate the run time, i.e., efficiency, of the serial and parallel computing methods, we introduce and formulate the concept of computational cost as follows: given a tree T and k threads (basic computational units) to perform triangularization, parallel triangularization amounts to dividing the node set V of T into n subsets, i.e., V = {V1, V2, …, Vn}, where the size of each subset |Vi| ≤ k, i.e., at most k nodes can be processed in each step since there are only k threads. The triangularization phase follows the order V1 → V2 → … → Vn, and nodes in the same subset can be processed in parallel. So, we define |V| (the size of the partition V, i.e., n here) as the computational cost of the parallel computing method. In short, we define the computational cost of a parallel method as the number of steps it takes in the triangularization phase. Because back-substitution is symmetrical to triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase. Mathematical scheduling problem Based on the simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem: given a tree T = {V, E} and a positive integer k, where V is the node set and E is the edge set, define a partition P(V) = {V1, V2, …, Vn}, |Vi| ≤ k, 1 ≤ i ≤ n, where |Vi| indicates the cardinal number of subset Vi, i.e., the number of nodes in Vi, and for each node v ∈ Vi, all its children nodes {c | c ∈ children(v)} must be in a previous subset Vj, where 1 ≤ j < i. Our goal is to find an optimal partition P*(V) whose computational cost |P*(V)| is minimal. Here subset Vi consists of all nodes that will be computed at the i-th step (Fig. 2e), so |Vi| ≤ k indicates that we can compute at most k nodes in each step because the number of available threads is k.
The restriction “for each node v ∈ Vi, all its children nodes {c | c ∈ children(v)} must be in a previous subset Vj, where 1 ≤ j < i” indicates that node v can be processed only if all its child nodes have been processed. DHS implementation We aim to find an optimal way to parallelize the computation of solving the linear equations for each neuron model by solving the mathematical scheduling problem above. To get the optimal partition, DHS first analyzes the topology and calculates the depth d(v) for all nodes v ∈ V. Then the following two steps are executed iteratively until every node v ∈ V is assigned to a subset: (1) find all candidate nodes and put them into the candidate set Q; a node is a candidate only if all its child nodes have been processed or it has no child nodes. (2) If |Q| ≤ k, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from Q and put them into Vi; otherwise, remove the k deepest nodes from Q and add them to subset Vi. Label these nodes as processed (Fig. 2d). After filling subset Vi, go to step (1) to fill the next subset Vi+1. Correctness proof for DHS After applying DHS to a neural tree T = {V, E}, we get a partition P(V) = {V1, V2, …, Vn}, |Vi| ≤ k, 1 ≤ i ≤ n. Nodes in the same subset will be computed in parallel, taking n steps to perform triangularization and back-substitution, respectively. We then demonstrate that the reordering of the computation in DHS yields a result identical to the serial Hines method. The partition P(V) obtained from DHS determines the computation order of all nodes in a neural tree. Below we demonstrate that the computation order determined by P(V) satisfies the correctness conditions. P(V) is obtained from the given neural tree T.
Operations in DHS do not modify the tree topology or the values of the tree nodes (the corresponding values in the linear equations), so the tree morphology and initial values of all nodes are unchanged, which satisfies condition 1: the tree morphology and initial values of all nodes are identical to those in the serial Hines method. In triangularization, nodes are processed from subset V1 to Vn. As shown in the implementation of DHS, all nodes in subset Vi are selected from the candidate set Q, and a node can be put into Q only if all its child nodes have been processed. Hence the child nodes of all nodes in Vi are in {V1, V2, …, Vi-1}, meaning that a node is only computed after all its children have been processed, which satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes have already been processed. In back-substitution, the computation order is the opposite of that in triangularization, i.e., from Vn to V1. As shown before, the child nodes of all nodes in Vi are in {V1, V2, …, Vi-1}, so the parent nodes of nodes in Vi are in {Vi+1, Vi+2, …, Vn}, which satisfies condition 3: in back-substitution, a node can be processed only if its parent node has already been processed. Optimality proof for DHS The idea of the proof is that if there is another optimal solution, it can be transformed into the DHS solution without increasing the number of steps the algorithm requires, thus showing that the DHS solution is optimal. For each subset Vi in P(V), DHS moves the k (thread number) deepest nodes from the corresponding candidate set Qi to Vi; if the number of nodes in Qi is smaller than k, it moves all nodes from Qi to Vi. To simplify, we introduce a quantity denoting the depth sum of the k deepest nodes in Qi. All subsets in P(V) satisfy the max-depth criterion (Supplementary Fig. 6a).
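For concreteness, the greedy step just described, repeatedly moving the k deepest candidate nodes into the next subset and then promoting parents whose children are all scheduled, can be sketched as follows. This is an illustrative re-implementation (assuming a parent-array tree in which every parent index is smaller than its children's indices), not the CoreNEURON code:

```python
# Sketch of DHS scheduling on a parent-array tree (node 0 = soma/root,
# parent[0] = -1; parent indices are assumed smaller than child indices).
def dhs_partition(parent, k):
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    depth = [0] * n
    for v in range(1, n):                   # valid because parent[v] < v
        depth[v] = depth[parent[v]] + 1
    pending = [len(children[v]) for v in range(n)]   # unscheduled children
    candidates = [v for v in range(n) if pending[v] == 0]  # leaves first
    partition = []
    while candidates:
        candidates.sort(key=lambda v: depth[v])      # shallow ... deep
        subset, candidates = candidates[-k:], candidates[:-k]
        for v in subset:                    # parents with all children done
            p = parent[v]                   # become candidates
            if p >= 0:
                pending[p] -= 1
                if pending[p] == 0:
                    candidates.append(p)
        partition.append(subset)
    return partition
```

On a balanced 7-node tree with k = 2 threads, this yields 4 steps instead of the 7 steps of serial triangularization.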
We then prove that selecting the k deepest nodes in each iteration minimizes the number of subsets. If there is an optimal partition P*(V) = {V*1, V*2, …, V*s} containing subsets that do not satisfy the max-depth criterion, we can modify the subsets in P*(V) so that every subset consists of the deepest nodes from its candidate set, while the number of subsets (|P*(V)|) remains the same after modification. Without loss of generality, we start from the first subset not satisfying the criterion, i.e., V*i. There are two possible cases that make V*i violate the max-depth criterion: (1) |V*i| < k and there exist some valid nodes in Qi that are not put into V*i; (2) |V*i| = k but the nodes in V*i are not the k deepest nodes in Qi. For case (1), because some candidate nodes are not put into V*i, these nodes must be in subsequent subsets. As |V*i| < k, we can move the corresponding nodes from the subsequent subsets to V*i, which does not increase the number of subsets and makes V*i satisfy the criterion (Supplementary Fig. 6b, top). For case (2), |V*i| = k, and the deeper nodes that are not moved from the candidate set into V*i must be added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from subsequent subsets to V*i through the following method. Assume that after filling V*i, a node v is picked while one of the k deepest nodes, v′, is still in Qi; thus v′ will be put into a subsequent subset V*j (j > i). We first move v from V*i to V*i+1, then modify subset V*i+1 as follows: if |V*i+1| ≤ k and none of the nodes in V*i+1 is the parent of node v, stop modifying the latter subsets. Otherwise, modify V*i+1 as follows (Supplementary Fig. 6c): if the parent node of v is in V*i+1, move this parent node to V*i+2; otherwise move the node with minimum depth from V*i+1 to V*i+2. After adjusting V*i+1, modify the subsequent subsets V*i+2, V*i+3, …, V*j-1 with the same strategy. Finally, move v′ from V*j to V*i.
With the modification strategy described above, we can replace all shallower nodes in V*i with the k deepest nodes in Qi while keeping the number of subsets, i.e., |P*(V)|, the same after modification. We can modify the nodes with the same strategy for all subsets in P*(V) that do not contain the deepest nodes. Finally, all subsets V*i ∈ P*(V) satisfy the max-depth criterion, and |P*(V)| does not change after the modification. In conclusion, DHS generates a partition P(V) whose subsets Vi ∈ P(V) all satisfy the max-depth condition. Any other optimal partition P*(V) can have its subsets modified so that its structure is the same as that of P(V), i.e., each subset consists of the deepest nodes in the candidate set, with |P*(V)| unchanged after modification. So the partition P(V) obtained from DHS is one of the optimal partitions. GPU implementation and memory boosting To achieve high memory throughput, the GPU utilizes a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have low capacity but high throughput. We aim to boost memory throughput by leveraging this hierarchy. The GPU uses the SIMT (Single-Instruction, Multiple-Thread) architecture. Warps are the basic scheduling units on the GPU (a warp is a group of 32 parallel threads); a warp executes the same instruction with different data on different threads . Correctly ordering the nodes is essential for this batching of computation in warps, to ensure that DHS obtains results identical to the serial Hines method. When implementing DHS on the GPU, we first group all cells into multiple warps based on their morphologies; cells with similar morphologies are grouped in the same warp.
We then apply DHS to all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads for the same neuron are in the same warp. Therefore, the intrinsic synchronization within warps keeps the computation order consistent with the data dependency of the serial Hines method. Finally, threads in each warp are aligned and rearranged according to the number of compartments. 46 When a warp loads aligned, successively stored data from global memory, it can make full use of the cache, leading to high memory throughput, whereas accessing scattered data reduces memory throughput. After compartment assignment and thread rearrangement, we therefore permute the data in global memory to make it consistent with the computation order, so that warps load successively stored data when the program runs. Full-spine and few-spine biophysical models We used the published human pyramidal neuron model . The membrane capacitance cm = 0.44 μF cm-2, membrane resistance rm = 48,300 Ω cm2, and axial resistivity ra = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables, while somas were active. The leak reversal potential El = -83.1 mV. Ion channels such as Na+ and K+ channels were inserted at the soma and initial axon, with reversal potentials ENa = 67.6 mV and EK = -102 mV, respectively. All these specific parameters were set as in the model of Eyal et al. ; for more details please refer to the published model (ModelDB, accession No. 238347). 51 51 In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables more than 60 μm away from the soma were multiplied by a spine factor F to approximate dendritic spines. In this model, F was set to 1.9. Only the spines that receive synaptic inputs were explicitly attached to dendrites. In the full-spine model, all spines were explicitly attached to the dendrites.
We calculated the spine density from the reconstructed neuron in Eyal et al. . The spine density was set to 1.3 μm-1, and each cell contained 24,994 spines on dendrites more than 60 μm away from the soma. 51 The morphologies and biophysical mechanisms of the spines were the same in the few-spine and full-spine models. The length of the spine neck was Lneck = 1.35 μm and its diameter Dneck = 0.25 μm, whereas the length and diameter of the spine head were both 0.944 μm, i.e., the spine head area was 2.8 μm2. Both spine neck and spine head were modeled as passive cables, with reversal potential El = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those of the dendrites. Synaptic inputs We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the tip of the spine head. For distributed inputs, all activated synapses were randomly distributed over all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses uniformly distributed on a single randomly selected compartment. All synapses were activated simultaneously during the simulation. AMPA-based and NMDA-based synaptic currents were simulated as in Eyal et al.’s work. The AMPA conductance was modeled as a double-exponential function and the NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, the rise and decay time constants τrise and τdecay were set to 0.3 and 1.8 ms. For the NMDA model, τrise and τdecay were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS. Background noise We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at tstart = 10 ms and lasted until the end of the simulation.
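As a reference point, a double-exponential synaptic conductance of the kind described above can be written as a small sketch (pure Python, peak-normalized in the style of NEURON's Exp2Syn; the voltage-dependent Mg2+ block of the NMDA component is omitted here for brevity):

```python
import math

# Double-exponential conductance g(t) after a spike at t = 0, normalized so
# that the peak equals g_max. Time constants from the text: AMPA 0.3/1.8 ms,
# NMDA 8.019/34.9884 ms (requires tau_rise < tau_decay).
def g_syn(t, g_max, tau_rise, tau_decay):
    if t < 0:
        return 0.0
    # time of the conductance peak
    t_peak = (tau_rise * tau_decay / (tau_decay - tau_rise)
              * math.log(tau_decay / tau_rise))
    # normalization factor so that g(t_peak) == g_max
    norm = 1.0 / (math.exp(-t_peak / tau_decay) - math.exp(-t_peak / tau_rise))
    return g_max * norm * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))
```

For example, with the AMPA parameters (g_max = 0.73 nS, τrise = 0.3 ms, τdecay = 1.8 ms), the conductance is zero at spike onset, peaks at g_max shortly afterwards, and decays back toward zero.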
We generated 400 noise spike trains for each cell and attached them to randomly selected synapses. The model and specific parameters of the synaptic currents were the same as described in “Synaptic inputs”, except that the maximum conductance of NMDA was uniformly distributed from 1.57 to 3.275 nS, resulting in a higher NMDA-to-AMPA ratio. Exploring neuronal excitability We investigated the spike probability when multiple synapses were activated simultaneously. For distributed inputs, we tested 14 cases, from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters, respectively. Each cluster consisted of 20 synapses. For each case of both distributed and clustered inputs, we calculated the spike probability from 50 random samples. Spike probability was defined as the ratio of the number of neurons that fired to the total number of samples. All 1150 samples were simulated simultaneously on our DeepDendrite platform, reducing the simulation time from days to minutes. Performing AI tasks with the DeepDendrite platform Conventional detailed neuron simulators lack two functionalities important to modern AI tasks: (1) alternately performing simulations and weight updates without heavy reinitialization, and (2) simultaneously processing multiple stimulus samples in a batch-like manner. Here we present the DeepDendrite platform, which supports both biophysical simulation and performing deep learning tasks with detailed dendritic models. DeepDendrite consists of three modules (Supplementary Fig. 5): (1) an I/O module; (2) a DHS-based simulation module; (3) a learning module. When training a biophysically detailed model to perform learning tasks, users first define the learning rule and then feed all training samples to the detailed model for learning.
In each step during training, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from all training samples and attaches the stimulus to the network model. Then the DHS-based simulation module initializes the model and starts the simulation. After the simulation, the learning module updates all synaptic weights according to the difference between the model responses and the teacher signals. After training, the learned model can achieve performance comparable to an ANN. The testing phase is similar to training, except that all synaptic weights are fixed. HPC-Net model Image classification is a typical task in the field of AI: a model should learn to recognize the content of a given image and output the corresponding label. Here we present the HPC-Net, a network consisting of detailed human pyramidal neuron models that can learn to perform image classification by utilizing the DeepDendrite platform. The HPC-Net has three layers, i.e., an input layer, a hidden layer, and an output layer. The neurons in the input layer receive spike trains converted from images as their input. Hidden layer neurons receive the output of the input layer neurons and deliver their responses to the neurons in the output layer. The responses of the output layer neurons are taken as the final output of the HPC-Net. Neurons in adjacent layers are fully connected. For each image stimulus, we first convert each normalized pixel to a homogeneous spike train. For the pixel with coordinates (x, y) in the image, the corresponding spike train has a constant interspike interval τISI(x, y) (in ms), which is determined by the pixel value p(x, y) as shown in Eq. (1). In our experiment, the simulation of each stimulus lasted 50 ms. All spike trains started at 9 + τISI ms and lasted until the end of the simulation. Then we attached all spike trains to the input layer neurons in a one-to-one manner.
The synaptic current triggered by a spike arriving at time t0 is given by Eq. ( ), where v is the post-synaptic voltage, the reversal potential Esyn = 1 mV, the maximum synaptic conductance gmax = 0.05 μS, and the time constant τ = 0.5 ms. Neurons in the input layer were modeled with a passive single-compartment model. The specific parameters were set as follows: membrane capacitance cm = 1.0 μF cm-2, membrane resistance rm = 10,000 Ω cm2, axial resistivity ra = 100 Ω cm, and reversal potential of the passive compartment El = 0 mV. The hidden layer contains a group of human pyramidal neuron models, receiving the somatic voltages of the input layer neurons. The morphology was taken from Eyal et al. , and all neurons were modeled with passive cables. The specific membrane capacitance cm = 1.5 μF cm-2, membrane resistance rm = 48,300 Ω cm2, axial resistivity ra = 261.97 Ω cm, and the reversal potential of all passive cables El = 0 mV. Input neurons could make multiple connections to randomly selected locations on the dendrites of hidden neurons. The synaptic current activated by the k-th synapse of the i-th input neuron on neuron j’s dendrite is defined as in Eq. (4), where gijk is the synaptic conductance, Wijk is the synaptic weight, the somatic activation function is ReLU-like, and vi(t) is the somatic voltage of the i-th input neuron at time t. 51 Neurons in the output layer were also modeled with a passive single-compartment model, and each hidden neuron made only one synaptic connection to each output neuron. All specific parameters were set the same as those of the input neurons. Synaptic currents activated by hidden neurons also take the form of Eq. (4). Image classification with the HPC-Net For each input image stimulus, we first normalized all pixel values to 0.0-1.0. Then we converted the normalized pixels to spike trains and attached them to the input neurons.
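The pixel-to-spike-train conversion can be illustrated with a small sketch. Eq. (1) itself is not reproduced in this excerpt, so the mapping below (an interspike interval inversely proportional to the normalized pixel value, with a hypothetical `isi_min` parameter) is an assumption for illustration only; the timing conventions (trains start at 9 + τISI ms and run to the end of the 50 ms simulation) follow the text:

```python
# Illustration only: the exact mapping of Eq. (1) is not reproduced here.
# We assume brighter pixels yield shorter interspike intervals,
# tau_isi = isi_min / p for normalized p in (0, 1]; a pixel with p == 0
# emits no spikes. isi_min is a hypothetical parameter, not from the paper.
def pixel_to_spike_train(p, isi_min=2.0, t_stop=50.0):
    if p <= 0.0:
        return []
    tau_isi = isi_min / p
    t = 9.0 + tau_isi          # first spike at 9 + tau_isi ms, per the text
    spikes = []
    while t <= t_stop:
        spikes.append(t)
        t += tau_isi           # constant (homogeneous) interspike interval
    return spikes
```

Under this assumed mapping, a saturated pixel (p = 1.0) produces a dense regular train while a dim pixel produces only a few late spikes, which matches the qualitative description of the rate-style encoding.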
Somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where pi is the probability of the i-th class predicted by the HPC-Net, computed from the average somatic voltage from 20 ms to 50 ms of the i-th output neuron, and C indicates the number of classes, which equals the number of output neurons. The class with the maximum predicted probability is the final classification result. In this paper, we built the HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons. Synaptic plasticity rules for the HPC-Net Inspired by previous work , we use a gradient-based learning rule to train the HPC-Net to perform the image classification task. The loss function we use here is the cross-entropy, given in Eq. (7), where pi is the predicted probability of class i and yi indicates the actual class the stimulus image belongs to: yi = 1 if the input image belongs to class i, and yi = 0 otherwise. 36 When training the HPC-Net, we compute the update for weight Wijk (the synaptic weight of the k-th synapse connecting neuron i to neuron j) at each time step. After the simulation of each image stimulus, Wijk is updated as shown in Eq. (8). Here η is the learning rate, ΔWijk(t) is the update value at time t, vi and vj are the somatic voltages of neurons i and j, respectively, Iijk is the k-th synaptic current activated by neuron i on neuron j and gijk its synaptic conductance, rijk is the transfer resistance from the k-th connected compartment of neuron i on neuron j’s dendrite to neuron j’s soma, and ts = 30 ms and te = 50 ms are the start and end times for learning, respectively. For output neurons, the error term can be computed as shown in Eq. (10). For hidden neurons, the error term is calculated from the error terms of the output layer, as given in Eq. (11). Since all output neurons are single-compartment, rijk equals the input resistance of the corresponding compartment. Transfer and input resistances are computed by NEURON.
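Since Eqs. (6) and (7) are not reproduced in this excerpt, the following is a hedged sketch of how the described quantities fit together, assuming a standard softmax over the average somatic voltages and a one-hot cross-entropy loss (the exact functional form in the paper may differ):

```python
import math

# Assumed reading of Eqs. (6)-(7): class probabilities as a softmax over
# the average somatic voltages of the C output neurons, and a one-hot
# cross-entropy loss. v_mean[i] is the 20-50 ms average voltage of output
# neuron i; y is the one-hot label vector.
def predict_proba(v_mean):
    exps = [math.exp(v) for v in v_mean]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(p, y):
    # y[i] = 1 for the true class, 0 otherwise; only the true-class term
    # contributes to the sum
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi)
```

With this reading, the predicted class is simply the output neuron with the largest average somatic voltage, and the loss shrinks toward zero as that neuron's probability approaches one.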
Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and accelerating convergence, and DeepDendrite supports it as well. When training the HPC-Net with mini-batch size Nbatch, we make Nbatch copies of the HPC-Net. During training, each copy is fed a different training sample from the batch. DeepDendrite first computes the weight update for each copy separately. After all copies in the current training batch are done, the average weight update is calculated, and the weights in all copies are updated by this same amount. Robustness against adversarial attacks with the HPC-Net To demonstrate the robustness of the HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation; for a fair comparison, in our HPC-Net each input neuron made only one synaptic connection to each hidden neuron). We first trained the HPC-Net and the ANN on the original training set (clean images). Then we added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. We used Foolbox to generate adversarial noise with the FGSM method . The ANN was trained with PyTorch , and the HPC-Net was trained with our DeepDendrite. For fairness, we generated the adversarial noise on a significantly different network model, a 20-layer ResNet . The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST and Fashion-MNIST . The results show that the prediction accuracy of the HPC-Net is 19% and 16.72% higher than that of the analogous ANN, respectively. 98 99 93 100 101 95 96 Reporting summary Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Data availability The data that support the findings of this study are available within the paper, the Supplementary Information, and the Source Data files provided with this paper.
The source code and data used to reproduce the results in Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist. The Fashion-MNIST dataset is publicly available at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite, as well as the models and code used to reproduce Figs. 3–6 in this study, are available at https://github.com/pkuzyc/DeepDendrite.

References

McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Poirazi, P., Brannon, T. & Mel, B. W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron 37, 977–987 (2003).
London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
Branco, T. & Häusser, M. The single dendritic branch as a fundamental functional unit in the nervous system. Curr. Opin. Neurobiol. 20, 494–502 (2010).
Stuart, G. J. & Spruston, N. Dendritic integration: 60 years of progress. Nat. Neurosci. 18, 1713–1721 (2015).
Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
Yuste, R. & Denk, W. Dendritic spines as basic functional units of neuronal integration. Nature 375, 682–684 (1995).
Engert, F. & Bonhoeffer, T. Dendritic spine changes associated with hippocampal long-term synaptic plasticity. Nature 399, 66–70 (1999).
Yuste, R. Dendritic spines and distributed circuits. Neuron 71, 772–781 (2011).
Yuste, R. Electrical compartmentalization in dendritic spines. Annu. Rev. Neurosci. 36, 429–449 (2013).
Rall, W. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491–527 (1959).
Segev, I. & Rall, W. Computational study of an excitable dendritic spine. J. Neurophysiol.
60, 499–523 (1988).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989).
French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
Naud, R. & Sprekeler, H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA 115, E6329–E6338 (2018).
Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS, 2018).
Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nat. Neurosci. 24, 1010–1019 (2021).
Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent.
Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
Bower, J. M. & Beeman, D. in The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
Hines, M. L., Eichner, H.
& Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.
Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphical processing unit.
Tsuyuki, T., Yamamoto, Y. & Yamazaki, T. Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, A. et al.) 279–285 (Springer International Publishing, 2016).
Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines matrix solver in neuron simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
Huber, F. Efficient tree solver for Hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
Korte, B. & Vygen, J. Combinatorial Optimization: Theory and Algorithms 6th edn (Springer, 2018).
Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
Kumbhar, P. et al. CoreNEURON: an optimized compute engine for the NEURON simulator. Front. Neuroinform. 13, 63 (2019).
Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Mascagni, M. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models.
McDougal, R. A. et al. Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience.
Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons. J. Comput. Neurosci. 16, 5–13 (2004).
Hemond, P. et al.
Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b.
Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput. Biol. 7, e1002107 (2011).
Masoli, S., Solinas, S. & D'Angelo, E. Action potential processing in a detailed Purkinje cell model reveals a critical role for axonal compartmentalization.
Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales—simulations of direct pathway MSNs investigate the fast onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
Migliore, M. et al. Synaptic clusters function as odor operators in the olfactory bulb. Proc. Natl Acad. Sci. USA 112, 8499–8504 (2015).
NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models. Front. Cell. Neurosci. 12, 181 (2018).
Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization.
Koch, C. Dendritic spines. in Biophysics of Computation (Oxford University Press, 1999).
Rapp, M., Yarom, Y. & Segev, I.
The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells. Neural Comput. 4, 518–533 (1992).
Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint at https://arxiv.org/abs/1703.09202 (2017).
Goddard, N. H. & Hood, G. Large-scale simulation using parallel GENESIS. in The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 349–379 (Springer New York, 1998).
Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON.
Lytton, W. W. et al. Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON. Neural Comput. 28, 2063–2090 (2016).
Valero-Lara, P. et al. cuHinesBatch: solving multiple Hines systems on GPUs Human Brain Project. in Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
Akar, N. A. et al. Arbor—a morphologically-detailed neural network simulation library for contemporary high-performance computing architectures. In Proc. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 274–282 (IEEE, 2019).
Ben-Shalom, R. et al. NeuroGPU: accelerating multi-compartment, biophysically detailed neuron simulations on GPUs.
Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform. 5, 15 (2011).
Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
Palmer, L. M.
& Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input. J. Neurosci. 29, 6897–6903 (2009).
Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity.
Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).
Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
Keck, T. et al. Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits.
Trachtenberg, J. T. et al. Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex.
Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex. Neuron 109, 3298–3311 (2021).
Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons. Cell Rep. 21, 1550–1561 (2017).
Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci. USA 114, E7612–E7621 (2017).
Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
Xu, N.-l. et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task. Nature 492, 247–251 (2012).
Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature 517, 200–204 (2015).
Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
Hjorth, J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments. Front. Neurorobot. 16, 846219 (2022).
Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree? Neural Comput. 33, 1554–1571 (2021).
Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks. PLoS Comput. Biol. 17, e1009202 (2021).
Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR, 2015).
Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS, 2018).
Rauber, J., Brendel, W. & Bethge, M. Foolbox: a Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017).
Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox Native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5, 2607 (2020).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS, 2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

The authors sincerely thank Dr. Rita Zhang, Daochen Shi, and members of NVIDIA for valuable technical support with GPU computing. This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., and the National Natural Science Foundation of China (No.
61825102) to Y.T., as well as the Swedish Research Council (VR-M-2020-01652), the Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 No. 945539 (HBP SGA3), and KTH Digital Futures to J.H.K.

This article is available under the CC BY 4.0 Deed (Attribution 4.0 International) license.