Scientists Build a GPU Engine That Simulates Brain Cells 1,500 Times Faster

Authors: Yichen Zhang, Gan He, Lei Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjian Zhang, Jeanette Hellgren Kotaleski, Yonghong Tian, Sten Grillner, Kai Du, Tiejun Huang

Abstract
Biophysically detailed multi-compartment models are powerful tools for exploring the computational principles of the brain, and they also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their high computational cost severely limits applications in both neuroscience and AI. The major bottleneck when simulating detailed compartment models is the simulator's ability to solve large systems of linear equations. Here, we present a novel Dendritic Hierarchical Scheduling (DHS) method that markedly accelerates this process. We prove theoretically that the DHS implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method on a conventional CPU platform. We build a DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines. Furthermore, we provide a brief discussion of the potential of DeepDendrite for AI.

Introduction
Deciphering the coding and computational principles of neurons is essential to neuroscience. Mammalian brains are composed of thousands of different types of neurons with unique morphological and biophysical properties.
In recent years, modern artificial intelligence (AI) has used this principle to develop powerful tools such as artificial neural networks (ANNs). However, in addition to comprehensive computations at the single-neuron level, subcellular compartments such as neuronal dendrites can also carry out nonlinear operations as independent computational units. Furthermore, dendritic spines, small protrusions that densely cover the dendrites of spiny neurons, can compartmentalize synaptic signals, allowing them to be separated from their parent dendrites ex vivo and in vivo.

Simulations using biologically detailed neurons provide a theoretical framework for linking biological details to computational principles. This framework allows us to model neurons with realistic dendritic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. The backbone of the detailed multi-compartment model, i.e., the dendrites, is built upon classical cable theory, which models the biophysical membrane properties of dendrites as passive cables and provides a mathematical description of how electrical signals invade and propagate through complex neuronal processes. By combining cable theory with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can capture cellular and subcellular neuronal computations beyond experimental limitations.

In addition to their profound impact on neuroscience, biologically detailed neuron models have recently been used to bridge the gap between neuronal structural and biophysical details and AI. The prevailing technique in modern AI is the ANN, which consists of point neurons and is an analog of biological neural networks.
The human brain still outperforms ANNs in domains that involve more dynamic and noisy environments. Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that potentially exceed backprop in parallel information processing. Furthermore, a single detailed multi-compartment model can learn network-level nonlinear computations for point neurons by adjusting only the synaptic strength. It is therefore a high priority to expand paradigms in brain-like AI from single detailed neuron models to large-scale biologically detailed networks.

One long-standing challenge of the detailed simulation approach is its exceedingly high computational cost, which has severely limited its application in neuroscience and AI. The major bottleneck of the simulation is solving the linear equations that follow from the foundational theories of detailed modeling. To improve efficiency, the classic Hines method reduces the time complexity of solving the equations from O(n³) to O(n), and it has been widely adopted as the core algorithm in popular simulators such as NEURON and GENESIS. However, this method is serial: it processes each compartment sequentially. When a simulation involves multiple biophysically detailed dendrites with dendritic spines, the linear equation matrix (the "Hines matrix") scales with the growing number of dendrites or spines (Fig. 1e), and the Hines method is no longer practical because it places a very heavy burden on the entire simulation.

Figure 1. a A reconstructed layer-5 pyramidal neuron model and the mathematical formula used with detailed neuron models. b Workflow of the numerical simulation of detailed neuron models. The equation-solving phase is the bottleneck of the simulation.
c An example of linear equations in the simulation. d Data dependency of the Hines method when solving the linear equations in c. e The size of the Hines matrix scales with model complexity. The number of linear equation systems to be solved increases significantly as models become more detailed. f Computational cost (steps taken in the equation-solving phase) of the Hines method on different types of neuron models. g Illustration of different solving methods. In the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors. In the serial method (left), all compartments are computed by one unit. h Computational cost of the three methods in g when solving the equations of a pyramidal model with spines. i The run time indicates the time consumed by a 1 s simulation (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: parallel method in CoreNEURON (on GPU); Branch-based: branch-based parallel method (on GPU); DHS: Dendritic Hierarchical Scheduling method (on GPU).

Over the past decades, tremendous progress has been made in speeding up the Hines method with cellular-level parallel methods, which parallelize the computation of different parts of each cell. However, current cellular-level parallel methods often lack an efficient parallelization strategy, or lack sufficient numerical accuracy compared with the original Hines method.

Here, we develop a fully automatic, numerically accurate, and optimized simulation tool that significantly improves computational efficiency and reduces computational cost. In addition, this simulation tool can be seamlessly adopted for building and testing neural networks with biological details for machine learning and AI applications.
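To make the serial bottleneck concrete, the following is a minimal Python sketch of a Hines-style O(n) solve on a tree-structured system: one leaf-to-root triangularization sweep followed by one root-to-leaf back-substitution sweep. It is an illustration under stated assumptions, not NEURON's implementation: the array names (`parent`, `d`, `a`, `b`, `rhs`), the requirement that every parent index is smaller than its children's indices, and the 5-compartment example values are all hypothetical.

```python
def hines_solve(parent, d, a, b, rhs):
    """Solve a tree-structured linear system in O(n) steps.

    Assumed storage (an illustrative convention, not NEURON's API):
      parent[i] : index of the parent compartment; parent[0] == -1 for the
                  root, and parent[i] < i for every other node
      d[i]      : diagonal entry of row i
      a[i]      : entry in row parent[i], column i (upper triangle)
      b[i]      : entry in row i, column parent[i] (lower triangle)
      rhs[i]    : right-hand side of row i
    """
    n = len(parent)
    d, rhs = list(d), list(rhs)  # work on copies, leave the inputs intact

    # Triangularization: sweep from the leaves towards the root and
    # eliminate each node from its parent's row. Node i can only be
    # handled after all of its children, which is the data dependency
    # that makes the classic method serial.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = a[i] / d[i]
        d[p] -= factor * b[i]
        rhs[p] -= factor * rhs[i]

    # Back-substitution: sweep from the root towards the leaves.
    x = [0.0] * n
    x[0] = rhs[0] / d[0]
    for i in range(1, n):
        x[i] = (rhs[i] - b[i] * x[parent[i]]) / d[i]
    return x


def apply_matrix(parent, d, a, b, x):
    """Multiply the tree-structured matrix by x (only used to check the solve)."""
    y = [d[i] * x[i] for i in range(len(parent))]
    for i in range(1, len(parent)):
        p = parent[i]
        y[i] += b[i] * x[p]
        y[p] += a[i] * x[i]
    return y


# Hypothetical 5-compartment tree; node 0 plays the role of the soma.
PARENT = [-1, 0, 1, 1, 0]
D = [4.0, 4.0, 3.0, 3.0, 3.0]
A = [0.0, -1.0, -1.0, -1.0, -1.0]
B = [0.0, -1.0, -1.0, -1.0, -1.0]
RHS = [1.0, 2.0, 3.0, 4.0, 5.0]

voltages = hines_solve(PARENT, D, A, B, RHS)
```

With n compartments the triangularization loop takes n - 1 sequential steps, which is consistent with the 14 steps quoted later for a 15-compartment model. In a 1 s simulation at a 0.025 ms time step this solve is repeated 40,000 times per cell.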
Critically, we formulate the parallel computation of the Hines method as a mathematical scheduling problem and derive a Dendritic Hierarchical Scheduling (DHS) method based on combinatorial optimization and parallel computing theory. We demonstrate that our algorithm provides optimal scheduling without any loss of precision. Furthermore, we have optimized DHS for the currently most advanced GPU chip by leveraging the GPU memory hierarchy and memory access mechanisms. Together, DHS speeds up computation by up to 1,500 times (Supplementary Table 1) compared with the classic simulator NEURON while maintaining identical accuracy.

To enable detailed dendritic simulations for use in AI, we next establish the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized compute engine for NEURON) as the simulation engine with two auxiliary modules (an I/O module and a learning module) that support dendritic learning algorithms during simulations. DeepDendrite runs on the GPU hardware platform and supports both regular simulation tasks in neuroscience and learning tasks in AI.

Finally, we also present several applications of DeepDendrite that target critical challenges in neuroscience and AI: (1) We demonstrate how spatial patterns of dendritic spine inputs affect neuronal activity in neurons that contain spines throughout their dendritic trees (full-spine models). DeepDendrite enables us to explore neuronal computation in a simulated human pyramidal neuron model with ~25,000 dendritic spines. (2) In the discussion we also consider the potential of DeepDendrite in the context of AI, specifically in creating ANNs with morphologically detailed human pyramidal neurons.
All source code for DeepDendrite, the full dendritic learning models, and the detailed dendritic network model are publicly available online (see Code Availability). Our open-source learning framework can be readily integrated with other dendritic learning rules, such as learning rules for nonlinear (full-active) dendrites, burst-dependent synaptic plasticity, and learning with spike prediction. Overall, our study provides a complete set of tools that have the potential to change the current ecosystem of the computational neuroscience community. By leveraging the power of GPU computing, we expect these tools to facilitate system-level explorations of the computational principles of the brain's fine structures, and to promote the interaction between neuroscience and modern AI.

Results

Dendritic Hierarchical Scheduling (DHS)
Computing ionic currents and solving linear equations are two critical phases in the simulation of biophysically detailed neurons; both are time-consuming and impose severe computational burdens. Fortunately, the ionic currents of each compartment are computed fully independently, so this phase parallelizes naturally on devices with massive parallel computing units such as GPUs. As a consequence, solving the linear equations becomes the remaining bottleneck for parallelization (Fig. 1a-f).

To tackle this bottleneck, cellular-level parallel methods have been developed. They accelerate single-cell computation by "splitting" a single cell into several compartments that can be computed in parallel. However, such methods rely heavily on prior knowledge to generate practical strategies for splitting a single neuron into compartments (Fig. 1g-i; Supplementary Fig. 1).
Hence, they become less efficient for neurons with asymmetric morphologies, e.g., pyramidal neurons and Purkinje neurons.

We aim to develop a more efficient and precise parallel method for simulating biologically detailed neural networks. First, we establish the criteria for the accuracy of a cellular-level parallel method. Based on the data dependency in the Hines method, we propose three conditions that ensure a parallel method yields solutions identical to those of the serial Hines method (see Methods).

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem (see Methods). With k parallel threads, at most k nodes can be computed at each step. However, we need to ensure that a node is computed only after all of its child nodes have been processed; our goal is to find a strategy with the minimum number of steps for the entire procedure.

To generate an optimal partition, we propose a method called Dendritic Hierarchical Scheduling (DHS) (the theoretical proof is presented in the Methods). DHS prioritizes the deepest nodes (Fig. 2a), which results in a hierarchical schedule order. The DHS method includes two steps: analyzing the dendritic topology and finding the best partition. (1) Given a detailed model, we first obtain its corresponding dependency tree and calculate the depth of each node on the tree (the depth of a node is the number of its ancestor nodes) (Fig. 2b, c). (2) After the topology analysis, we search the candidates and select at most k of the deepest candidate nodes (a node is a candidate only if all of its child nodes have been processed), repeating until all nodes are processed (Fig. 2d).

Figure 2. a DHS workflow. DHS processes the k deepest candidate nodes in each iteration. b Illustration of calculating the node depth of a compartmental model. The model is first converted to a tree structure, and then the depth of each node is computed.
c Topology analysis on different neuron models. Six neurons with distinct morphologies are shown here. For each model, the soma is selected as the root of the tree, so the node depth increases from the soma (0) to the distal dendrites. d Illustration of running DHS on the model in b with four threads. Candidates: nodes that can be processed. Selected candidates: nodes that are picked by DHS, i.e., the k deepest candidates. Processed nodes: nodes that have been processed before. e Parallelization strategy obtained by DHS after the process in d. Each node is assigned to one of the four parallel threads. DHS reduces the number of serial node-processing steps from 14 to 5 by distributing nodes to multiple threads. f Relative cost, i.e., the ratio of the computational cost of DHS to that of the serial Hines method, when DHS is applied with different numbers of threads to different types of models.

Take a simplified model with 15 compartments as an example. With the serial Hines method, it takes 14 steps to process all nodes, whereas DHS with four parallel units can partition the nodes into five subsets (Fig. 2d): {{9,10,12,14}, {1,7,11,13}, {2,3,4,8}, {6}, {5}}. Because nodes in the same subset can be processed in parallel, it takes only five steps to process all nodes using DHS (Fig. 2e).

Next, we apply the DHS method to six representative detailed neuron models (selected from ModelDB) with different numbers of threads (Fig. 2f): cortical and hippocampal pyramidal neurons, cerebellar Purkinje neurons, striatal projection neurons (SPNs), and olfactory bulb mitral cells, covering the major principal neurons in sensory, cortical and subcortical areas. We then measure the computational cost. The relative computational cost is defined as the ratio of the computational cost of DHS to that of the serial Hines method.
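The two DHS steps described above (depth analysis, then repeatedly picking at most k of the deepest candidates) can be sketched in a few lines of Python. This is a simplified illustration of the scheduling idea only, not the DeepDendrite code: the function names, the tie-breaking rule by node index, and the 15-node example tree are assumptions. The tree is hypothetical and is not the one shown in Fig. 2, so its step count differs from the 14-to-5 example in the text.

```python
def dhs_schedule(parent, k):
    """Return a list of steps; each step lists at most k nodes to process.

    parent[i] is the parent of node i, with parent[0] == -1 for the soma
    (root) and parent[i] < i otherwise. The root is not scheduled because
    triangularization only eliminates non-root nodes.
    """
    n = len(parent)

    # Step 1: topology analysis. Depth = number of ancestor nodes.
    depth = [0] * n
    for i in range(1, n):
        depth[i] = depth[parent[i]] + 1

    # Number of still-unprocessed children per node.
    pending = [0] * n
    for i in range(1, n):
        pending[parent[i]] += 1

    # Step 2: a node is a candidate once all of its children are processed.
    candidates = [i for i in range(1, n) if pending[i] == 0]
    schedule = []
    while candidates:
        # Deepest candidates first; ties broken by index (an assumption).
        candidates.sort(key=lambda i: (-depth[i], i))
        chosen, candidates = candidates[:k], candidates[k:]
        schedule.append(chosen)
        for i in chosen:
            p = parent[i]
            if p > 0:  # the root itself is never scheduled
                pending[p] -= 1
                if pending[p] == 0:
                    candidates.append(p)
    return schedule


def relative_cost(parent, k):
    """Steps taken with k threads divided by the serial step count (n - 1)."""
    return len(dhs_schedule(parent, k)) / (len(parent) - 1)


# Hypothetical 15-compartment tree (node 0 is the soma).
PARENT = [-1, 0, 1, 1, 0, 4, 5, 5, 4, 8, 8, 0, 11, 12, 12]

schedule_4 = dhs_schedule(PARENT, 4)
for step, nodes in enumerate(schedule_4, start=1):
    print(f"step {step}: {nodes}")
print("relative cost with 4 threads:", relative_cost(PARENT, 4))
```

Because the schedule depends only on the tree topology, it can be computed once before the simulation starts and reused at every time step.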
The computational cost, i.e., the number of steps taken in solving the equations, drops dramatically as the number of threads increases. For example, with 16 threads, the computational cost of DHS is 7%-10% of that of the serial Hines method. Intriguingly, DHS reaches the lower bound of its computational cost for the presented neurons with 16 or even 8 parallel threads (Fig. 2f), suggesting that adding more threads does not improve performance further because of the dependencies between compartments.

Together, we obtain a DHS method that enables automated analysis of the dendritic topology and optimal partitioning for parallel computing. It is worth noting that DHS finds the optimal partition before the simulation starts, and no extra computation is needed to solve the equations.

Speeding up DHS by GPU memory boosting
DHS computes each neuron with multiple threads, which consumes a vast number of threads when running neural network simulations. Graphics Processing Units (GPUs) consist of massive numbers of processing units (i.e., streaming processors, SPs; Fig. 3a, b) for parallel computing. In theory, the many SPs on a GPU should support efficient simulation of large-scale neural networks (Fig. 3c). However, we consistently observed that the efficiency of DHS decreased significantly as the network size grew, which might result from scattered data storage or from extra memory accesses caused by loading and writing intermediate results (Fig. 3d, left).

Figure 3. a GPU architecture and its memory hierarchy. Each GPU contains massive numbers of processing units (stream processors). Different types of memory have different throughput. b Architecture of streaming multiprocessors (SMs). Each SM contains multiple streaming processors, registers, and an L1 cache. c Applying DHS to two neurons, each with four threads.
During simulation, each thread executes on one stream processor. d Memory optimization strategy on GPU. Top panel: thread assignment and data storage of DHS before (left) and after (right) memory boosting. Bottom: an example of a single step in triangularization when simulating the two neurons in d. Processors send a data request to load the data for each thread from global memory. Without memory boosting (left), it takes seven transactions to load all requested data, plus some extra transactions for intermediate results. With memory boosting (right), it takes only two transactions to load all requested data, and registers are used for intermediate results, which further improves memory throughput. e Run time of DHS (32 threads per cell) with and without memory boosting on multiple layer-5 pyramidal models with spines. f Speedup from memory boosting on multiple layer-5 pyramidal models with spines.

We solve this problem with GPU memory boosting, a method that increases memory throughput by leveraging the GPU's memory hierarchy and access mechanism. Given the memory loading mechanism of the GPU, successive threads that load aligned, successively stored data achieve high memory throughput, whereas accessing scattered data reduces memory throughput. To achieve high throughput, we first align the computing orders of the nodes and rearrange the threads according to the number of nodes assigned to them. We then permute the data storage in global memory so that it is consistent with the computing order, i.e., nodes that are processed in the same step are stored successively in global memory.
Moreover, we use GPU registers to store intermediate results, which further strengthens memory throughput. The example shows that memory boosting takes only two memory transactions to load eight requested data items (Fig. 3d). Furthermore, experiments on multiple numbers of pyramidal neurons with spines and on the typical neuron models (Fig. 3e, f; Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8 times speedup compared with the naïve DHS.

To comprehensively test the performance of DHS with GPU memory boosting, we select six typical neuron models and evaluate the run time of solving the cable equations on massive numbers of each model (Fig. 4). We examine DHS with four threads (DHS-4) and sixteen threads (DHS-16) per neuron. Compared with the GPU method in CoreNEURON, DHS-4 and DHS-16 are about 5 and 15 times faster, respectively (Fig. 4a). Moreover, compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS speeds up the simulation by 2-3 orders of magnitude (Fig. 3), while retaining identical numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites (Fig. 7) and different segmentation strategies (Fig. 7).

Figure 4. a Run time of solving equations for a 1 s simulation on GPU (dt = 0.025 ms, 40,000 iterations in total). CoreNEURON: the parallel method used in CoreNEURON; DHS-4: DHS with four threads for each neuron; DHS-16: DHS with 16 threads for each neuron. b, c Visualization of the partition by DHS-4 and DHS-16; each color indicates a single thread. During computation, each thread switches among different branches.
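The storage permutation behind memory boosting can be illustrated with a toy Python model that counts how many fixed-width memory segments each scheduling step touches before and after the data are reordered to follow the schedule. This is a sketch under simplifying assumptions, not the DeepDendrite GPU kernel: the schedule is a hypothetical four-thread example, the `WIDTH` of four values per transaction is an invented parameter, and real GPU coalescing rules depend on the hardware, data types and alignment.

```python
def count_transactions(schedule, address, width):
    """Count memory segments touched, summed over all scheduling steps.

    In this toy model one transaction fetches `width` consecutive values,
    so a step costs one transaction per distinct segment it touches.
    """
    return sum(len({address[node] // width for node in step}) for step in schedule)


def coalesced_layout(schedule):
    """Assign storage slots in processing order.

    Nodes handled in the same step end up next to each other in memory,
    mirroring the idea that data storage follows the computing order.
    """
    address = {}
    for step in schedule:
        for node in step:
            address[node] = len(address)
    return address


# Hypothetical schedule for one 15-compartment neuron on four threads.
SCHEDULE = [[6, 7, 9, 10], [13, 14, 2, 3], [5, 8, 12, 1], [4, 11]]
WIDTH = 4  # values per memory transaction (an assumption for illustration)

original = {node: node for step in SCHEDULE for node in step}  # stored by node id
boosted = coalesced_layout(SCHEDULE)

before = count_transactions(SCHEDULE, original, WIDTH)
after = count_transactions(SCHEDULE, boosted, WIDTH)
print(f"transactions before reordering: {before}, after: {after}")
```

The sketch covers only the layout permutation. Keeping intermediate results in registers, the second ingredient named in the text, has no counterpart in this model.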
DHS creates cell-type-specific optimal partitioning
To gain insight into the working mechanism of the DHS method, we visualized the partitioning process by mapping compartments to each thread (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread frequently switches among different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell (Fig. 4b, c). By contrast, it generates fragmented partitions in morphologically asymmetric neurons such as pyramidal neurons and the Purkinje cell (Fig. 4b, c), indicating that DHS splits the neural tree at the scale of individual compartments (i.e., tree nodes) rather than at the scale of branches.

In summary, DHS and memory boosting yield a theoretically proven optimal solution for solving linear equations in parallel with unprecedented efficiency. Using this principle, we built the open-access DeepDendrite platform, which neuroscientists can use to implement models without any specific GPU programming knowledge. Below, we demonstrate how DeepDendrite can be used in neuroscience tasks.

DHS enables spine-level modelling
Because dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, etc., their morphology and plasticity are crucial for regulating neuronal excitability. However, spines are too small (~1 μm in length) for their voltage-dependent processes to be measured directly in experiments. Theoretical work is therefore critical for a full understanding of spine computations.
We can model a single spine with two compartments: the spine head, where the synapses are located, and the spine neck, which links the spine head to the dendrite. Theory predicts that the very thin spine neck (0.1-0.5 μm in diameter) electrically isolates the spine head from its parent dendrite, thereby compartmentalizing the signals generated at the spine head. However, a detailed model with spines fully distributed over the dendrites (a "full-spine model") is computationally very expensive. A common compromise is to modify the capacitance and resistance of the membrane by a spine factor F instead of modeling all spines explicitly. Here, the spine factor F aims to approximate the effect of spines on the biophysical properties of the cell membrane.

Inspired by the previous work of Eyal et al., we investigated how different spatial patterns of excitatory inputs formed on dendritic spines shape neuronal activity in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Notably, Eyal et al. employed the spine factor F to incorporate spines into dendrites, while only a few activated spines were explicitly attached to the dendrites ("few-spine model" in Fig. 5a). The value of F in their model was computed from the dendritic area and spine area in the reconstructed data. Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model more consistent with Eyal's few-spine model. With the spine density set to 1.3 μm⁻¹, the pyramidal neuron model contained about 25,000 spines without altering the model's original morphological and biophysical properties. Further, we repeated the previous experimental protocols with both the full-spine and few-spine models. We use the same synaptic input as in Eyal's work but attach extra background noise to each sample.
By comparing the somatic traces (Fig. 5b, c) and the spike probability (Fig. 5d) in the full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by the activation of clustered spines appeared to be more nonlinear in the full-spine model (solid blue line in Fig. 5d) than in the few-spine model (dashed blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on the computation of dendritic excitability and nonlinearity.

Figure 5. a Experiment setup. We examine two major types of models: few-spine models and full-spine models. Few-spine models (two on the left) incorporate the spine area globally into the dendrites and attach only individual spines together with activated synapses. In full-spine models (two on the right), all spines are explicitly attached over the whole dendritic tree. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine models and the full-spine models, respectively. b Somatic voltages recorded for the cases in a. Colors of the voltage curves correspond to a; scale bar: 20 ms, 20 mV. c Color-coded voltages during the simulation in b at specific times. Colors indicate the magnitude of the voltage. d Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases in a. Background noise is attached. e Run time of the experiments in d with different simulation methods. NEURON: conventional NEURON simulator running on a single CPU core. CoreNEURON: CoreNEURON simulator on a single GPU. DeepDendrite: DeepDendrite on a single GPU.
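As a worked illustration of the spine-factor approximation used in the few-spine model, the Python sketch below computes F from a dendritic area and a spine area and rescales the membrane parameters with it. The formula F = (dendritic area + spine area) / dendritic area, and the convention of multiplying the capacitance by F while dividing the resistance by F, are the usual form of this approximation and are stated here as assumptions, since the text only says that F is computed from the two areas. All numeric inputs except the spine count and the spine density are hypothetical.

```python
def spine_factor(dendritic_area_um2, spine_area_um2):
    """Assumed definition: total membrane area relative to dendritic area."""
    return (dendritic_area_um2 + spine_area_um2) / dendritic_area_um2


def apply_spine_factor(cm, rm, f):
    """Fold the spine membrane into the dendrite (assumed convention).

    Capacitance grows with the extra area and resistance shrinks with it,
    so the membrane time constant rm * cm stays unchanged.
    """
    return cm * f, rm / f


def implied_dendritic_length_um(n_spines, density_per_um):
    """Total spiny dendritic length implied by a spine count and a density."""
    return n_spines / density_per_um


# Hypothetical areas in um^2, chosen only to produce a readable number.
F = spine_factor(dendritic_area_um2=10000.0, spine_area_um2=9000.0)

# Hypothetical passive parameters: cm in uF/cm^2, rm in ohm*cm^2.
cm_eff, rm_eff = apply_spine_factor(cm=1.0, rm=20000.0, f=F)

# Values from the text: about 25,000 spines at 1.3 spines per um.
length_um = implied_dendritic_length_um(25000, 1.3)

print(f"F = {F:.2f}, effective cm = {cm_eff:.2f}, effective rm = {rm_eff:.0f}")
print(f"implied spiny dendritic length = {length_um:.0f} um")
```

Rescaling in this way accounts for the added membrane area but does not model the spine head as a separate electrical compartment, which is the limitation the full-spine comparison above is probing.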
On the DeepDendrite platform, both the full-spine and few-spine models achieved an 8-fold speedup compared with CoreNEURON on the GPU platform and a 100-fold speedup compared with serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1) while producing identical simulation results (Supplementary Figs. 4 and 8). The DHS method therefore enables exploration of dendritic excitability under more realistic anatomical conditions.

Discussion
In this work, we propose the DHS method to parallelize the computation of the Hines method, and we demonstrate mathematically that DHS provides an optimal solution without any loss of precision. Next, we implement DHS on the GPU hardware platform and use GPU memory boosting techniques to refine it (Fig. 3). When simulating a large number of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup (Supplementary Table 1) compared with the GPU method used in CoreNEURON and up to a 1,500-fold speedup compared with the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3; Supplementary Table 1). Furthermore, we develop the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, as a demonstration of the capacity of DeepDendrite, we present a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Further in this section, we elaborate on how we have expanded the DeepDendrite framework to enable efficient training of biophysically detailed neural networks.
The aim is to explore the hypothesis that dendrites improve robustness against adversarial attacks. We show that DeepDendrite can support both neuroscience simulations and AI-related detailed neural network tasks at unprecedented speed, thereby significantly promoting detailed neuroscience simulations and, potentially, future AI explorations.

Decades of effort have been invested in speeding up the Hines method with parallel methods. Early work focused mainly on network-level parallelization. In network simulations, each cell independently solves its corresponding linear equations with the Hines method. With network-level methods, we can simulate detailed networks on clusters or supercomputers. In recent years, GPUs have been used for detailed network simulation. Because a GPU contains massive numbers of computing units, one thread is usually assigned to one cell rather than to a cell group. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, the computation inside each cell is still serial in network-level methods, so they still cannot cope when the "Hines matrix" of each cell becomes large.

Cellular-level parallel methods further parallelize the computation inside each cell. The main idea of cellular-level parallel methods is to split each cell into several sub-blocks and parallelize the computation of those sub-blocks. However, typical cellular-level methods (e.g., the "multi-split" method) pay less attention to the parallelization strategy, and the lack of a fine-grained strategy results in unsatisfactory performance. To achieve higher efficiency, some studies try to obtain finer-grained parallelization by introducing extra computation operations or by making approximations on some crucial compartments while solving the linear equations.
These finer-grained parallelization strategies achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the problem of "how to parallelize" as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy. Moreover, DHS does not introduce any extra operation or value approximation, so it achieves the lowest computational cost while retaining the numerical accuracy of the original Hines method.

Dendritic spines are the most abundant microstructures in the brain for projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. As spines receive most of the excitatory inputs in the central nervous system, electrical signals generated by spines are the main driving force for large-scale neuronal activity in the forebrain and cerebellum. The structure of the spine, with an enlarged spine head and a very thin spine neck, leads to a surprisingly high input impedance at the spine head, which could be up to 500 MΩ according to experimental data combined with the detailed compartment modeling approach. Due to such high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level, thereby boosting NMDA currents and ion channel currents in the spine. However, in classic detailed compartment models, all spines are replaced by the F coefficient, which modifies the dendritic cable geometries. This approach may compensate for the leak currents and capacitive currents of the spines. Still, it cannot reproduce the high input impedance at the spine head, which may weaken excitatory synaptic inputs, particularly NMDA currents, thereby reducing the nonlinearity in the neuron's input-output curve. Our modeling results are in line with this interpretation.
On the other hand, the spine’s electrical compartmentalization is always accompanied by biochemical compartmentalization, resulting in a drastic increase of internal [Ca2+] within the spine and a cascade of molecular processes involved in synaptic plasticity, which is important for learning and memory. Intriguingly, the biochemical process triggered by learning in turn remodels the spine’s morphology, enlarging (or shrinking) the spine head or elongating (or shortening) the spine neck, which significantly alters the spine’s electrical capacity. Such experience-dependent changes in spine morphology, also referred to as “structural plasticity”, have been widely observed in the visual cortex, somatosensory cortex, motor cortex, hippocampus, and the basal ganglia in vivo. They play a critical role in motor and spatial learning as well as memory formation. However, due to the computational costs, nearly all detailed network models exploit the “F-factor” approach to replace actual spines, and are thus unable to explore spine functions at the system level. By taking advantage of our framework and the GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU, while remaining ~100 times faster than the traditional serial method on a single CPU (Fig. 5e). This enables us to explore structural plasticity in large-scale circuit models across diverse brain regions.

Another critical issue is how to link dendrites to brain functions at the systems/network level. It is well established that dendrites can perform comprehensive computations on synaptic inputs owing to their enriched ion channels and local biophysical membrane properties.
For example, cortical pyramidal neurons can carry out sublinear synaptic integration at the proximal dendrite but progressively shift to supralinear integration at the distal dendrite. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials. Such dendritic events are widely observed in mouse and even human cortical neurons in vitro, and may support various logical operations or gating functions. Recently, in vivo recordings in awake or behaving mice have provided strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex, sensory-motor integration in the whisker system, and spatial navigation in the hippocampal CA1 region.

To establish the causal link between dendrites and animal (including human) patterns of behavior, large-scale biophysically detailed neural circuit models are a powerful computational tool. However, running a large-scale detailed circuit model of 10,000-100,000 neurons generally requires the computing power of supercomputers. It is even more challenging to optimize such models against in vivo data, as this requires iterative simulations of the models. The DeepDendrite framework can directly support many state-of-the-art large-scale circuit models, which were initially developed based on NEURON. Moreover, using our framework, a single GPU card such as the Tesla A100 could easily support the operation of detailed circuit models of up to 10,000 neurons, thereby providing carbon-efficient and affordable plans for ordinary labs to develop and optimize their own large-scale detailed models.
Recent works on unraveling the dendritic roles in task-specific learning have achieved remarkable results in two directions, i.e., solving challenging tasks such as the image classification dataset ImageNet with simplified dendritic networks, and exploring the full learning potential of more realistic neurons. However, there is a trade-off between model size and biological detail, as network scale is often sacrificed for neuron-level complexity. Moreover, more detailed neuron models are less mathematically tractable and more computationally expensive.

There has also been progress on the role of active dendrites in ANNs for computer vision tasks. Iyer et al. proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording used a binary tree to approximate dendrite branching and provided valuable insights into the influence of tree structure on single neurons’ computational capacity. Bird et al. proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they primarily rely on abstractions derived from spatially extended neurons, and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of leveraging more realistic neuron models for understanding the shared mechanisms underlying brain computation and deep learning.

In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to significantly reduce computational costs and incorporates an I/O module and a learning module to handle large datasets.
With DeepDendrite, we successfully implemented a three-layer hybrid neural network, the Human Pyramidal Cell Network (HPC-Net) (Fig. 6a, b). This network demonstrated efficient training capabilities in image classification tasks, achieving approximately 25 times speedup compared to training on a traditional CPU-based platform (Fig. 6f; Supplementary Table 1).

Fig. 6:
a, The illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed to spike trains and fed into the network model. Learning is triggered by error signals propagated from soma to dendrites.
b, Training with mini-batch. Multiple networks are simulated simultaneously with different images as inputs. The total weight updates ΔW are computed as the average of ΔWi from each network.
c, Comparison of the HPC-Net before and after training. Left, the visualization of hidden neuron responses to a specific input before (top) and after (bottom) training. Right, hidden layer weights (from input to hidden layer) distribution before (top) and after (bottom) training.
d, Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained with clean images.
e, Prediction accuracy of each model on adversarial samples after training 30 epochs on MNIST (left) and Fashion-MNIST (right) datasets.
f, Run time of training and testing for the HPC-Net. The batch size is set to 16. Left, run time of training one epoch. Right, run time of testing. Parallel NEURON + Python: training and testing on a single CPU with multiple cores, using 40-process-parallel NEURON to simulate the HPC-Net and extra Python code to support mini-batch training. DeepDendrite: training and testing the HPC-Net on a single GPU with DeepDendrite.
Additionally, it is widely recognized that the performance of Artificial Neural Networks (ANNs) can be undermined by adversarial attacks, i.e., intentionally engineered perturbations devised to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks. Our experimental results with HPC-Net lend support to this hypothesis: networks endowed with detailed dendritic structures showed somewhat greater resilience to transfer adversarial attacks than standard ANNs on the MNIST and Fashion-MNIST datasets (Fig. 6d, e). This evidence implies that the inherent biophysical properties of dendrites could be pivotal in augmenting the robustness of ANNs against adversarial interference. Nonetheless, further studies are needed to validate these findings on more challenging datasets such as ImageNet.

In conclusion, DeepDendrite has shown promising potential in image classification tasks and opens up several future directions. To further advance DeepDendrite and the application of biologically detailed dendritic models in AI tasks, we may focus on developing multi-GPU systems and exploring applications in other domains, such as Natural Language Processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language. Challenges include testing scalability on larger-scale problems, understanding performance across various tasks and domains, and addressing the computational complexity introduced by novel biological principles, such as active dendrites.
By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI.

Methods

Simulation with DHS

The CoreNEURON simulator (https://github.com/BlueBrain/CoreNeuron) uses the NEURON architecture and is optimized for both memory usage and computational speed. We implement our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on GPU with CoreNEURON can also be simulated with DHS by executing the following command:

    coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu

The usage options are as in Table 1.

Accuracy of the simulation using cellular-level parallel computation

To ensure the accuracy of the simulation, we first need to define the correctness of a cellular-level parallel algorithm, so that we can judge whether it generates solutions identical to those of proven-correct serial methods such as the Hines method used in the NEURON simulation platform. Based on the theories of parallel computing, a parallel algorithm yields a result identical to that of its corresponding serial algorithm if and only if the data processing order in the parallel algorithm is consistent with the data dependency in the serial method. The Hines method has two symmetrical phases: triangularization and back-substitution. By analyzing the serial Hines method, we find that its data dependency can be formulated as a tree structure, where the nodes of the tree represent the compartments of the detailed neuron model. In the triangularization process, the value of each node depends on its children nodes. In contrast, during the back-substitution process, the value of each node depends on its parent node (Fig. 1d).
Thus, we can compute nodes on different branches in parallel, as their values do not depend on each other.

Based on the data dependency of the serial Hines method, we propose three conditions to make sure a parallel method yields solutions identical to those of the serial Hines method: (1) the tree morphology and initial values of all nodes are identical to those in the serial Hines method; (2) in the triangularization phase, a node can be processed if and only if all its children nodes are already processed; (3) in the back-substitution phase, a node can be processed only if its parent node is already processed. Once a parallel computing method satisfies these three conditions, it will produce solutions identical to those of the serial computing method.

Computational cost of cellular-level parallel computing method

To theoretically evaluate the run time, i.e., efficiency, of the serial and parallel computing methods, we introduce and formulate the concept of computational cost as follows: given a tree T and k threads (basic computational units) to perform triangularization, parallel triangularization is equivalent to dividing the node set V of T into n subsets, i.e., V = {V1, V2, …, Vn}, where the size of each subset |Vi| ≤ k, i.e., at most k nodes can be processed at each step since there are only k threads. The triangularization phase follows the order V1 → V2 → … → Vn, and nodes in the same subset Vi can be processed in parallel. So, we define |V| (the size of set V, i.e., n here) as the computational cost of the parallel computing method. In short, we define the computational cost of a parallel method as the number of steps it takes in the triangularization phase. Because back-substitution is symmetrical with triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase.
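As a small illustration of the correctness conditions and the cost definition above, the following Python sketch checks whether a proposed step-by-step schedule is valid for triangularization and, if so, returns its cost. The tree encoding (a `parent` list with -1 marking the root) and the function name `schedule_cost` are assumptions made for this example; they are not taken from DeepDendrite, CoreNEURON, or NEURON.

```python
from typing import List, Sequence


def schedule_cost(parent: Sequence[int], steps: List[List[int]], k: int) -> int:
    """Validate a triangularization schedule and return its cost.

    parent[v] is the parent of node v, or -1 for the root (assumed encoding).
    steps[s] lists the nodes processed in parallel at step s.
    Raises ValueError if the schedule breaks a constraint.
    """
    step_of = [-1] * len(parent)
    for s, subset in enumerate(steps):
        # At most k nodes per step, because only k threads are available.
        if len(subset) > k:
            raise ValueError(f"step {s} uses more than {k} threads")
        for v in subset:
            if step_of[v] != -1:
                raise ValueError(f"node {v} is scheduled twice")
            step_of[v] = s
    if -1 in step_of:
        raise ValueError("some nodes are never scheduled")
    # Every child must be processed in a strictly earlier step than its parent.
    # Running the same steps in reverse then gives a valid back-substitution
    # order, in which each parent is processed before its children.
    for child, par in enumerate(parent):
        if par >= 0 and step_of[child] >= step_of[par]:
            raise ValueError(f"node {par} is processed before its child {child}")
    # Cost = number of steps in the triangularization phase.
    return len(steps)
```

For a toy tree with `parent = [-1, 0, 0, 1, 1]`, the schedule `[[3, 4], [1, 2], [0]]` with `k = 2` has cost 3, whereas the serial schedule `[[3], [4], [1], [2], [0]]` with `k = 1` has cost 5. A schedule that places node 1 before its child 4 is rejected.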
Mathematical scheduling problem

Based on the simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem:

Given a tree T = {V, E} and a positive integer k, where V is the node set and E is the edge set. Define a partition P(V) = {V1, V2, …, Vn}, |Vi| ≤ k, 1 ≤ i ≤ n, where |Vi| indicates the cardinal number of subset Vi, i.e., the number of nodes in Vi, and for each node v ∈ Vi, all its children nodes {c | c ∈ children(v)} must be in a previous subset Vj, where 1 ≤ j < i. Our goal is to find an optimal partition P*(V) whose computational cost |P*(V)| is minimal.

Here subset Vi consists of all nodes that will be computed at the i-th step (Fig. 2e), so |Vi| ≤ k indicates that we can compute at most k nodes at each step because the number of available threads is k. The restriction “for each node v ∈ Vi, all its children nodes {c | c ∈ children(v)} must be in a previous subset Vj, where 1 ≤ j < i” indicates that node v can be processed only if all its child nodes are processed.

DHS implementation

We aim to find an optimal way to parallelize the computation of solving linear equations for each neuron model by solving the mathematical scheduling problem above. To get the optimal partition, DHS first analyzes the topology and calculates the depth d(v) for all nodes v ∈ V. Then, the following two steps are executed iteratively until every node v ∈ V is assigned to a subset: (1) find all candidate nodes and put these nodes into the candidate set Q. A node is a candidate only if all its child nodes have been processed or it does not have any child nodes. (2) If |Q| ≤ k, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from Q and put them into Vi; otherwise, remove the k deepest nodes from Q and add them to subset Vi. Label these nodes as processed nodes (Fig. 2d).
After filling in subset Vi, go to step (1) to fill in the next subset Vi+1.

Correctness proof for DHS

After applying DHS to a neural tree T = {V, E}, we get a partition P(V) = {V1, V2, …, Vn}, |Vi| ≤ k, 1 ≤ i ≤ n. Nodes in the same subset Vi are computed in parallel, taking n steps to perform triangularization and n steps to perform back-substitution. We then demonstrate that the reordering of the computation in DHS yields a result identical to that of the serial Hines method.

The partition P(V) obtained from DHS decides the computation order of all nodes in a neural tree. Below we demonstrate that the computation order determined by P(V) satisfies the correctness conditions.

P(V) is obtained from the given neural tree T. Operations in DHS do not modify the tree topology or the values of tree nodes (the corresponding values in the linear equations), so the tree morphology and initial values of all nodes are not changed, which satisfies condition 1: the tree morphology and initial values of all nodes are identical to those in the serial Hines method.

In triangularization, nodes are processed from subset V1 to Vn. As shown in the implementation of DHS, all nodes in subset Vi are selected from the candidate set Q, and a node can be put into Q only if all its child nodes have been processed. Thus the child nodes of all nodes in Vi are in {V1, V2, …, Vi-1}, meaning that a node is only computed after all its children have been processed, which satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes are already processed.

In back-substitution, the computation order is the opposite of that in triangularization, i.e., from Vn to V1. As shown before, the child nodes of all nodes in Vi are in {V1, V2, …, Vi-1}, so the parent nodes of nodes in Vi are in {Vi+1, Vi+2, …, Vn}, which satisfies condition 3: in back-substitution, a node can be processed only if its parent node is already processed.
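The two-step procedure described under “DHS implementation” can be sketched in a few lines of Python. This is a minimal illustration, not the authors' GPU code: the `parent`-list tree encoding, the function name `dhs_partition`, and the choice to measure depth as the number of ancestors of a node are assumptions of this example. The candidate set is maintained incrementally instead of being rebuilt at every step, which selects the same nodes.

```python
from typing import List, Sequence


def dhs_partition(parent: Sequence[int], k: int) -> List[List[int]]:
    """Greedy deepest-first scheduling of tree nodes onto k threads.

    parent[v] is the parent of node v, or -1 for the root (assumed encoding).
    Returns the subsets V1, V2, ... in triangularization order.
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)

    # Depth of each node = number of its ancestors (assumption of this sketch).
    depth = [0] * n
    stack = [v for v, p in enumerate(parent) if p < 0]
    while stack:
        v = stack.pop()
        for c in children[v]:
            depth[c] = depth[v] + 1
            stack.append(c)

    # A node becomes a candidate once all of its children are processed;
    # leaves are candidates from the start.
    pending = [len(children[v]) for v in range(n)]
    candidates = [v for v in range(n) if pending[v] == 0]

    steps: List[List[int]] = []
    while candidates:
        # Take all candidates if there are at most k, else the k deepest ones.
        candidates.sort(key=lambda v: depth[v], reverse=True)
        chosen, candidates = candidates[:k], candidates[k:]
        steps.append(chosen)
        for v in chosen:
            p = parent[v]
            if p >= 0:
                pending[p] -= 1
                if pending[p] == 0:
                    candidates.append(p)
    return steps
```

For example, with `parent = [-1, 0, 0, 1, 1, 2]` and `k = 2`, the function returns `[[3, 4], [5, 1], [2], [0]]`, i.e., four triangularization steps instead of the six needed by a serial sweep. Back-substitution would traverse the same subsets in reverse order.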
Optimality proof for DHS

The idea of the proof is that if there is another optimal solution, it can be transformed into our DHS solution without increasing the number of steps the algorithm requires, thus indicating that the DHS solution is optimal.

For each subset Vi in P(V), DHS moves the k (thread number) deepest nodes from the corresponding candidate set Qi to Vi. If the number of nodes in Qi is smaller than k, all nodes are moved from Qi to Vi. To simplify, we introduce Di, the depth sum of the k deepest nodes in Qi. All subsets in P(V) satisfy the max-depth criteria (Supplementary Fig. 6a): Σ_{v ∈ Vi} d(v) = Di. We then prove that selecting the deepest nodes in each iteration makes P(V) an optimal partition. If there exists an optimal partition P*(V) = {V*1, V*2, …, V*s} containing subsets that do not satisfy the max-depth criteria, we can modify the subsets in P*(V) so that all subsets consist of the deepest nodes from Q and the number of subsets (|P*(V)|) remains the same after modification.

Without loss of generality, we start from the first subset V*i that does not satisfy the criteria, i.e., Σ_{v ∈ V*i} d(v) < Di. There are two possible cases in which V*i does not satisfy the max-depth criteria: (1) |V*i| < k and there exist some valid nodes in Qi that are not put into V*i; (2) |V*i| = k but the nodes in V*i are not the k deepest nodes in Qi.

For case (1), because some candidate nodes are not put into V*i, these nodes must be in the subsequent subsets. As |V*i| < k, we can move the corresponding nodes from the subsequent subsets to V*i, which does not increase the number of subsets and makes V*i satisfy the criteria (Supplementary Fig. 6b, top). For case (2), |V*i| = k, and the deeper nodes that are not moved from the candidate set Qi into V*i must be added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from subsequent subsets to V*i through the following method.
Assume that after filling V*i, node v is picked and one of the k deepest nodes, v’, is still in Qi; thus v’ will be put into a subsequent subset V*j (j > i). We first move v from V*i to V*i+1, then modify subset V*i+1 as follows: if |V*i+1| ≤ k and none of the nodes in V*i+1 is the parent of node v, stop modifying the later subsets. Otherwise, modify V*i+1 as follows (Supplementary Fig. 6c): if the parent node of v is in V*i+1, move this parent node to V*i+2; else move the node with minimum depth from V*i+1 to V*i+2. After adjusting V*i, modify the subsequent subsets V*i+1, V*i+2, …, V*j-1 with the same strategy. Finally, move v’ from V*j to V*i.

With the modification strategy described above, we can replace all shallower nodes in V*i with the k deepest nodes in Qi and keep the number of subsets, i.e., |P*(V)|, the same after modification. We can modify the nodes with the same strategy for all subsets in P*(V) that do not contain the deepest nodes. Finally, all subsets V*i ∈ P*(V) satisfy the max-depth criteria, and |P*(V)| does not change after the modification.

In conclusion, DHS generates a partition P(V), and all subsets Vi ∈ P(V) satisfy the max-depth criteria. For any other optimal partition P*(V), we can modify its subsets to make its structure the same as P(V), i.e., each subset consists of the deepest nodes in the candidate set, while keeping |P*(V)| the same after modification. So, the partition P(V) obtained from DHS is one of the optimal partitions.

GPU implementation and memory boosting

To achieve high memory throughput, the GPU uses a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have low capacity but high throughput. We aim to boost memory throughput by leveraging this memory hierarchy.
The GPU employs the SIMT (Single-Instruction, Multiple-Thread) architecture. Warps are the basic scheduling units on the GPU (a warp is a group of 32 parallel threads). A warp executes the same instruction with different data for different threads. Correctly ordering the nodes is essential for this batching of computation in warps, to make sure DHS obtains results identical to those of the serial Hines method. When implementing DHS on GPU, we first group all cells into multiple warps based on their morphologies. Cells with similar morphologies are grouped in the same warp. We then apply DHS to all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads for the same neuron are in the same warp. Therefore, the intrinsic synchronization in warps keeps the computation order consistent with the data dependency of the serial Hines method. Finally, threads in each warp are aligned and rearranged according to the number of compartments.

When a warp loads pre-aligned and successively stored data from global memory, it can make full use of the cache, which leads to high memory throughput, whereas accessing scattered data reduces memory throughput. After compartment assignment and thread rearrangement, we permute data in global memory to make it consistent with the computing order, so that warps can load successively stored data when executing the program. Moreover, we put the necessary temporary variables into registers rather than global memory. Registers have the highest memory throughput, so the use of registers further accelerates DHS.

Full-spine and few-spine biophysical models

We used the published human pyramidal neuron model of Eyal et al. The membrane capacitance cm = 0.44 μF cm-2, membrane resistance rm = 48,300 Ω cm2, and axial resistivity ra = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables while somas were active. The leak reversal potential El = -83.1 mV.
Ion channels such as Na+ and K+ were inserted on the soma and initial axon, and their reversal potentials were ENa = 67.6 mV and EK = -102 mV, respectively. All these specific parameters were set the same as in the model of Eyal et al.; for more details please refer to the published model (ModelDB, access No. 238347).

In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables 60 μm away from the soma were multiplied by a spine factor F to approximate dendritic spines. In this model, the spine factor F was set to 1.9. Only the spines that receive synaptic inputs were explicitly attached to dendrites.

In the full-spine model, all spines were explicitly attached to dendrites. We calculated the spine density with the reconstructed neuron in Eyal et al. The spine density was set to 1.3 μm-1, and each cell contained 24,994 spines on dendrites 60 μm away from the soma.

The morphologies and biophysical mechanisms of spines were the same in the few-spine and full-spine models. The length of the spine neck Lneck = 1.35 μm and its diameter Dneck = 0.25 μm, whereas the length and diameter of the spine head were both 0.944 μm, i.e., the spine head area was set to 2.8 μm2. Both spine neck and spine head were modeled as passive cables, with the reversal potential El = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those for dendrites.

Synaptic inputs

We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the terminal of the spine head. For distributed inputs, all activated synapses were randomly distributed on all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses that were uniformly distributed on a single randomly selected compartment. All synapses were activated simultaneously during the simulation.

AMPA-based and NMDA-based synaptic currents were simulated as in Eyal et al.’s work.
AMPA conductance was modeled as a double-exponential function and NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, τrise and τdecay were set to 0.3 and 1.8 ms, respectively. For the NMDA model, τrise and τdecay were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS.

Background noise

We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at tstart = 10 ms and lasted until the end of the simulation. We generated 400 noise spike trains for each cell and attached them to randomly selected synapses. The model and specific parameters of synaptic currents were the same as described in Synaptic inputs, except that the maximum conductance of NMDA was uniformly distributed from 1.57 to 3.275, resulting in a higher AMPA to NMDA ratio.

Exploring neuronal excitability

We investigated the spike probability when multiple synapses were activated simultaneously. For distributed inputs, we tested 14 cases, from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters. Each cluster consisted of 20 synapses. For each case in both distributed and clustered inputs, we calculated the spike probability with 50 random samples. Spike probability was defined as the ratio of the number of neurons that fired to the total number of samples. All 1150 samples were simulated simultaneously on our DeepDendrite platform, reducing the simulation time from days to minutes.

Performing AI tasks with the DeepDendrite platform

Conventional detailed neuron simulators lack two functionalities important to modern AI tasks: (1) alternately performing simulations and weight updates without heavy reinitialization and (2) simultaneously processing multiple stimuli samples in a batch-like manner.
Here we present the DeepDendrite platform, which supports both biophysical simulation and deep learning tasks with detailed dendritic models.

DeepDendrite consists of three modules (Supplementary Fig. 5): (1) an I/O module; (2) a DHS-based simulating module; (3) a learning module. When training a biophysically detailed model to perform learning tasks, users first define the learning rule, then feed all training samples to the detailed model for learning. In each step during training, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from all training samples and attaches the stimulus to the network model. Then, the DHS-based simulating module initializes the model and starts the simulation. After simulation, the learning module updates all synaptic weights according to the difference between model responses and teacher signals. After training, the learned model can achieve performance comparable to that of an ANN. The testing phase is similar to training, except that all synaptic weights are fixed.

HPC-Net model

Image classification is a typical task in the field of AI. In this task, a model should learn to recognize the content of a given image and output the corresponding label. Here we present HPC-Net, a network consisting of detailed human pyramidal neuron models that can learn to perform image classification tasks using the DeepDendrite platform.

HPC-Net has three layers, i.e., an input layer, a hidden layer, and an output layer. The neurons in the input layer receive spike trains converted from images as their input. Hidden layer neurons receive the output of input layer neurons and deliver responses to neurons in the output layer. The responses of the output layer neurons are taken as the final output of HPC-Net. Neurons between adjacent layers are fully connected.

For each image stimulus, we first convert each normalized pixel to a homogeneous spike train.
For the pixel with coordinates (x, y) in the image, the corresponding spike train has a constant interspike interval τISI(x, y) (in ms), which is determined by the pixel value p(x, y) as shown in Eq. (1).

In our experiment, the simulation for each stimulus lasted 50 ms. All spike trains started at 9 + τISI ms and lasted until the end of the simulation. Then we attached all spike trains to the input layer neurons in a one-to-one manner. The synaptic current triggered by the spike arriving at time t0 depends on the following quantities: v is the post-synaptic voltage, the reversal potential Esyn = 1 mV, the maximum synaptic conductance gmax = 0.05 μS, and the time constant τ = 0.5 ms.

Neurons in the input layer were modeled with a passive single-compartment model. The specific parameters were set as follows: membrane capacitance cm = 1.0 μF cm-2, membrane resistance rm = 10^4 Ω cm2, axial resistivity ra = 100 Ω cm, and reversal potential of the passive compartment El = 0 mV.

The hidden layer contains a group of human pyramidal neuron models, which receive the somatic voltages of the input layer neurons, and all neurons were modeled with passive cables. The specific membrane capacitance cm = 1.5 μF cm-2, membrane resistance rm = 48,300 Ω cm2, axial resistivity ra = 261.97 Ω cm, and the reversal potential of all passive cables El = 0 mV. Input neurons could make multiple connections to randomly selected locations on the dendrites of hidden neurons. The synaptic current activated by the k-th synapse of the i-th input neuron on neuron j’s dendrite is defined as in Eq. (4), where gijk is the synaptic conductance, Wijk is the synaptic weight, the somatic activation function is ReLU-like, and the somatic voltage of the i-th input neuron is taken at time t.

Neurons in the output layer were also modeled with a passive single-compartment model, and each hidden neuron only made one synaptic connection to each output neuron.
All specific parameters were set the same as those of the input neurons. Synaptic currents activated by hidden neurons are also in the form of Eq. (4).

Image classification with HPC-Net

For each input image stimulus, we first normalized all pixel values to 0.0-1.0. Then we converted the normalized pixels to spike trains and attached them to input neurons. Somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where pi is the probability of the i-th class predicted by HPC-Net, computed from the average somatic voltage of the i-th output neuron from 20 ms to 50 ms, and C indicates the number of classes, which equals the number of output neurons. The class with the maximum predicted probability is the final classification result. In this paper, we built HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons.

Synaptic plasticity rules for HPC-Net

Inspired by previous work, we use a gradient-based learning rule to train HPC-Net to perform the image classification task. The loss function we use here is cross-entropy, given in Eq. (7), where pi is the predicted probability for class i and yi indicates the actual class the stimulus image belongs to: yi = 1 if the input image belongs to class i, and yi = 0 if not.

When training HPC-Net, we compute the update for weight Wijk (the synaptic weight of the k-th synapse connecting neuron i to neuron j) at each time step. After the simulation of each image stimulus, Wijk is updated as shown in Eq. (8). The quantities involved are the learning rate; the update value at time t; vi and vj, the somatic voltages of neurons i and j, respectively; Iijk, the k-th synaptic current activated by neuron i on neuron j; gijk, its synaptic conductance; rijk, the transfer resistance between the k-th connected compartment of neuron i on neuron j’s dendrite and neuron j’s soma; and ts = 30 ms and te = 50 ms, the start time and end time for learning, respectively.
For output neurons, the error term can be computed as shown in Eq. ( Per a les neurones ocultes, el terme d'error es calcula a partir dels termes d'error de la capa de sortida, donada en Eq. ( ). t vj vi i j Iijk k i j gijk rijk k i j j t t 10 11 Atès que totes les neurones de sortida són d'un sol compartiment, igual a la resistència d'entrada del compartiment corresponent, les resistències de transferència i d'entrada són calculades per NEURON. Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and accelerating convergence. DeepDendrite also supports mini-batch training. When training HPC-Net with mini-batch size Batxillerat, fent copies de lot de HPC-Net. Durant l'entrenament, cada còpia s'alimenta amb una mostra d'entrenament diferent del lot. DeepDendrite calcula primer l'actualització de pes per a cada còpia per separat. Després de fer totes les còpies en el lot d'entrenament actual, es calcula l'actualització de pes mitjà i els pesos en totes les còpies s'actualitzen per aquesta mateixa quantitat. N N Robustness against adversarial attack with HPC-Net To demonstrate the robustness of HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation, for fair comparison in our HPC-Net each input neuron only made one synaptic connection to each hidden neuron). We first trained HPC-Net and ANN with the original training set (original clean images). Then we added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. We used the Foolbox , to generate adversarial noise with the FGSM method ANN va ser entrenat amb PyTorch , and HPC-Net was trained with our DeepDendrite. For fairness, we generated adversarial noise on a significantly different network model, a 20-layer ResNet . The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST and Fashion-MNIST . 
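As a rough illustration of the FGSM perturbation mentioned above, the sketch below applies the standard rule x_adv = clip(x + ε · sign(∇_x L), 0, 1) to a toy logistic classifier in pure Python. It does not use Foolbox, PyTorch, or a ResNet, and it is not the pipeline used in the study; the model, weights, and function names are all hypothetical and chosen only so the gradient can be written by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # dL/dz = p - y and dz/dx_i = w_i, so dL/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """One FGSM step: move each pixel by eps along the sign of the loss
    gradient, then clip back to the valid pixel range [0, 1]."""
    _, grad = loss_and_input_grad(x, y, w, b)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


# Toy example (hypothetical weights and input, label y = 1).
w, b = [1.0, -2.0, 0.0], 0.0
x, y = [0.5, 0.5, 0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.1)  # eps lies in the 0.02 to 0.2 range of the text
loss_clean, _ = loss_and_input_grad(x, y, w, b)
loss_adv, _ = loss_and_input_grad(x_adv, y, w, b)
print(x_adv, loss_clean, loss_adv)
```

The perturbed input raises the loss of the toy model, which is the effect the noise level ε controls. In the study the noise was generated on a separate ResNet and then applied to both HPC-Net and the ANN, which this sketch does not attempt to reproduce.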
On MNIST and Fashion-MNIST, respectively, the prediction accuracy of HPC-Net was 19% and 16.72% higher than that of the analogous ANN. The tools, methods, and datasets in this comparison are cited as follows: Foolbox (refs. 98, 99), FGSM (ref. 93), PyTorch (ref. 100), ResNet (ref. 101), MNIST (ref. 95), and Fashion-MNIST (ref. 96).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available within the paper, the Supplementary Information files, and the Source Data provided with this paper. The data for Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist. The Fashion-MNIST dataset is publicly available at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite, as well as the models and code used to reproduce Figs. 3–6 in this study, are available at https://github.com/pkuzyc/DeepDendrite.

References

1. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity.
2. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
3. Poirazi, P., Brannon, T. & Mel, B. W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell.
4. London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
5. Branco, T. & Häusser, M. The single dendritic branch as a fundamental functional unit in the nervous system. Curr. Opin. Neurobiol. 20, 494–502 (2010).
6. Stuart, G. J. & Spruston, N. Dendritic integration: 60 years of progress. Nat. Neurosci. 18, 1713–1721 (2015).
7. Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
8. Yuste, R. & Denk, W. Dendritic spines as basic functional units of neuronal integration. Nature 375, 682–684 (1995).
9. Engert, F. & Bonhoeffer, T.
Dendritic spine changes associated with hippocampal long-term synaptic plasticity.
10. Yuste, R. Dendritic spines and distributed circuits. Neuron 71, 772–781 (2011).
11. Yuste, R. Electrical compartmentalization in dendritic spines. Annu. Rev. Neurosci. 36, 429–449 (2013).
12. Rall, W. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491–527 (1959).
13. Segev, I. & Rall, W. Computational study of an excitable dendritic spine. J. Neurophysiol. 60, 499–523 (1988).
14. Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489 (2016).
15. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362, 1140–1144 (2018).
16. McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989).
17. French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
18. Naud, R. & Sprekeler, H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA 115, E6329–E6338 (2018).
19. Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
20. Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits.
21. Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
22. Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent. PLoS Comput. Biol. 17, e1009015 (2021).
23. Hodgkin, A. L. & Huxley, A. F.
A quantitative description of membrane current and its application to conduction and excitation in nerve.
24. Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
25. Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
26. Bower, J. M. & Beeman, D. in The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
27. Hines, M. L., Eichner, H. & Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors. J. Comput. Neurosci. 25, 203–210 (2008).
28. Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
29. Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphical processing unit.
30. Tsuyuki, T., Yamamoto, Y. & Yamazaki, T. Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, A. et al.) 279–285 (Springer International Publishing, 2016).
31. Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines Matrix Solver in Neuron Simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
32. Huber, F. Efficient tree solver for Hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
33. Korte, B. & Vygen, J. Combinatorial Optimization: Theory and Algorithms 6 edn (Springer, 2018).
34. Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
35. Kumbhar, P. et al. CoreNEURON: An optimized compute engine for the NEURON simulator. Front. Neuroinform. 13, 63 (2019).
36. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
37. Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphical processors. J. Neurosci. Methods 206, 183–194 (2012).
38. Mascagni, M. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models. J. Neurosci. Methods 36, 105–114 (1991).
39. McDougal, R. A. et al. Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience.
40. Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons. J. Comput. Neurosci. 16, 5–13 (2004).
41. Hemond, P. et al. Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b. Hippocampus 18, 411–424 (2008).
42. Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput. Biol. 7, e1002107 (2011).
43. Masoli, S., Solinas, S. & D'Angelo, E. Action potential processing in a detailed Purkinje cell model reveals a critical role for axonal compartmentalization.
44. Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales - simulations of direct pathway MSNs investigate the fast onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
45. Migliore, M. et al. Synaptic clusters function as odor operators in the olfactory bulb. Proc. Natl Acad. Sci. USA 112, 8499–8504 (2015).
46. NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
47. NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
48. Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C.
Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
49. Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
50. Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
51. Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models. Front. Cell. Neurosci. 12, 181 (2018).
52. Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization. J. Neurosci. 13, 413–422 (1993).
53. Koch, C. Dendritic spines. In Biophysics of Computation (Oxford University Press, 1999).
54. Rapp, M., Yarom, Y. & Segev, I. The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells. Neural Comput. 4, 518–533 (1992).
55. Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
56. Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint at https://arxiv.org/abs/1703.09202 (2017).
57. Goddard, N. H. & Hood, G. Large-scale simulation using parallel GENESIS. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 349–379 (Springer New York, 1998).
58. Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON. J. Comput. Neurosci. 21, 119 (2006).
59. Lytton, W. W. et al. Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON. Neural Comput. 28, 2063–2090 (2016).
60. Valero-Lara, P. et al. cuHinesBatch: Solving multiple Hines systems on GPUs human brain project. In Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
61. Akar, N. A. et al.
Arbor - A morphologically-detailed neural network simulation library for contemporary high-performance computing architectures. In Proc. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 274–282 (IEEE, 2019).
62. Ben-Shalom, R. et al. NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. J. Neurosci. Methods 366, 109400 (2022).
63. Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
64. Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform. 5, 15 (2011).
65. Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
66. Palmer, L. M. & Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input. J. Neurosci. 29, 6897–6903 (2009).
67. Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
68. Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity.
69. Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain.
70. Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
71. Keck, T. et al. Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
72. Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits. Nature 457, 313–317 (2009).
73. Trachtenberg, J. T. et al.
Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature 420, 788–794 (2002).
74. Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
75. Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
76. Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex. Neuron 109, 3298–3311 (2021).
77. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
78. Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
79. Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
80. Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons. Cell Rep. 21, 1550–1561 (2017).
81. Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci. USA 114, E7612–E7621 (2017).
82. Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
83. Xu, N.-l. et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task.
84. Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
85. Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties.
Nature 517, 200–204 (2015).
86. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
87. Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
88. Hjorth, J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
89. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
90. Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments. Front. Neurorobot. 16, 846219 (2022).
91. Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree?
92. Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks. PLoS Comput. Biol. 17, e1009202 (2021).
93. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR) (ICLR, 2015).
94. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).
95. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
96. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
97. Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
98. Rauber, J., Brendel, W. & Bethge, M.
Foolbox: A Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017).
99. Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5, 2607 (2020).
100. Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (NeurIPS, 2019).
101. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

The authors sincerely thank Dr. Rita Zhang, Daochen Shi and members at NVIDIA for their valuable technical support with GPU computing. This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., National Natural Science Foundation of China (No. 61088102) to T.H., National Key R&D Program of China (No. 2022ZD01163005) to L.M., Key Area R&D Program of Guangdong Province (No. 2018B030338001) to T.H., National Natural Science Foundation of China (No. 61825101) to Y.T., Swedish Research Council (VR-M-2020-01652), Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 No. 945539 (HBP SGA3), and KTH Digital Futures to J.H.K., J.H., and A.K., and Swedish Research Council (VR-M-2021-01995) and EU/Horizon 2020 No. 945539 (HBP SGA3) to S.G. and A.K. Part of the simulations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC KTH, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

This paper is available on Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.