Scientists Built a GPU Engine That Simulates Brain Cells up to 1,500 Times Faster

Authors: Yichen Zhang, Gan He, Lei Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjian Zhang, Jeanette Hellgren Kotaleski, Yonghong Tian, Sten Grillner, Kai Du, Tiejun Huang

Abstract

Biophysically detailed multi-compartment models are powerful tools for exploring the computational principles of the brain, and they also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their high computational cost severely limits applications in both neuroscience and AI. The major bottleneck when simulating detailed compartment models is the simulator's ability to solve large systems of linear equations. Here, we present a Dendritic Hierarchical Scheduling (DHS) method that markedly accelerates this process. We prove theoretically that the DHS implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method on a conventional CPU platform. We build the DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines. Furthermore, we briefly discuss the potential of DeepDendrite for AI.

Introduction

Deciphering the coding and computational principles of neurons is essential to neuroscience. Mammalian brains are composed of thousands of different types of neurons with unique morphological and biophysical properties.
The point-neuron doctrine, in which neurons are regarded as simple summing units, is still widely applied in neural computation, especially in neural network analysis. In recent years, modern artificial intelligence (AI) has used this principle to develop powerful tools such as artificial neural networks (ANNs). However, beyond comprehensive computations at the single-neuron level, subcellular compartments such as neuronal dendrites can also carry out nonlinear operations as independent computational units. Furthermore, dendritic spines, small protrusions that densely cover the dendrites of spiny neurons, can compartmentalize synaptic signals, allowing them to be separated from their parent dendrites ex vivo and in vivo.

Simulations using biologically detailed neurons provide a theoretical framework for linking biological details to computational principles. The detailed multi-compartment modeling framework allows us to model neurons with realistic dendritic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. The backbone of the detailed multi-compartment model, i.e., the dendrites, is built on classical cable theory, which models the biophysical membrane properties of dendrites as passive cables and provides a mathematical description of how electrical signals invade and propagate through complex neuronal processes. By combining cable theory with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can capture cellular and subcellular neuronal computations beyond experimental limitations.

In addition to their profound impact on neuroscience, biologically detailed neuron models have recently been used to bridge the gap between neuronal structural and biophysical details and AI.
The prevailing technique in modern AI is the ANN built from point neurons, an analog of biological neural networks. Although ANNs trained with the "backpropagation-of-error" (backprop) algorithm have achieved remarkable performance in specialized applications, even beating top human professional players at Go and chess, the human brain still outperforms ANNs in domains that involve more dynamic and noisy environments. Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that potentially exceed backprop in parallel information processing. Furthermore, a single detailed multi-compartment model can learn network-level nonlinear computations for point neurons by adjusting only the synaptic strength. It is therefore a high priority to expand the paradigms of brain-like AI from single detailed neuron models to large-scale biologically detailed networks.

A long-standing challenge of the detailed simulation approach is its exceedingly high computational cost, which has severely limited its application in neuroscience and AI. The major bottleneck of the simulation is solving the linear equations that arise from the foundational theories of detailed modeling. To improve efficiency, the classic Hines method reduces the time complexity of solving the equations from O(n^3) to O(n), and it has been widely adopted as the core algorithm in popular simulators such as NEURON and GENESIS. However, this method is serial: it processes each compartment sequentially. When a simulation involves multiple biophysically detailed dendrites with dendritic spines, the linear equation matrix ("Hines matrix") scales with the number of dendrites or spines (Fig. 1e
), making the Hines method no longer practical, since it places a very heavy burden on the whole simulation.

Fig. 1. a A reconstructed layer-5 pyramidal neuron model and the mathematical formula used with detailed neuron models. b Workflow when numerically simulating detailed neuron models. The equation-solving phase is the bottleneck in the simulation. c An example of linear equations in the simulation. d Data dependency of the Hines method when solving the linear equations in c. e The number of linear equations to be solved increases significantly as models become more detailed. f Computational cost (steps taken in the equation-solving phase) of the serial Hines method on different types of neuron models. g Illustration of different solving methods. In the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors. h Computational cost of the three methods in g when solving the equations of a pyramidal model with spines. i Run time: the time taken by a 1 s simulation (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: parallel method in CoreNEURON (on GPU); Branch-based: branch-based parallel method (on GPU); DHS: Dendritic Hierarchical Scheduling method (on GPU).

Over the past decades, tremendous progress has been made in speeding up the Hines method with cellular-level parallel methods, which parallelize the computation of different parts of each cell. However, current cellular-level parallel methods often lack an efficient parallelization strategy or lack sufficient numerical accuracy compared with the original Hines method.
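For orientation, the serial Hines method discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in NEURON or DeepDendrite: the array names (`parent`, `diag`, `upper`, `lower`) and the storage layout are hypothetical, and nodes are assumed to be numbered so that every parent has a smaller index than its children, with node 0 as the root (soma).

```python
def hines_solve(parent, diag, upper, lower, rhs):
    """Solve A x = rhs for a tree-structured ("Hines") matrix in O(n).

    Assumed layout (illustrative only):
      parent[i] < i for every i > 0; node 0 is the root
      diag[i]  = A[i][i]
      upper[i] = A[parent[i]][i]
      lower[i] = A[i][parent[i]]
    """
    n = len(diag)
    d = list(diag)   # work on copies so the inputs stay untouched
    r = list(rhs)

    # Triangularization: visit nodes from the leaves towards the root.
    # Because parent[i] < i, a node is always visited after its children.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = upper[i] / d[i]
        d[p] -= factor * lower[i]
        r[p] -= factor * r[i]

    # Back-substitution: visit nodes from the root towards the leaves.
    x = [0.0] * n
    x[0] = r[0] / d[0]
    for i in range(1, n):
        x[i] = (r[i] - lower[i] * x[parent[i]]) / d[i]
    return x


def tree_matvec(parent, diag, upper, lower, x):
    """Multiply the tree-structured matrix by x (used only to check the result)."""
    y = [diag[i] * x[i] for i in range(len(diag))]
    for i in range(1, len(diag)):
        p = parent[i]
        y[p] += upper[i] * x[i]
        y[i] += lower[i] * x[p]
    return y


# Toy 5-compartment tree: node 0 has children 1 and 2; node 1 has children 3 and 4.
parent = [-1, 0, 0, 1, 1]
diag = [4.0] * 5
upper = [0.0, -1.0, -1.0, -1.0, -1.0]
lower = list(upper)
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]

x = hines_solve(parent, diag, upper, lower, rhs)
residual = max(abs(u - v) for u, v in zip(tree_matvec(parent, diag, upper, lower, x), rhs))
print("max residual:", residual)
```

Both loops touch each node once, which is where the O(n) cost comes from. Each loop iteration depends on values written by earlier iterations, which is the data dependency that makes the method serial.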
Here, we develop a fully automatic, numerically accurate, and optimized simulation tool that significantly improves computational efficiency and reduces computational cost. In addition, this simulation tool can be seamlessly adopted to build and test neural networks with biological details for machine learning and AI applications. Critically, we formulate the parallel computation of the Hines method as a mathematical scheduling problem and derive a Dendritic Hierarchical Scheduling (DHS) method based on combinatorial optimization and parallel computing theory. We demonstrate that our algorithm provides optimal scheduling without any loss of precision. Furthermore, we have optimized DHS for the currently most advanced GPU chips by leveraging the GPU memory hierarchy and memory access mechanisms. Together, DHS speeds up computation by 2-3 orders of magnitude (Supplementary Table 1) compared with the classic NEURON simulator while maintaining identical accuracy.

To enable detailed dendritic simulations for use in AI, we establish the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized compute engine for NEURON) as the simulation engine, together with two auxiliary modules (an I/O module and a learning module) that support dendritic learning algorithms during simulations.

Last but not least, we present several applications of DeepDendrite that target critical challenges in neuroscience and AI: (1) We demonstrate how spatial patterns of dendritic spine inputs affect neuronal activity in neurons with spines throughout the dendritic tree (full-spine models). DeepDendrite enables us to explore neuronal computation in a simulated human pyramidal neuron model with ~25,000 dendritic spines.
(2) In the Discussion we also consider the potential of DeepDendrite in the context of AI, specifically in creating ANNs with morphologically detailed human pyramidal neurons.

All source code for DeepDendrite, the full-spine models, and the detailed dendritic network model is publicly available online (see Code Availability). Our open-source learning framework can be readily integrated with other dendritic learning rules, such as learning rules for nonlinear (fully active) dendrites, burst-dependent synaptic plasticity, and learning with spike prediction. Overall, our study provides a complete set of tools that has the potential to change the current ecosystem of the computational neuroscience community. By leveraging the power of GPU computing, we expect these tools to facilitate system-level explorations of the computational principles of the brain's fine structures and to promote the interaction between neuroscience and modern AI.

Results

Dendritic Hierarchical Scheduling (DHS) method

Computing ionic currents and solving linear equations are the two critical phases when simulating biophysically detailed neurons; both are time-consuming and impose severe computational burdens. Fortunately, computing the ionic currents of each compartment is a fully independent process, so it can be naturally parallelized on devices with massive parallel-computing units such as GPUs. As a consequence, solving the linear equations becomes the remaining bottleneck for parallelization (Fig. 1a-f).

To tackle this bottleneck, cellular-level parallel methods have been developed, which accelerate single-cell computation by "splitting" a single cell into several compartments that can be computed in parallel.
However, such methods rely heavily on prior knowledge to generate practical strategies for splitting a single neuron into compartments (Fig. 1g; Supplementary Fig. 1). As a result, they become less efficient for neurons with asymmetric morphologies, e.g., pyramidal neurons and Purkinje neurons.

We aim to develop a more efficient and precise parallel method for simulating biologically detailed neural networks. First, we establish the criteria for the accuracy of a cellular-level parallel method. Based on parallel computing theory, we propose three conditions that ensure a parallel method yields solutions identical to those of the serial Hines method, according to the data dependency in the Hines method (see Methods).

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem (see Methods). With k parallel threads, we can compute at most k nodes (compartments) at each step, but we must ensure that a node is computed only after all of its child nodes have been processed. Our goal is to find a strategy with the minimum number of steps for the entire procedure.

To generate an optimal partition, we propose a method called Dendritic Hierarchical Scheduling (DHS) (the theoretical proof is presented in the Methods). The DHS method comprises two steps, analyzing the dendritic topology and finding the best partition (Fig. 2a): (1) Given a detailed model, we first obtain its corresponding dependency tree and calculate the depth of each node on the tree (the depth of a node is the number of its ancestor nodes) (Fig. 2b, c). (2) After the topology analysis, we search the candidates and pick at most k deepest candidate nodes (a node is a candidate only if all its child nodes have been processed). This is repeated until all nodes are processed (Fig. 2d).

Fig. 2. a DHS workflow.
DHS processes the k deepest candidate nodes in each iteration. b Illustration of calculating the node depth of a compartment model. The model is first converted to a tree structure, then the depth of each node is computed. Colors indicate different depth values. c Topology analysis on different neuron models. Six neurons with distinct morphologies are shown. For each model, the soma is selected as the root of the tree, so the node depth increases from the soma (0) to the distal dendrites. d Illustration of performing DHS on the model in b with four threads. Candidates: nodes that can be processed. Selected candidates: nodes that are picked by DHS, i.e., the k deepest candidates. Processed nodes: nodes that have been processed before. e Parallelization strategy obtained by DHS after the process in d. Each node is assigned to one of the four parallel threads. DHS reduces the number of node-processing steps from 14 (serial) to 5 by distributing nodes across multiple threads. f Relative cost, i.e., the ratio of the computational cost of DHS to that of the serial Hines method, when applying DHS with different numbers of threads to different types of models.

Take a simplified model with 15 compartments as an example. With the serial Hines method, it takes 14 steps to process all nodes, whereas DHS with four parallel units partitions the nodes into five subsets (Fig. 2d). Because nodes in the same subset can be processed in parallel, it takes only five steps to process all nodes with DHS (Fig. 2e).

Next, we apply the DHS method to six representative detailed neuron models (selected from ModelDB) with different numbers of threads (Fig. 2f
), including cortical and hippocampal pyramidal neurons, cerebellar Purkinje neurons, striatal projection neurons (SPNs), and olfactory bulb mitral cells, covering the major principal neurons in sensory, cortical, and subcortical areas. We then measured the computational cost. The relative computational cost is defined here as the ratio of the computational cost of DHS to that of the serial Hines method. The computational cost, i.e., the number of steps taken to solve the equations, drops dramatically as the number of threads increases. For example, with 16 threads, the computational cost of DHS is 7%-10% of that of the serial Hines method. Intriguingly, DHS reaches the lower bound of the computational cost for the presented neurons when given 16 or even 8 parallel threads (Fig. 2f), suggesting that adding more threads does not improve performance further because of the dependencies between compartments.

Together, we have developed a DHS method that enables automated analysis of the dendritic topology and optimal partitioning for parallel computing. It is worth noting that DHS finds the optimal partition before the simulation starts, so no extra computation is needed while solving the equations.

Speeding up DHS by GPU memory boosting

DHS computes each neuron with multiple threads, which consumes a vast number of threads when running neural network simulations. Graphics Processing Units (GPUs) consist of massive numbers of processing units (i.e., streaming processors, SPs; Fig. 3a, b) for parallel computing. In theory, the many SPs on a GPU should support efficient simulation of large-scale neural networks (Fig. 3c).
However, we consistently observed that the efficiency of DHS decreased significantly as the network size grew, which might result from scattered data storage or from extra memory accesses caused by loading and writing intermediate results (Fig. 3d, left).

Fig. 3. a GPU architecture and its memory hierarchy. Each GPU contains massive numbers of processing units (streaming processors). b Architecture of streaming multiprocessors (SMs). Each SM contains multiple streaming processors, registers, and an L1 cache. c Applying DHS to two neurons, each with four threads. During simulation, each thread executes on one streaming processor. d Memory optimization strategy on the GPU. Top, thread assignment and data storage of DHS before (left) and after (right) memory boosting. Bottom, an example of a single step in triangularization when simulating the two neurons in d. Processors send a data request to load the data for each thread from global memory. Without memory boosting (left), it takes seven transactions to load all requested data, plus some extra transactions for intermediate results. With memory boosting (right), it takes only two transactions to load all requested data, and registers are used for intermediate results, which further improves memory throughput. e Run time of DHS (32 threads per cell) with and without memory boosting on multiple layer-5 pyramidal models with spines. f Speedup from memory boosting on multiple layer-5 pyramidal models with spines.

We solve this problem with GPU memory boosting, a method that increases memory throughput by leveraging the GPU's memory hierarchy and access mechanism.
Given the memory loading mechanism of the GPU, successive threads that load aligned, contiguously stored data achieve high memory throughput, whereas accessing scattered data reduces throughput. To achieve high throughput, we first align the computing order of the nodes and rearrange the threads according to the number of nodes assigned to them. We then permute the data storage in global memory to match the computing order, i.e., nodes that are processed in the same step are stored contiguously in global memory. Moreover, we use GPU registers to store intermediate results, which further improves memory throughput. The example shows that, with memory boosting, only two memory transactions are needed to load eight requested data items (Fig. 3d). Furthermore, experiments on varying numbers of pyramidal neurons with spines and on the typical neuron models (Fig. 3e, f; Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8 times speedup over naive DHS.

To comprehensively test the performance of DHS with GPU memory boosting, we select six typical neuron models and evaluate the run time of solving the cable equations for massive numbers of each model (Fig. 4). We examined DHS with four threads (DHS-4) and sixteen threads (DHS-16) per neuron. Compared with the GPU method in CoreNEURON, DHS-4 and DHS-16 are about 5 and 15 times faster, respectively (Fig. 4a). Moreover, compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS speeds up the simulation by 2-3 orders of magnitude (Supplementary Fig. 3), while retaining identical numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites (Supplementary Fig. 7), and different segmentation strategies (Supplementary Fig. 7).
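The storage permutation described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: a memory transaction is modeled as fetching one aligned segment of 4 consecutive elements (an illustrative figure, not the transaction size of any real GPU), and the function names `relayout` and `count_transactions` as well as the node indices are hypothetical. It does not reproduce the seven-versus-two example in Fig. 3d.

```python
def relayout(schedule):
    """Map each node to a new storage index so that nodes processed in the
    same step end up next to each other in memory."""
    new_index = {}
    for step in schedule:
        for node in step:
            new_index[node] = len(new_index)
    return new_index


def count_transactions(addresses, segment=4):
    """Toy cost model: one transaction per distinct aligned segment touched."""
    return len({addr // segment for addr in addresses})


# Hypothetical schedule: each inner list holds the nodes handled in parallel in
# one step, identified by their original (scattered) storage index.
schedule = [
    [3, 7, 12, 18, 21, 25, 30, 34],   # step 0: eight threads, eight nodes
    [2, 6, 11, 17],                   # step 1: four threads still busy
]

new_index = relayout(schedule)

before = [count_transactions(step) for step in schedule]
after = [count_transactions([new_index[n] for n in step]) for step in schedule]

print("transactions per step, original layout:", before)
print("transactions per step, permuted layout:", after)
```

In this toy model, the eight loads of the first step touch eight different segments in the original layout but only two after the permutation, because the permuted indices are consecutive. Keeping intermediate results in registers is a separate optimization that this sketch does not model.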
Fig. 4. a Run time of solving the equations for a 1 s simulation on GPU (dt = 0.025 ms, 40,000 iterations in total). CoreNEURON: the parallel method used in CoreNEURON; DHS-4: DHS with four threads per neuron; DHS-16: DHS with 16 threads per neuron. b, c Visualization of the partition produced by DHS-4 and DHS-16; each color indicates a single thread.

DHS creates cell-type-specific optimal partitioning

To gain insight into the working mechanism of the DHS method, we visualized the partitioning by mapping compartments to threads (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread frequently switches among different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell (Fig. 4b, c). By contrast, it generates fragmented partitions in morphologically asymmetric neurons such as the pyramidal neurons and the Purkinje cell (Fig. 4b, c), indicating that DHS splits the neural tree at the scale of individual compartments (i.e., tree nodes) rather than at the scale of branches. This cell-type-specific, fine-grained partition enables DHS to fully exploit all available threads.

In summary, DHS and memory boosting yield a theoretically proven optimal solution for solving linear equations in parallel with unprecedented efficiency. Using this principle, we built the open-access DeepDendrite platform, which neuroscientists can use to implement models without any specific knowledge of GPU programming. Below, we demonstrate how DeepDendrite can be used in neuroscience tasks.
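The scheduling rule behind DHS (in each step, pick at most k of the deepest candidate nodes, where a candidate is a node whose children have all been processed) can be sketched as follows. This is an illustrative Python sketch, not the DeepDendrite implementation. The function name `dhs_schedule` and the 15-node tree are hypothetical (the tree is not the one shown in Fig. 2), and the root is left out of the schedule on the assumption that this matches the text's count of 14 serial steps for 15 compartments.

```python
import heapq


def dhs_schedule(parent, k):
    """Greedy schedule: each step processes up to k of the deepest candidates.

    parent[i] is the parent of node i, with parent[i] < i and node 0 the root.
    Returns a list of steps; each step is a list of node ids that could be
    processed in parallel. The root itself is not scheduled (an assumption).
    """
    n = len(parent)
    depth = [0] * n      # depth = number of ancestors
    pending = [0] * n    # number of children not yet processed
    for i in range(1, n):
        depth[i] = depth[parent[i]] + 1
        pending[parent[i]] += 1

    # Candidates are kept in a heap ordered by decreasing depth.
    heap = [(-depth[i], i) for i in range(1, n) if pending[i] == 0]
    heapq.heapify(heap)

    schedule = []
    while heap:
        # Pick the (at most) k deepest candidates available at this step.
        step = [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]
        schedule.append(step)
        # Parents become candidates only once this step has finished.
        for node in step:
            p = parent[node]
            if p > 0:
                pending[p] -= 1
                if pending[p] == 0:
                    heapq.heappush(heap, (-depth[p], p))
    return schedule


# Hypothetical 15-compartment tree (index = node, value = parent).
parent = [-1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 9, 11]

for k in (1, 2, 4, 16):
    steps = dhs_schedule(parent, k)
    print(f"k={k:2d}: {len(steps)} steps -> {steps}")
```

On this toy tree the schedule needs 14 steps with one thread, 7 with two, and 4 with either four or sixteen threads. The plateau reflects the same effect as in Fig. 2f: once the step count is set by the depth of the tree, extra threads cannot help.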
DHS enables spine-level modeling

Because dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, and other cell types, their morphology and plasticity are crucial for regulating neuronal excitability. However, spines are too small (~1 μm in length) for their voltage-dependent processes to be measured directly in experiments.

We can model a single spine with two compartments: the spine head, where the synapses are located, and the spine neck, which links the spine head to the dendrite. Theory predicts that the very thin spine neck (0.1-0.5 μm in diameter) electrically isolates the spine head from its parent dendrite, thereby compartmentalizing the signals generated at the spine head. However, a detailed model with spines fully distributed over the dendrites ("full-spine model") is computationally very expensive. A common compromise is to modify the membrane capacitance and resistance by a spine factor F instead of modeling all spines explicitly. Here, the spine factor F approximates the effect of the spines on the biophysical properties of the cell membrane.

Inspired by the previous work of Eyal et al., we investigated how different spatial patterns of excitatory inputs formed on dendritic spines shape neuronal activity in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Notably, Eyal et al. used the spine factor F to incorporate spines into the dendrites, while only a few activated spines were explicitly attached to the dendrites ("few-spine model" in Fig. 5a).
The value of the spine factor F in their model was computed from the dendritic area and spine area in the reconstructed data. Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model more consistent with Eyal's few-spine model. With the spine density set to 1.3 μm⁻¹, the pyramidal neuron model contained about 25,000 spines without altering the model's original morphological and biophysical properties. We then repeated the previous experimental protocols with both the full-spine and few-spine models. We used the same synaptic input as in Eyal's work but added extra background noise to each sample. By comparing the somatic traces (Fig. 5b, c) and the spike probability (Fig. 5d) in the full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by the activation of clustered spines appeared to be more nonlinear in the full-spine model (solid blue line in Fig. 5d) than in the few-spine model (dashed blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on dendritic excitability and nonlinearity.

Fig. 5. a Experiment setup. We examine two major types of models: few-spine models and full-spine models. Few-spine models (two on the left) incorporate the spine area globally into the dendrites and attach only individual spines together with activated synapses. In full-spine models (two on the right), all spines are explicitly attached over the whole dendritic tree. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine models and the full-spine models, respectively. b Somatic voltages recorded for the cases in a. Colors of the voltage curves correspond to a; scale bar: 20 ms, 20 mV. c Color-coded voltages during the simulation in b at specific times. Colors indicate the magnitude of the voltage.
d Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases in a. Background noise is added. e Run time of the experiments in d with different simulation methods. NEURON: conventional NEURON simulator running on a single CPU core. CoreNEURON: CoreNEURON simulator on a single GPU. DeepDendrite: DeepDendrite on a single GPU.

On the DeepDendrite platform, both the full-spine and few-spine models achieved an 8-fold speedup compared with CoreNEURON on the GPU platform and a 100-fold speedup compared with serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1), while producing identical simulation results (Supplementary Figs. 4 and 8). The DHS method therefore enables explorations of dendritic excitability under more realistic anatomical conditions.

Discussion

In this work, we propose the DHS method to parallelize the computation of the Hines method, and we demonstrate mathematically that DHS provides an optimal solution without any loss of precision. We then implement DHS on the GPU hardware platform and use GPU memory boosting techniques to refine it (Fig. 3). When simulating a large number of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup (Supplementary Table 1) compared with the GPU method used in CoreNEURON and up to a 1,500-fold speedup compared with the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3 and Supplementary Table 1). Furthermore, we develop the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, to demonstrate the capacity of DeepDendrite, we present a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Later in this section, we elaborate on how we have expanded the DeepDendrite framework to enable efficient training of biophysically detailed neural networks.
To explore the hypothesis that dendrites improve robustness against adversarial attacks, we train our network on typical image classification tasks. We show that DeepDendrite can support both neuroscience simulations and AI-related detailed neural network tasks with unprecedented speed, thereby significantly advancing detailed neuroscience simulations and potentially future AI explorations.

Decades of effort have been invested in speeding up the Hines method with parallel methods. Early work focused mainly on network-level parallelization. In network simulations, each cell independently solves its corresponding linear equations with the Hines method. Network-level parallel methods distribute a network across multiple threads and parallelize the computation of each cell group with one thread. With network-level methods, detailed networks can be simulated on clusters or supercomputers. In recent years, GPUs have been used for detailed network simulation. Because a GPU contains massive numbers of computing units, one thread is usually assigned to one cell rather than to a cell group. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, the computation inside each cell is still serial in network-level methods, so they cannot cope when the "Hines matrix" of each cell grows large.

Cellular-level parallel methods further parallelize the computation inside each cell. Their main idea is to split each cell into several sub-blocks and parallelize the computation of those sub-blocks. However, typical cellular-level methods (e.g., the "multi-split" method) pay less attention to the parallelization strategy, and the lack of a fine-grained strategy results in unsatisfactory performance.
To achieve higher efficiency, some studies obtain finer-grained parallelization by introducing extra computation operations or by making approximations on some crucial compartments while solving the linear equations. These finer-grained strategies achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the problem of "how to parallelize" as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy. Moreover, DHS does not introduce any extra operations or value approximations, so it achieves the lowest computational cost while retaining the numerical accuracy of the original Hines method.

Dendritic spines are the most abundant microstructures in the brain for projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. As spines receive most of the excitatory inputs in the central nervous system, electrical signals generated by spines are the main driving force for large-scale neuronal activity in the forebrain and cerebellum. The structure of the spine, with an enlarged spine head and a very thin spine neck, leads to surprisingly high input impedance at the spine head, which could be up to 500 MΩ according to estimates that combine experimental data with the detailed compartment modeling approach. Because of this high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level, thereby boosting NMDA currents and ion channel currents in the spine. However, in classic detailed compartment models, all spines are replaced by the F coefficient, which modifies the dendritic cable geometries. This approach may compensate for the leak currents and capacitive currents of the spines.
Still, it cannot reproduce the high input impedance at the spine head, which may weaken excitatory synaptic inputs, particularly NMDA currents, thereby reducing the nonlinearity in the neuron’s input-output curve. Our modeling results are in line with this interpretation.

On the other hand, the spine’s electrical compartmentalization is always accompanied by biochemical compartmentalization [8, 52, 67], resulting in a drastic increase of internal [Ca²⁺] within the spine and a cascade of molecular processes involving synaptic plasticity of importance for learning and memory. Intriguingly, the biochemical process triggered by learning, in turn, remodels the spine’s morphology, enlarging (or shrinking) the spine head, or elongating (or shortening) the spine neck, which significantly alters the spine’s electrical capacity [67, 68, 69, 70]. Such experience-dependent changes in spine morphology, also referred to as “structural plasticity”, have been widely observed in vivo in the visual cortex [71, 72], somatosensory cortex [73, 74], motor cortex [75], hippocampus [9], and the basal ganglia [76]. They play a critical role in motor and spatial learning as well as memory formation. However, due to the computational costs, nearly all detailed network models use the “F-factor” approach to replace actual spines, and are thus unable to explore spine functions at the system level. By taking advantage of our framework and the GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU, while remaining ~100 times faster than the traditional serial method on a single CPU (Fig. 5e). This enables us to explore structural plasticity in large-scale circuit models across diverse brain regions.

Another critical issue is how to link dendrites to brain functions at the systems/network level.
It has been well established that dendrites can perform comprehensive computations on synaptic inputs due to enriched ion channels and local biophysical membrane properties [5, 6, 7]. For example, cortical pyramidal neurons can carry out sublinear synaptic integration at the proximal dendrite but progressively shift to supralinear integration at the distal dendrite [77]. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials [6, 78]. Such dendritic events are widely observed in mice [6] or even human cortical neurons [79] in vitro, and may offer various logical operations [6, 79] or gating functions [80, 81]. Recently, in vivo recordings in awake or behaving mice have provided strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex [82], sensory-motor integration in the whisker system [83, 84], and spatial navigation in the hippocampal CA1 region [85].

To establish the causal link between dendrites and animal (including human) patterns of behavior, large-scale biophysically detailed neural circuit models are a powerful computational tool. However, running a large-scale detailed circuit model of 10,000-100,000 neurons generally requires the computing power of supercomputers. It is even more challenging to optimize such models for in vivo data, as this requires iterative simulations of the models. The DeepDendrite framework can directly support many state-of-the-art large-scale circuit models [86, 87, 88], which were initially developed based on NEURON. Moreover, using our framework, a single GPU card such as the Tesla A100 could easily support detailed circuit models of up to 10,000 neurons, thereby providing a carbon-efficient and affordable option for ordinary labs to develop and optimize their own large-scale detailed models.
Recent works on unraveling the dendritic roles in task-specific learning have achieved remarkable results in two directions, i.e., solving challenging tasks such as the image classification dataset ImageNet with simplified dendritic networks [20], and exploring full learning potentials on more realistic neurons [21, 22]. However, there is a trade-off between model size and biological detail, as an increase in network scale is often achieved at the expense of neuron-level complexity [19, 20, 89]. Moreover, more detailed neuron models are less mathematically tractable and more computationally expensive [21].

There has also been progress on the role of active dendrites in ANNs for computer vision tasks. Iyer et al. [90] proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording [91] used a binary tree to approximate dendrite branching and provided valuable insights into the influence of tree structure on single neurons’ computational capacity. Bird et al. [92] proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they primarily rely on abstractions derived from spatially extended neurons and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of leveraging more realistic neuron models for understanding the shared mechanisms underlying brain computation and deep learning.

In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to significantly reduce computational costs and incorporates an I/O module and a learning module to handle large datasets.
With DeepDendrite, we successfully implemented a three-layer hybrid neural network, the Human Pyramidal Cell Network (HPC-Net) (Fig. 6a, b). This network demonstrated efficient training capabilities in image classification tasks, achieving approximately 25 times speedup compared to training on a traditional CPU-based platform (Fig. 6f; Supplementary Table 1).

Fig. 6.
a, The illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed to spike trains and fed into the network model. Learning is triggered by error signals propagated from soma to dendrites.
b, Training with mini-batch. Multiple networks are simulated simultaneously with different images as inputs. The total weight updates ΔW are computed as the average of ΔWi from each network.
c, Comparison of the HPC-Net before and after training. Left, the visualization of hidden neuron responses to a specific input before (top) and after (bottom) training. Right, distribution of hidden layer weights (from input to hidden layer) before (top) and after (bottom) training.
d, Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained with clean images.
e, Prediction accuracy of each model on adversarial samples after training 30 epochs on MNIST (left) and Fashion-MNIST (right) datasets.
f, Run time of training and testing for the HPC-Net. The batch size is set to 16. Left, run time of training one epoch. Right, run time of testing. Parallel NEURON + Python: training and testing on a single CPU with multiple cores, using 40-process-parallel NEURON to simulate the HPC-Net and extra Python code to support mini-batch training. DeepDendrite: training and testing the HPC-Net on a single GPU with DeepDendrite.
Additionally, it is widely recognized that the performance of Artificial Neural Networks (ANNs) can be undermined by adversarial attacks [93], i.e., deliberately designed perturbations intended to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks [56]. Our experimental results with HPC-Net lend support to this hypothesis: networks endowed with detailed dendritic structures demonstrated some increased resilience to transfer adversarial attacks [94] compared to standard ANNs, as evident on the MNIST [95] and Fashion-MNIST [96] datasets (Fig. 6d, e). This evidence implies that the inherent biophysical properties of dendrites could be pivotal in augmenting the robustness of ANNs against adversarial interference. Nonetheless, further studies are essential to validate these findings on more challenging datasets such as ImageNet [97].

In conclusion, DeepDendrite has shown remarkable potential in image classification tasks, opening up exciting future directions. To further advance DeepDendrite and the application of biologically detailed dendritic models in AI tasks, we may focus on developing multi-GPU systems and exploring applications in other domains, such as Natural Language Processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language. Challenges include testing scalability in larger-scale problems, understanding performance across various tasks and domains, and addressing the computational complexity introduced by novel biological principles, such as active dendrites.
By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI.

Methods

Simulation with DHS

The CoreNEURON [35] simulator (https://github.com/BlueBrain/CoreNeuron) uses the NEURON [25] architecture and is optimized for both memory usage and computational speed. We implement our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on GPU with CoreNEURON can also be simulated with DHS by executing the following command:

    coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu

The usage options are as in Table 1.

Accuracy of the simulation using cellular-level parallel computation

To ensure the accuracy of the simulation, we first need to define the correctness of a cellular-level parallel algorithm, so that we can judge whether it generates solutions identical to those of proven serial methods, such as the Hines method used in the NEURON simulation platform. Based on the theory of parallel computing [34], a parallel algorithm yields a result identical to that of its corresponding serial algorithm if and only if the data processing order in the parallel algorithm is consistent with the data dependency in the serial method. The Hines method has two symmetrical phases: triangularization and back-substitution. By analyzing the serial Hines method [55], we find that its data dependency can be formulated as a tree structure, where the nodes of the tree represent the compartments of the detailed neuron model. In the triangularization process, the value of each node depends on its child nodes. In contrast, during the back-substitution process, the value of each node depends on its parent node (Fig. 1d).
Thus, we can compute nodes on different branches in parallel, as their values do not depend on each other.

Based on the data dependency of the serial Hines method, we propose three conditions to make sure that a parallel method yields solutions identical to those of the serial Hines method: (1) the tree morphology and initial values of all nodes are identical to those in the serial Hines method; (2) in the triangularization phase, a node can be processed if and only if all its child nodes are already processed; (3) in the back-substitution phase, a node can be processed only if its parent node is already processed. Once a parallel computing method satisfies these three conditions, it produces solutions identical to those of the serial method.

Computational cost of cellular-level parallel computing method

To theoretically evaluate the run time, i.e., efficiency, of the serial and parallel computing methods, we introduce and formulate the concept of computational cost as follows: given a tree $T$ and $k$ threads (basic computational units) to perform triangularization, parallel triangularization amounts to dividing the node set $V$ of $T$ into $n$ subsets, i.e., $V = \{V_1, V_2, \dots, V_n\}$, where the size of each subset $|V_i| \le k$, i.e., at most $k$ nodes can be processed in each step since there are only $k$ threads. The triangularization phase follows the order $V_1 \to V_2 \to \dots \to V_n$, and nodes in the same subset $V_i$ can be processed in parallel. So, we define $|V|$ (the size of set $V$, i.e., $n$ here) as the computational cost of the parallel computing method. In short, we define the computational cost of a parallel method as the number of steps it takes in the triangularization phase. Because back-substitution is symmetrical with triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase.
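To make the two phases and their data dependency concrete, the following is a minimal serial sketch of a Hines-style solve on a tree-structured matrix. It is an illustration under stated assumptions, not the NEURON or CoreNEURON implementation: the array names (`parent`, `d`, `a`, `b`, `rhs`), the matrix layout in the docstring, and the node numbering (every parent has a smaller index than its children) are all hypothetical choices made for this example.

```python
def hines_solve(parent, d, a, b, rhs):
    """Solve M x = rhs for a tree-structured (Hines) matrix.

    Assumed layout (hypothetical, for illustration):
      M[i][i] = d[i]
      M[p][i] = a[i] and M[i][p] = b[i], where p = parent[i]
    Nodes are numbered so that parent[i] < i, and parent[0] == -1 (root).
    """
    n = len(d)
    d = list(d)      # work on copies so the inputs stay untouched
    rhs = list(rhs)

    # Triangularization: visit children before parents (leaves to root).
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = a[i] / d[i]
        d[p] -= factor * b[i]
        rhs[p] -= factor * rhs[i]

    # Back-substitution: visit parents before children (root to leaves).
    x = [0.0] * n
    x[0] = rhs[0] / d[0]
    for i in range(1, n):
        x[i] = (rhs[i] - b[i] * x[parent[i]]) / d[i]
    return x


def residual(parent, d, a, b, rhs, x):
    """Largest |M x - rhs| entry, computed directly from the sparse layout."""
    n = len(d)
    r = [d[i] * x[i] - rhs[i] for i in range(n)]
    for i in range(1, n):
        p = parent[i]
        r[i] += b[i] * x[p]   # row i, column p
        r[p] += a[i] * x[i]   # row p, column i
    return max(abs(v) for v in r)


# A five-node toy tree: node 0 is the root, 1 and 2 are its children,
# and 3 and 4 are children of node 1.
parent = [-1, 0, 0, 1, 1]
d = [4.0] * 5
a = [0.0, -1.0, -1.0, -1.0, -1.0]
b = [0.0, -1.0, -1.0, -1.0, -1.0]
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
x = hines_solve(parent, d, a, b, rhs)
print(residual(parent, d, a, b, rhs, x))
```

In this serial form each loop handles one node per step, so each phase takes as many steps as there are nodes. In the toy tree, nodes 3 and 4 do not depend on node 2, which is the kind of independence between branches that a parallel schedule can exploit.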
Mathematical scheduling problem

Based on the simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem:

Given a tree $T = \{V, E\}$ and a positive integer $k$, where $V$ is the node set and $E$ is the edge set, define a partition $P(V) = \{V_1, V_2, \dots, V_n\}$, $|V_i| \le k$, $1 \le i \le n$, where $|V_i|$ indicates the cardinality of subset $V_i$, i.e., the number of nodes in $V_i$, and for each node $v \in V_i$, all its child nodes $\{c \mid c \in \mathrm{children}(v)\}$ must be in a previous subset $V_j$, where $1 \le j < i$. Our goal is to find an optimal partition $P^*(V)$ whose computational cost $|P^*(V)|$ is minimal.

Here subset $V_i$ consists of all nodes that will be computed at the $i$-th step (Fig. 2e), so $|V_i| \le k$ indicates that we can compute at most $k$ nodes in each step because the number of available threads is $k$. The restriction “for each node $v \in V_i$, all its child nodes $\{c \mid c \in \mathrm{children}(v)\}$ must be in a previous subset $V_j$, where $1 \le j < i$” indicates that node $v$ can be processed only if all its child nodes are processed.

Implementation of DHS

We aim to find an optimal way to parallelize the computation of solving linear equations for each neuron model by solving the mathematical scheduling problem above. To get the optimal partition, DHS first analyzes the topology and calculates the depth $d(v)$ for all nodes $v \in V$. Then, the following two steps are executed iteratively until every node $v \in V$ is assigned to a subset: (1) find all candidate nodes and put these nodes into the candidate set $Q$. A node is a candidate only if all its child nodes have been processed or it does not have any child nodes. (2) If $|Q| \le k$, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from $Q$ and put them into $V_i$; otherwise, remove the $k$ deepest nodes from $Q$ and add them to subset $V_i$. Label these nodes as processed nodes (Fig. 2d).
After filling in subset $V_i$, go to step (1) to fill in the next subset $V_{i+1}$.

Correctness proof for DHS

After applying DHS to a neural tree $T = \{V, E\}$, we get a partition $P(V) = \{V_1, V_2, \dots, V_n\}$, $|V_i| \le k$, $1 \le i \le n$. Nodes in the same subset $V_i$ are computed in parallel, taking $n$ steps each to perform triangularization and back-substitution. We then demonstrate that the reordering of the computation in DHS yields a result identical to that of the serial Hines method.

The partition $P(V)$ obtained from DHS decides the computation order of all nodes in a neural tree. Below we demonstrate that the computation order determined by $P(V)$ satisfies the correctness conditions. $P(V)$ is obtained from the given neural tree $T$. Operations in DHS do not modify the tree topology or the values of tree nodes (the corresponding values in the linear equations), so the tree morphology and initial values of all nodes are not changed, which satisfies condition 1: the tree morphology and initial values of all nodes are identical to those in the serial Hines method. In triangularization, nodes are processed from subset $V_1$ to $V_n$. As shown in the implementation of DHS, all nodes in subset $V_i$ are selected from the candidate set $Q$, and a node can be put into $Q$ only if all its child nodes have been processed. Thus the child nodes of all nodes in $V_i$ are in $\{V_1, V_2, \dots, V_{i-1}\}$, meaning that a node is only computed after all its children have been processed, which satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes are already processed. In back-substitution, the computation order is the opposite of that in triangularization, i.e., from $V_n$ to $V_1$. As shown before, the child nodes of all nodes in $V_i$ are in $\{V_1, V_2, \dots, V_{i-1}\}$, so the parent nodes of nodes in $V_i$ are in $\{V_{i+1}, V_{i+2}, \dots, V_n\}$, which satisfies condition 3: in back-substitution, a node can be processed only if its parent node is already processed.
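The scheduling loop and the ordering conditions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DeepDendrite or CoreNEURON code: the function names `dhs_partition` and `satisfies_conditions`, the `parent` array encoding of the tree, and the use of a heap for the candidate set are choices made for this example only.

```python
import heapq


def dhs_partition(parent, k):
    """Greedy deepest-first schedule: returns a list of subsets V_1, ..., V_n.

    parent[v] is the parent of node v, or -1 for the root (hypothetical encoding).
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)

    # Depth d(v) of every node, measured from the root.
    depth = [0] * n
    stack = [v for v in range(n) if parent[v] < 0]
    while stack:
        v = stack.pop()
        for c in children[v]:
            depth[c] = depth[v] + 1
            stack.append(c)

    # Candidate set Q: nodes whose children have all been processed.
    unprocessed_children = [len(children[v]) for v in range(n)]
    q = [(-depth[v], v) for v in range(n) if unprocessed_children[v] == 0]
    heapq.heapify(q)  # negative depth turns the min-heap into deepest-first

    partition = []
    while q:
        # Take at most k of the deepest candidates for this step.
        subset = [heapq.heappop(q)[1] for _ in range(min(k, len(q)))]
        partition.append(subset)
        # Only after the subset is filled can parents become candidates.
        for v in subset:
            p = parent[v]
            if p >= 0:
                unprocessed_children[p] -= 1
                if unprocessed_children[p] == 0:
                    heapq.heappush(q, (-depth[p], p))
    return partition


def satisfies_conditions(parent, k, partition):
    """Check |V_i| <= k and that every child sits in an earlier subset than its parent."""
    step = {v: i for i, subset in enumerate(partition) for v in subset}
    if len(step) != len(parent) or any(len(s) > k for s in partition):
        return False
    return all(p < 0 or step[v] < step[p] for v, p in enumerate(parent))


# Seven-node toy tree: 1 and 2 are children of root 0, 3 and 4 of node 1,
# 5 of node 2, and 6 of node 5.
parent = [-1, 0, 0, 1, 1, 2, 5]
schedule = dhs_partition(parent, k=2)
print(len(schedule), satisfies_conditions(parent, 2, schedule))
```

For this toy tree with two threads the sketch needs four steps, which matches the lower bound of ⌈7/2⌉ = 4 steps for seven nodes. Running the subsets in reverse order gives the back-substitution order, so the same check covers the parent-before-child requirement of that phase.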
Optimality proof for DHS

The idea of the proof is that if there is another optimal solution, it can be transformed into our DHS solution without increasing the number of steps the algorithm requires, thus indicating that the DHS solution is optimal.

For each subset $V_i$ in $P(V)$, DHS moves the $k$ (thread number) deepest nodes from the corresponding candidate set $Q_i$ to $V_i$. If the number of nodes in $Q_i$ is smaller than $k$, all nodes are moved from $Q_i$ to $V_i$. To simplify, we introduce $D_i$, indicating the depth sum of the $k$ deepest nodes in $Q_i$. All subsets in $P(V)$ satisfy the max-depth criteria (Supplementary Fig. 6a): $\sum_{v \in V_i} d(v) = D_i$. We then prove that selecting the deepest nodes in each iteration makes $P(V)$ an optimal partition. If there exists an optimal partition $P^*(V) = \{V^*_1, V^*_2, \dots, V^*_s\}$ containing subsets that do not satisfy the max-depth criteria, we can modify the subsets in $P^*(V)$ so that all subsets consist of the deepest nodes from $Q$ and the number of subsets ($|P^*(V)|$) remains the same after modification.

Without loss of generality, we start from the first subset $V^*_i$ not satisfying the criteria, i.e., $\sum_{v \in V^*_i} d(v) < D_i$. There are two possible cases that make $V^*_i$ not satisfy the max-depth criteria: (1) $|V^*_i| < k$ and there exist some valid nodes in $Q_i$ that are not put into $V^*_i$; (2) $|V^*_i| = k$ but the nodes in $V^*_i$ are not the $k$ deepest nodes in $Q_i$.

For case (1), because some candidate nodes are not put into $V^*_i$, these nodes must be in the subsequent subsets. As $|V^*_i| < k$, we can move the corresponding nodes from the subsequent subsets to $V^*_i$, which does not increase the number of subsets and makes $V^*_i$ satisfy the criteria (Supplementary Fig. 6b, top). For case (2), $|V^*_i| = k$, the deeper nodes that are not moved from the candidate set $Q_i$ into $V^*_i$ must be added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from subsequent subsets to $V^*_i$ through the following method.
Assume that after filling $V^*_i$, $v$ is picked and one of the $k$ deepest nodes $v'$ is still in $Q_i$; thus $v'$ will be put into a subsequent subset $V^*_j$ ($j > i$). We first move $v$ from $V^*_i$ to $V^*_{i+1}$, then modify subset $V^*_{i+1}$ as follows: if $|V^*_{i+1}| \le k$ and none of the nodes in $V^*_{i+1}$ is the parent of node $v$, stop modifying the later subsets. Otherwise, modify $V^*_{i+1}$ as follows (Supplementary Fig. 6c): if the parent node of $v$ is in $V^*_{i+1}$, move this parent node to $V^*_{i+2}$; else move the node with minimum depth from $V^*_{i+1}$ to $V^*_{i+2}$. After adjusting $V^*_i$, modify the subsequent subsets $V^*_{i+1}, V^*_{i+2}, \dots, V^*_{j-1}$ with the same strategy. Finally, move $v'$ from $V^*_j$ to $V^*_i$.

With the modification strategy described above, we can replace all shallower nodes in $V^*_i$ with the $k$-th deepest node in $Q_i$ and keep the number of subsets, i.e., $|P^*(V)|$, the same after modification. We can modify the nodes with the same strategy for all subsets in $P^*(V)$ that do not contain the deepest nodes. Finally, all subsets $V^*_i \in P^*(V)$ satisfy the max-depth criteria, and $|P^*(V)|$ does not change after the modification.

In conclusion, DHS generates a partition $P(V)$, and all subsets $V_i \in P(V)$ satisfy the max-depth condition: $\sum_{v \in V_i} d(v) = D_i$, where $D_i$ is the depth sum of the $k$ deepest nodes in the candidate set $Q_i$. For any other optimal partition $P^*(V)$ we can modify its subsets to make its structure the same as $P(V)$, i.e., each subset consists of the deepest nodes in the candidate set, and keep $|P^*(V)|$ the same after modification. So, the partition $P(V)$ obtained from DHS is one of the optimal partitions.

GPU implementation and memory boosting

To achieve high memory throughput, the GPU uses a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have low capacity but high throughput. We aim to boost memory throughput by leveraging this memory hierarchy.
The GPU employs the SIMT (Single-Instruction, Multiple-Thread) architecture. Warps are the basic scheduling units on the GPU (a warp is a group of 32 parallel threads). A warp executes the same instruction with different data for different threads [46]. Correct ordering of the nodes is essential for this batching of computation in warps, to make sure that DHS obtains results identical to those of the serial Hines method. When implementing DHS on GPU, we first group all cells into multiple warps based on their morphologies. Cells with similar morphologies are grouped in the same warp. We then apply DHS to all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads for the same neuron are in the same warp. Therefore, the intrinsic synchronization within warps keeps the computation order consistent with the data dependency of the serial Hines method. Finally, the threads in each warp are aligned and rearranged.

When a warp loads pre-aligned and successively stored data from global memory, it can make full use of the cache, which leads to high memory throughput, whereas accessing scattered data reduces memory throughput. After compartment assignment and thread rearrangement, we permute the data in global memory to make it consistent with the computing order, so that warps can load successively stored data when executing the program. Moreover, we put the necessary temporary variables into registers rather than global memory. Registers have the highest memory throughput, so the use of registers further accelerates DHS.

Full-spine and few-spine biophysical models

We used the published human pyramidal neuron model [51]. The membrane capacitance $c_m$ = 0.44 μF cm⁻², membrane resistance $r_m$ = 48,300 Ω cm², and axial resistivity $r_a$ = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables while somas were active. The leak reversal potential $E_l$ = -83.1 mV.
Ion channels such as Na⁺ and K⁺ were inserted on the soma and the axon initial segment, and their reversal potentials were $E_{Na}$ = 67.6 mV and $E_K$ = -102 mV, respectively. All these specific parameters were set the same as in the model of Eyal, et al. [51]; for more details please refer to the published model (ModelDB, access No. 238347).

In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables 60 μm away from the soma were multiplied by a factor $F_{spine}$ to approximate dendritic spines. In this model, $F_{spine}$ was set to 1.9. Only the spines that receive synaptic inputs were explicitly attached to dendrites.

In the full-spine model, all spines were explicitly attached to dendrites. We calculated the spine density with the reconstructed neuron in Eyal, et al. [51]. The spine density was set to 1.3 μm⁻¹, and each cell contained 24,994 spines on dendrites 60 μm away from the soma.

The morphologies and biophysical mechanisms of spines were the same in the few-spine and full-spine models. The length of the spine neck $L_{neck}$ = 1.35 μm and the diameter $D_{neck}$ = 0.25 μm, whereas the length and diameter of the spine head were 0.944 μm, i.e., the spine head area was set to 2.8 μm². Both spine neck and spine head were modeled as passive cables, with the reversal potential $E_l$ = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those for dendrites.

Synaptic inputs

We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the terminal of the spine head. For distributed inputs, all activated synapses were randomly distributed on all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses that were uniformly distributed on a single randomly selected compartment. All synapses were activated simultaneously during the simulation.
AMPA-based and NMDA-based synaptic currents were simulated as in the work of Eyal et al. The AMPA conductance was modeled as a double-exponential function and the NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, $\tau_{rise}$ and $\tau_{decay}$ were set to 0.3 and 1.8 ms. For the NMDA model, $\tau_{rise}$ and $\tau_{decay}$ were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS.

Background noise

We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at $t_{start}$ = 10 ms and lasted until the end of the simulation. We generated 400 noise spike trains for each cell and attached them to randomly selected synapses. The model and specific parameters of synaptic currents were the same as described in Synaptic inputs, except that the maximum conductance of NMDA was uniformly distributed from 1.57 to 3.275, resulting in a higher AMPA to NMDA ratio.

Exploring neuronal excitability

We investigated the spike probability when multiple synapses were activated simultaneously. For distributed inputs, we tested 14 cases, from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters, respectively. Each cluster consisted of 20 synapses. For each case in both distributed and clustered inputs, we calculated the spike probability with 50 random samples. Spike probability was defined as the ratio of the number of neurons that fired to the total number of samples. All 1150 samples, i.e., (14 + 9) cases × 50 samples, were simulated simultaneously on our DeepDendrite platform, reducing the simulation time from days to minutes.
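As an illustration of the double-exponential conductance used for the AMPA and NMDA synapses in this section, the sketch below evaluates such a time course with the rise and decay constants quoted in the text. It rests on assumptions that the text does not state: the curve is scaled so that its peak equals the maximum conductance (a common convention, for example in NEURON's Exp2Syn mechanism), and the voltage dependence of the NMDA conductance is left out. The function names are hypothetical.

```python
import math


def peak_time(tau_rise, tau_decay):
    """Time (ms) at which exp(-t/tau_decay) - exp(-t/tau_rise) is largest."""
    return (tau_rise * tau_decay) / (tau_decay - tau_rise) * math.log(tau_decay / tau_rise)


def double_exp_conductance(t, tau_rise, tau_decay, g_max):
    """Conductance (same unit as g_max) at time t (ms) after activation at t = 0.

    Assumption: the curve is scaled so that its peak equals g_max.
    """
    if t < 0:
        return 0.0
    tp = peak_time(tau_rise, tau_decay)
    norm = 1.0 / (math.exp(-tp / tau_decay) - math.exp(-tp / tau_rise))
    return g_max * norm * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))


# Kinetic parameters quoted in the text (ms) and maximum conductances (nS).
AMPA = dict(tau_rise=0.3, tau_decay=1.8, g_max=0.73)
NMDA = dict(tau_rise=8.019, tau_decay=34.9884, g_max=1.31)

for name, params in (("AMPA", AMPA), ("NMDA", NMDA)):
    tp = peak_time(params["tau_rise"], params["tau_decay"])
    print(name, round(tp, 3), double_exp_conductance(tp, **params))
```

With this scaling the conductance starts at zero, rises to the maximum conductance at the peak time, and then decays with the slower time constant. A different normalization would change only the scale factor, not the shape of the curve.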
Performing AI tasks with the DeepDendrite platform

Conventional detailed neuron simulators lack two functionalities that are important for modern AI tasks: (1) alternating between simulations and weight updates without heavy reinitialization, and (2) processing multiple stimulus samples simultaneously in a batch-like manner. DeepDendrite consists of three modules (Fig. 5): (1) an I/O module; (2) a DHS-based simulating module; (3) a learning module. When training a biophysically detailed model to perform learning tasks, users first define the learning rule, then feed all training samples to the detailed model for learning. In each step during training, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from all training samples and attaches the stimulus to the network model. Then, the DHS-based simulating module initializes the model and starts the simulation. After the simulation, the learning module updates all synaptic weights according to the difference between the model responses and the teacher signals. After training, the learned model can achieve performance comparable to that of an ANN. The testing phase is similar to training, except that all synaptic weights are fixed.

HPC-Net model

Image classification is a typical task in the field of AI. In this task, a model should learn to recognize the content of a given image and output the corresponding label. Here we present the HPC-Net, a network consisting of detailed human pyramidal neuron models that can learn to perform image classification tasks by utilizing the DeepDendrite platform.

HPC-Net has three layers, i.e., an input layer, a hidden layer, and an output layer. The neurons in the input layer receive spike trains converted from images as their input. Hidden layer neurons receive the output of input layer neurons and deliver responses to neurons in the output layer.
The responses of the output layer neurons are taken as the final output of HPC-Net. Neurons between adjacent layers are fully connected.

For each image stimulus, we first convert each normalized pixel to a homogeneous spike train. For the pixel with coordinates $(x, y)$ in the image, the corresponding spike train has a constant interspike interval $\tau_{ISI}(x, y)$ (in ms), which is determined by the pixel value $p(x, y)$ as shown in Eq. (1).

In our experiment, the simulation for each stimulus lasted 50 ms. All spike trains started at 9 + $\tau_{ISI}$ ms and lasted until the end of the simulation. Then we attached all spike trains to the input layer neurons in a one-to-one manner. In the expression for the synaptic current triggered by a spike arriving at time $t_0$, $v$ is the post-synaptic voltage, the reversal potential $E_{syn}$ = 1 mV, the maximum synaptic conductance $g_{max}$ = 0.05 μS, and the time constant $\tau$ = 0.5 ms.

Neurons in the input layer were modeled with a passive single-compartment model. The specific parameters were set as follows: membrane capacitance $c_m$ = 1.0 μF cm⁻², membrane resistance $r_m$ = 10⁴ Ω cm², axial resistivity $r_a$ = 100 Ω cm, and reversal potential of the passive compartment $E_l$ = 0 mV.

The hidden layer contains a group of human pyramidal neuron models, receiving the somatic voltages of input layer neurons. The morphology was from Eyal, et al. [51], and all neurons were modeled with passive cables. The specific membrane capacitance $c_m$ = 1.5 μF cm⁻², membrane resistance $r_m$ = 48,300 Ω cm², axial resistivity $r_a$ = 261.97 Ω cm, and the reversal potential of all passive cables $E_l$ = 0 mV. Input neurons could make multiple connections to randomly selected locations on the dendrites of hidden neurons. The synaptic current activated by the $k$-th synapse of the $i$-th input neuron on neuron $j$’s dendrite is defined as in Eq.
(4), where $g_{ijk}$ is the synaptic conductance, $W_{ijk}$ is the synaptic weight, and the remaining terms are the ReLU-like somatic activation function and the somatic voltage of the $i$-th input neuron at time $t$.

Neurons in the output layer were also modeled with a passive single-compartment model, and each hidden neuron made only one synaptic connection to each output neuron. All specific parameters were set the same as those of the input neurons.

Image classification with HPC-Net

For each input image stimulus, we first normalized all pixel values to 0.0-1.0. Then we converted the normalized pixels to spike trains and attached them to the input neurons. The somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where $P_i$ is the probability of the $i$-th class predicted by the HPC-Net, computed from the average somatic voltage from 20 ms to 50 ms of the $i$-th output neuron, and $C$ indicates the number of classes, which equals the number of output neurons. The class with the maximum predicted probability is the final classification result. In this paper, we built the HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons.

Synaptic plasticity rules for HPC-Net

Inspired by previous work [36], we use a gradient-based learning rule to train our HPC-Net to perform the image classification task. The rule is based on Eq. (7), where $p_i$ is the predicted probability for class $i$ and $y_i$ indicates the actual class the stimulus image belongs to: $y_i$ = 1 if the input image belongs to class $i$, and $y_i$ = 0 if not.

When training HPC-Net, we compute the update for weight $W_{ijk}$ (the synaptic weight of the $k$-th synapse connecting neuron $i$ to neuron $j$) at each time step. After the simulation of each image stimulus, $W_{ijk}$ is updated as shown in Eq.
( ): Wijk k i j Wijk 8 Here is the learning rate, is the update value at time , , are somatic voltages of neuron and rispettivamente È il -th synaptic current activated by neuron on neuron , its synaptic conductance, is the transfer resistance between the -th connected compartment of neuron on neuron ’s dendrite to neuron ’s soma, s = 30 ms, e = 50 ms are start time and end time for learning respectively. For output neurons, the error term can be computed as shown in Eq. ( ). For hidden neurons, the error term is calculated from the error terms in the output layer, given in Eq. ( ). t vj vi i j di Iijk k i j Giaccio rijk k i j j t t 10 11 Since all output neurons are single-compartment, equals to the input resistance of the corresponding compartment, . Transfer and input resistances are computed by NEURON. Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and accelerating convergence. DeepDendrite also supports mini-batch training. When training HPC-Net with mini-batch size batch, we make batch copies of HPC-Net. During training, each copy is fed with a different training sample from the batch. DeepDendrite first computes the weight update for each copy separately. After all copies in the current training batch are done, the average weight update is calculated and weights in all copies are updated by this same amount. N N Robustness against adversarial attack with HPC-Net To demonstrate the robustness of HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation, for fair comparison in our HPC-Net each input neuron only made one synaptic connection to each hidden neuron). We first trained HPC-Net and ANN with the original training set (original clean images). Then we added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. 
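As a rough illustration of how adversarial noise of this kind can be produced, the sketch below applies one step of the fast gradient sign method (FGSM) to a toy softmax-linear classifier in NumPy. The classifier, its random weights, the helper names `softmax`, `cross_entropy` and `fgsm`, and the noise level `eps=0.1` are assumptions made for illustration only; this is neither the Foolbox implementation nor the ResNet on which the noise in this study was generated.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, b, x, label):
    """Cross-entropy loss of a linear-softmax classifier for one sample."""
    p = softmax(W @ x + b)
    return -np.log(p[label])

def fgsm(W, b, x, label, eps):
    """One FGSM step: x_adv = clip(x + eps * sign(dL/dx), 0, 1)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    # Analytic gradient of the cross-entropy loss with respect to the input.
    grad_x = W.T @ (p - onehot)
    # Move every pixel by eps in the direction that increases the loss,
    # then clip back to the valid normalized pixel range.
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Hypothetical toy setup: 784 normalized pixels, 10 classes, random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784)) * 0.1
b = np.zeros(10)
x = rng.uniform(size=784)
label = 3

x_adv = fgsm(W, b, x, label, eps=0.1)
print("clean loss:      ", cross_entropy(W, b, x, label))
print("adversarial loss:", cross_entropy(W, b, x_adv, label))
```

The perturbation is bounded by `eps` in every pixel, which corresponds to the notion of a noise level; a transfer-style test would generate `x_adv` on one model and evaluate it on another.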
We used Foolbox [98, 99] to generate adversarial noise with the FGSM method [93]. The ANN was trained with PyTorch [100], and HPC-Net was trained with our DeepDendrite. For fairness, we generated the adversarial noise on a significantly different network model, a 20-layer ResNet [101]. The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST [95] and Fashion-MNIST [96]. Results show that the prediction accuracy of HPC-Net is 19% and 16.72% higher than that of the analogous ANN, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available within the paper, Supplementary Information and Source Data files provided with this paper. The source code and data used to reproduce the results in Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist. The Fashion-MNIST dataset is publicly available at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite as well as the models and code used to reproduce Figs. 3–6 in this study are available at https://github.com/pkuzyc/DeepDendrite.

References

1. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
2. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
3. Poirazi, P., Brannon, T. & Mel, B. W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron 37, 977–987 (2003).
4. London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
5. Branco, T. & Häusser, M. The single dendritic branch as a fundamental functional unit in the nervous system. Curr. Opin. Neurobiol. 20, 494–502 (2010).
6. Stuart, G. J.
& Spruston, N. Dendritic integration: 60 years of progress. Nat. Neurosci. 18, 1713–1721 (2015).
7. Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
8. Yuste, R. & Denk, W. Dendritic spines as basic functional units of neuronal integration. Nature 375, 682–684 (1995).
9. Engert, F. & Bonhoeffer, T. Dendritic spine changes associated with hippocampal long-term synaptic plasticity. Nature 399, 66–70 (1999).
10. Yuste, R. Dendritic spines and distributed circuits. Neuron 71, 772–781 (2011).
11. Yuste, R. Electrical compartmentalization in dendritic spines. Annu. Rev. Neurosci. 36, 429–449 (2013).
12. Rall, W. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491–527 (1959).
13. Segev, I. & Rall, W. Computational study of an excitable dendritic spine. J. Neurophysiol. 60, 499–523 (1988).
14. Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489 (2016).
15. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362, 1140–1144 (2018).
16. McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989).
17. French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
18. Naud, R. & Sprekeler, H. Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA 115, E6329–E6338 (2018).
19. Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
20. Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits.
21. Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
22. Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent. PLoS Comput. Biol. 17, e1009015 (2021).
23. Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
24. Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
25. Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
26. Bower, J. M. & Beeman, D. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
27. Hines, M. L., Eichner, H. & Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.
28. Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
29. Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphical processing unit. Front. Neuroinform. 7, 4 (2013).
30. Tsuyuki, T., Yamamoto, Y. & Yamazaki, T. Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, A. et al.) 279–285 (Springer International Publishing, 2016).
31. Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines Matrix Solver in Neuron Simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
32. Huber, F.
Efficient tree solver for Hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
33. Korte, B. & Vygen, J. Combinatorial Optimization Theory and Algorithms 6 edn (Springer, 2018).
34. Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
35. Kumbhar, P. et al. CoreNEURON: an optimized compute engine for the NEURON simulator. Front. Neuroinform. 13, 63 (2019).
36. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
37. Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphical processors. J. Neurosci. Methods 206, 183–194 (2012).
38. Mascagni, M. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models. J. Neurosci. Methods 36, 105–114 (1991).
39. McDougal, R. A. et al. Twenty years of modelDB and beyond: building essential modeling tools for the future of neuroscience. J. Comput. Neurosci. 42, 1–10 (2017).
40. Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons.
41. Hemond, P. et al. Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b. Hippocampus 18, 411–424 (2008).
42. Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput. Biol. 7, e1002107 (2011).
43. Masoli, S., Solinas, S. & D’Angelo, E. Action potential processing in a detailed purkinje cell model reveals a critical role for axonal compartmentalization. Front. Cell. Neurosci. 9, 47 (2015).
44. Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales—simulations of direct pathway MSNs investigate the fast onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
45. Migliore, M.
et al. Synaptic clusters function as odor operators in the olfactory bulb. Proc. Natl Acad. Sci. USA 112, 8499–8504 (2015).
46. NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
47. NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
48. Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
49. Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
50. Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
51. Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models. Front. Cell. Neurosci. 12, 181 (2018).
52. Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization.
53. Koch, C. Dendritic spines. In Biophysics of Computation (Oxford University Press, 1999).
54. Rapp, M., Yarom, Y. & Segev, I. The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells.
55. Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
56. Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint at https://arxiv.org/abs/1703.09202 (2017).
57. Goddard, N. H. & Hood, G. Large-scale simulation using parallel GENESIS. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 349–379 (Springer New York, 1998).
58. Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON.
59. Lytton, W. W. et al.
Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON.
60. Valero-Lara, P. et al. cuHinesBatch: Solving multiple Hines systems on GPUs human brain project. In Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
61. Akar, N. A. et al. Arbor—A morphologically-detailed neural network simulation library for contemporary high-performance computing architectures. In Proc. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 274–282 (IEEE, 2019).
62. Ben-Shalom, R. et al. NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. J. Neurosci. Methods 366, 109400 (2022).
63. Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
64. Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform. 5, 15 (2011).
65. Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
66. Palmer, L. M. & Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input. J. Neurosci. 29, 6897–6903 (2009).
67. Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
68. Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity. Annu. Rev. Neurosci. 24, 1071–1089 (2001).
69. Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).
70. Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
71. Keck, T. et al.
Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
72. Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits. Nature 457, 313–317 (2009).
73. Trachtenberg, J. T. et al. Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature 420, 788–794 (2002).
74. Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
75. Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
76. Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex. Neuron 109, 3298–3311 (2021).
77. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
78. Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
79. Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
80. Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons.
81. Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci. USA 114, E7612–E7621 (2017).
82. Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
83. Xu, N.-l et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task.
Nature 492, 247–251 (2012).
84. Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
85. Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature 517, 200–204 (2015).
86. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
87. Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
88. Hjorth, J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
89. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
90. Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments. Front. Neurorobot. 16, 846219 (2022).
91. Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree? Neural Comput. 33, 1554–1571 (2021).
92. Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks.
93. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR) (ICLR, 2015).
94. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).
95. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
96. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
Preprint at http://arxiv.org/abs/1708.07747 (2017).
97. Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
98. Rauber, J., Brendel, W. & Bethge, M. Foolbox: a Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017).
99. Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox Native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX.
100. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (NeurIPS, 2019).
101. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

The authors sincerely thank Dr. Rita Zhang, Daochen Shi and members at NVIDIA for their valuable technical support with GPU computing. This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., National Natural Science Foundation of China (No. 61088102) to T.H., National Key R&D Program of China (No. 2022ZD01163005) to L.M., Key Area R&D Program of Guangdong Province (No. 2018B030338001) to T.H., National Natural Science Foundation of China (No. 61825101) to Y.T., Swedish Research Council (VR-M-2020-01652), Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 No. 945539 (HBP SGA3), and KTH Digital Futures to J.H.K., J.H., and A.K., and Swedish Research Council (VR-M-2021-01995) and EU/Horizon 2020 No. 945539 (HBP SGA3) to S.G. and A.K. Part of the simulations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC KTH, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

This paper is available on Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.