Scientists build a GPU engine that simulates brain cells 1,500 times faster

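Authors: Gan He, Lei Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjian Zhang, Jeanette Hellgren Kotaleski, Yonghong Tian, Sten Grillner, Kai Du, Tiejun Huang

Abstract

Biophysically detailed multi-compartment models are powerful tools for exploring the computational principles of the brain, and they also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their high computational cost severely limits their applications in both neuroscience and AI. We theoretically prove that the Dendritic Hierarchical Scheduling (DHS) implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method on a conventional CPU platform. We build the DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines.

Introduction

Deciphering the coding and computational principles of neurons is essential to neuroscience. Mammalian brains are composed of thousands of different types of neurons with unique morphological and biophysical properties. The point-neuron doctrine [1], in which neurons are regarded as simple summing units, is still widely applied in neural computation, especially in neural network analysis. In recent years, modern artificial intelligence (AI) has exploited this principle to develop powerful tools such as artificial neural networks (ANNs) [2]. However, beyond the comprehensive computations performed at the level of the single neuron, subcellular compartments such as neuronal dendrites can also carry out nonlinear operations as independent computational units [3, 4, 5, 6, 7]. Furthermore, dendritic spines, the small protrusions that densely cover the dendrites of spiny neurons, can compartmentalize synaptic signals, allowing them to be separated from their parent dendrites ex vivo and in vivo [8, 9, 10, 11].

Simulations using biologically detailed neurons provide a theoretical framework for linking biological details to computational principles. The biophysically detailed multi-compartment modeling framework [12, 13] allows us to model neurons with realistic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. Its backbone is classical cable theory [12], which models the biophysical membrane properties of dendrites as passive cables and provides a mathematical description of how electrical signals invade and propagate throughout complex neuronal processes. By combining cable theory with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can reproduce cellular and subcellular neuronal computations beyond the limits of experiments [4, 7].

Beyond its profound impact on neuroscience, biologically detailed neuron modeling has recently been used to bridge the gap between neuronal structural and biophysical details and AI. The prevailing technique in modern AI is ANNs composed of point neurons, an analog of biological neural networks. Although ANNs have achieved remarkable performance in specialized applications [14, 15], the human brain still outperforms them in domains that involve more dynamic and noisy environments [16, 17]. Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that may surpass backpropagation (backprop) in parallel information processing [18, 19, 20]. Furthermore, a single detailed multi-compartment model can learn network-level nonlinear computations of point neurons by adjusting only its synaptic strengths [21, 22]. Expanding the paradigm of brain-like AI from single detailed neuron models to large-scale biologically detailed networks is therefore a priority.

A long-standing challenge for detailed simulation methods is their prohibitive computational cost, which severely limits their application in neuroscience and AI. The main obstacle in simulation is solving the linear equations that arise from the fundamental theory of detailed modeling [12, 23, 24]. To improve efficiency, the classic Hines method reduces the time complexity of solving these equations from O(n³) to O(n), and it has been widely adopted as the core algorithm of popular simulators such as NEURON [25] and GENESIS [26]. However, the method processes compartments serially, one at a time. When a simulation involves multiple biophysically detailed dendrites with dendritic spines, the linear equation matrix (the "Hines matrix") grows with the number of dendrites or spines (Fig. 1e), making the Hines method impractical because it places a very heavy burden on the entire simulation.

Fig. 1: (a) Reconstructed layer-5 pyramidal neuron model and the mathematical formulation used for detailed neuron models. (b) Workflow of numerically simulating detailed neuron models; the equation-solving phase is the bottleneck of the simulation. (c) An example of the linear equations in the simulation. (d) Data dependency of the Hines method when solving the linear equations in (c). (e) The size of the Hines matrix scales with model complexity.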
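The number of linear-equation systems that must be solved increases markedly as models become more detailed. (f) Computational cost (steps taken in the equation-solving phase) of the serial Hines method on different types of neuron models. (g) Illustration of different solving methods. In the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors. (h) Computational cost of the three methods in (g) when solving the equations of a pyramidal model with spines. (i) Run time of different methods for solving the equations of 500 pyramidal models. The run time is the time consumed by 1 s of simulated time (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: parallel method in CoreNEURON (on GPU); Branch based: branch-based parallel method (on GPU); DHS: Dendritic Hierarchical Scheduling method (on GPU).

Over the past decades, great progress has been made in accelerating the Hines method with cellular-level parallel methods, which allow different parts of each cell to be computed in parallel [27, 28, 29, 30, 31, 32]. However, current cellular-level parallel methods often lack an efficient parallelization strategy or lack sufficient numerical accuracy compared with the original Hines method.

Here, we develop a fully automatic, numerically accurate, and optimized simulation tool that can significantly accelerate computation and reduce computational cost. Moreover, this tool can be used seamlessly to build and test biologically detailed neural networks for machine learning and AI applications. We formulate the parallel computation of the Hines method as a combinatorial optimization problem [33] grounded in parallel computing theory [34], which yields the DHS method. Furthermore, we optimize DHS for current state-of-the-art GPU chips by exploiting the GPU memory hierarchy and memory-access mechanism. The resulting speed-ups over the classic simulator NEURON [25] are summarized in Supplementary Table 1, with the same accuracy maintained.

To enable detailed dendritic simulation in AI, we build the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized compute engine for NEURON) [35] as the simulation engine, together with two auxiliary modules (an I/O module and a learning module) that support learning algorithms during simulation. DeepDendrite runs on GPU hardware and supports both conventional simulation tasks in neuroscience and learning tasks in AI.

Finally, we present applications of DeepDendrite that target key challenges in neuroscience and AI. (1) We show how spatial patterns of dendritic spine inputs affect neuronal activity in a neuron with spines across the entire dendritic tree (a full-spine model). DeepDendrite allows us to explore neuronal computation in a simulated human pyramidal neuron model with 25,000 dendritic spines. All DeepDendrite source code, the full-spine models, and the detailed dendritic network models are publicly available online (see Code availability). The learning module can also accommodate other dendritic learning rules, such as burst-dependent synaptic plasticity [20] and learning with spike prediction [36]. Overall, our study provides a complete set of tools that have the potential to change the current ecosystem of the computational neuroscience community.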
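By leveraging the power of GPU computing, we anticipate that these tools will facilitate system-level exploration of the computational principles of the brain's fine structures and promote interaction between neuroscience and modern AI.

Results

Dendritic Hierarchical Scheduling (DHS) method

Computing ionic currents and solving linear equations are the two critical phases in simulating biophysically detailed neurons; both are time-consuming and impose a heavy computational burden. Because ionic currents are computed independently for each compartment and are therefore readily parallelized [37], solving the linear equations becomes the remaining bottleneck of the parallelization process (Fig. 1a, f).

To address this bottleneck, cellular-level parallel methods have been developed that accelerate single-cell computation by "splitting" a single cell into several compartments that can be computed in parallel [27, 28, 38]. However, these methods rely on prior knowledge to generate practical strategies for splitting a neuron into compartments (Fig. 1g; Supplementary Fig. 1). They therefore become less efficient for neurons with asymmetric morphologies, such as pyramidal neurons and Purkinje neurons.

We aim to develop a more efficient and precise parallel method for simulating biologically detailed neural networks. First, we establish criteria for the accuracy of a cellular-level parallel method based on the theory of parallel computing [34]. We propose three conditions that ensure a parallel method yields solutions identical to those of the serial Hines method, according to the data dependency in the Hines method (see Methods). Then, to theoretically evaluate the run time, i.e., the efficiency, of serial and parallel computing methods, we introduce and formulate the concept of computational cost as the number of steps a method takes to solve the equations (see Methods).

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem (see Methods). In simple terms, with k parallel threads we can compute at most k nodes at each step, but a node can be computed only after all of its child nodes have been processed; our goal is to find a strategy that minimizes the number of steps for the entire procedure.

To generate an optimal partition, we propose a method called Dendritic Hierarchical Scheduling (DHS); the theoretical proof is presented in Methods. The DHS method consists of two steps, analyzing the dendritic topology and finding the best partition (Fig. 2a). (1) Given a detailed model, we first obtain its corresponding dependency tree and calculate the depth of each node on the tree, where the depth of a node is the number of its ancestor nodes (Fig. 2b, c). (2) After topology analysis, we search for candidates and pick at most the k deepest candidate nodes, where a node is a candidate only if all its child nodes have been processed (Fig. 2d).

Fig. 2: (a) DHS workflow. DHS processes the k deepest candidate nodes at each iteration. (b) The model is first converted into a tree structure, and the depth of each node is computed; colors indicate different depth values. (c) Topology analysis of different neuron models. Six neurons with distinct morphologies are shown. For each model, the soma is chosen as the tree root, so node depth increases from the soma (0) toward the distal dendrites. (d) An example of executing DHS on the model in (b) with four threads. Candidates: nodes that can be processed; selected candidates: nodes picked by DHS, i.e., the k deepest candidates; processed nodes: nodes that have already been processed. (e) The parallelization strategy obtained by DHS after the process in (d). Each node is assigned to one of four parallel threads; by distributing nodes across multiple threads, DHS reduces the number of serial node-processing steps from 14 to 5. (f) Relative cost, i.e., the ratio of the computational cost of DHS to that of the serial Hines method, when DHS is applied with different numbers of threads to different types of models.

Take a simplified model with 15 compartments as an example. The serial Hines method needs 14 steps to process all nodes, whereas DHS with four parallel units partitions the nodes into five subsets (Fig. 2d): {{9,10,12,14}, {1,7,11,13}, {2,3,4,8}, {6}, {5}}.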
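The greedy loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DeepDendrite/CoreNEURON implementation: the tree is given as a hypothetical `parent` list in Hines order (every parent index is smaller than its children's, node 0 is the soma), only non-root nodes are scheduled (the root has no parent to update during triangularization, so an n-compartment tree needs n - 1 serial steps, as in the 14-step example), and depth ties are broken by node index. The toy tree is invented for illustration and is not the 15-compartment model of Fig. 2.

```python
from typing import List


def dhs_partition(parent: List[int], k: int) -> List[List[int]]:
    """Greedy DHS-style schedule: each step takes up to k deepest candidates.

    parent[i] is the parent of node i; parent[0] == -1 marks the soma (root).
    Assumes Hines ordering: parent[i] < i for every i > 0.
    """
    n = len(parent)
    depth = [0] * n      # depth = number of ancestor nodes
    pending = [0] * n    # children of each node that are not yet processed
    for i in range(1, n):
        depth[i] = depth[parent[i]] + 1
        pending[parent[i]] += 1

    # Leaves are the first candidates; the root (node 0) never needs a step.
    candidates = {i for i in range(1, n) if pending[i] == 0}
    partition: List[List[int]] = []
    while candidates:
        # Pick at most k deepest candidates (ties broken by node index).
        chosen = sorted(candidates, key=lambda v: (-depth[v], v))[:k]
        partition.append(sorted(chosen))
        candidates.difference_update(chosen)
        # A parent becomes a candidate only after all its children are done.
        for v in chosen:
            p = parent[v]
            pending[p] -= 1
            if pending[p] == 0 and p != 0:
                candidates.add(p)
    return partition


def is_valid_schedule(parent: List[int], partition: List[List[int]], k: int) -> bool:
    """Check subset sizes and condition (2): children finish before parents."""
    step = {}
    for s, subset in enumerate(partition):
        if len(subset) > k:
            return False
        for v in subset:
            step[v] = s
    if sorted(step) != list(range(1, len(parent))):
        return False
    return all(parent[v] == 0 or step[parent[v]] > step[v] for v in step)


# Hypothetical 7-compartment tree: soma 0 with branches 1-2-3, 4-5 and 6.
parent = [-1, 0, 1, 2, 0, 4, 0]
schedule = dhs_partition(parent, k=2)
print(schedule)  # [[3, 5], [2, 4], [1, 6]]
print(len(schedule), "steps vs", len(parent) - 1, "serial steps")
```

With two threads the toy tree needs three triangularization steps instead of six; under the cost definition given in Methods, the full equation-solving cost would be twice `len(schedule)`.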
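Because nodes in the same subset can be processed in parallel, DHS needs only five steps to process all nodes (Fig. 2e).

Next, we applied the DHS method to six representative detailed neuron models (selected from ModelDB [39]) with different numbers of threads (Fig. 2f). These include cortical and hippocampal pyramidal neurons [40, 41, 42], cerebellar Purkinje neurons [43], striatal projection neurons (SPN) [44], and olfactory bulb mitral cells [45], covering the major principal neurons in sensory, cortical, and subcortical areas. We then measured the computational cost. Here, the relative computational cost is defined as the ratio of the computational cost of DHS to that of the serial Hines method. The computational cost, i.e., the number of steps taken to solve the equations, drops dramatically as the number of threads increases. For example, with 16 threads the computational cost of DHS is 7%-10% of that of the serial Hines method. Interestingly, DHS reaches the lower bound of the computational cost for the presented neurons when given 16 or even 8 parallel threads (Fig. 2f), suggesting that adding more threads does not improve performance further because of the dependencies between compartments.

Together, these results show that DHS enables automatic analysis of dendritic topology and optimal partitioning for parallel computing. Notably, DHS finds the optimal partition before the simulation starts, so no extra computation is needed when solving the equations.

Speeding up DHS by GPU memory boosting

DHS computes each neuron with multiple threads, which consumes a vast number of threads when running neural network simulations. Graphics processing units (GPUs) consist of massive numbers of processing units (streaming processors, SPs) for parallel computing [46] (Fig. 3a, b). In theory, the many SPs on a GPU should support efficient simulation of large-scale neural networks (Fig. 3c). However, we consistently observed that the efficiency of DHS dropped significantly as network size increased, which may result from scattered data storage or from extra memory accesses caused by loading and writing intermediate results (Fig. 3d, left).

Fig. 3: (a) GPU architecture and its memory hierarchy. Each GPU contains a large number of processing units (stream processors). (b) Architecture of streaming multiprocessors (SMs). Each SM contains multiple stream processors, registers, and an L1 cache. (c) Applying DHS to two neurons, each with four threads. During simulation, each thread executes on one stream processor. (d) Memory optimization strategy on the GPU. Top: thread assignment and data storage of DHS before (left) and after (right) memory boosting. Bottom: processors send requests to load the data for each thread from global memory; without memory boosting (left), seven transactions are needed to load all requested data, plus some extra transactions for intermediate results. (e) Run time of DHS (32 threads per cell) with and without memory boosting on multiple layer-5 pyramidal models with spines. (f) Speedup from memory boosting on multiple layer-5 pyramidal models with spines; memory boosting yields a 1.6-2× speedup.

We address this problem with GPU memory boosting, a method that increases memory throughput by leveraging the GPU memory hierarchy and access mechanism.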
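A toy model helps show why storage order matters for memory boosting. The sketch below is a simplification under stated assumptions: global memory is treated as a flat array read in fixed-size aligned segments (a stand-in for memory transactions; `segment_size` is a hypothetical parameter, not a real GPU figure), the thread at position t of a step's subset reads that node's record, and the number of segments touched per step is compared between storage by compartment index and step-major storage, where nodes processed at the same step are stored contiguously. Caches, warps, and intermediate results are not modeled.

```python
from typing import Callable, Dict, List


def step_major_layout(partition: List[List[int]]) -> Dict[int, int]:
    """Map each node to a storage slot so that nodes of one step are contiguous."""
    slot: Dict[int, int] = {}
    for subset in partition:      # steps in computation order
        for node in subset:       # position in the subset = thread index
            slot[node] = len(slot)
    return slot


def segments_touched(partition: List[List[int]],
                     slot_of: Callable[[int], int],
                     segment_size: int) -> int:
    """Total number of aligned segments read across all steps (toy transaction count)."""
    total = 0
    for subset in partition:
        total += len({slot_of(v) // segment_size for v in subset})
    return total


# Hypothetical schedule for a 6-compartment tree with 2 threads per step.
partition = [[3, 5], [2, 4], [1, 6]]
naive = segments_touched(partition, lambda v: v, segment_size=2)
layout = step_major_layout(partition)
boosted = segments_touched(partition, layout.__getitem__, segment_size=2)
print(naive, boosted)  # 6 3
```

In this toy setting, storing records in step-major order halves the number of segments read, because the two threads of each step always hit one aligned segment instead of two.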
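Because of the way GPUs load memory, successive threads that load aligned, successively stored data achieve high memory throughput, whereas accessing scattered data lowers it [46, 47]. To achieve high throughput, we first align the computing orders of nodes and rearrange threads according to the number of nodes on them. We then permute data storage in global memory to match the computing order, i.e., nodes processed at the same step are stored successively in global memory (Fig. 3d). Experiments on multiple numbers of pyramidal neurons with spines and on typical neuron models (Fig. 3e, f; Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8× speedup over naive DHS.

To comprehensively test the performance of DHS, we selected six typical neuron models and evaluated the run time of solving the cable equations for large numbers of each model (Fig. 4). We ran DHS with four threads (DHS-4) and sixteen threads (DHS-16) per neuron. Compared with the GPU method in CoreNEURON, DHS-4 and DHS-16 achieve speedups of about 5× and 15×, respectively (Fig. 4a). Moreover, compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS accelerates the simulation by 2-3 orders of magnitude (Supplementary Fig. 3), while maintaining the same numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites, and different segmentation strategies (Supplementary Fig. 7).

Fig. 4: (a) CoreNEURON: parallel method used in CoreNEURON; DHS-4: DHS with four threads per neuron; DHS-16: DHS with 16 threads per neuron. (b, c) Visualization of the partitions produced by DHS-4 and DHS-16; each color indicates a single thread. During computation, each thread switches among different branches.

DHS creates cell-type-specific optimal partitions

To gain insight into how DHS works, we visualized the partitioning process by mapping compartments to each thread (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread frequently switches among different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell (Fig. 4b, c). By contrast, it generates fragmented partitions in morphologically asymmetric neurons such as pyramidal neurons and Purkinje cells (Fig. 4b, c), indicating that DHS splits the neural tree at the scale of individual compartments (i.e., tree nodes) rather than at the scale of branches.

In summary, DHS and memory boosting produce a theoretically proven optimal solution for solving linear equations in parallel with unprecedented efficiency. Using this principle, we built the open-access DeepDendrite platform, which neuroscientists can use to implement models without any specific knowledge of GPU programming.

DHS enables spine-level modeling

Because dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, and other cell types, their morphologies and plasticity are crucial for regulating neuronal excitability [10, 48, 49, 50, 51]. However, spines are too small (~1 μm in length) for voltage-dependent processes to be measured directly in experiments. Theoretical work is therefore critical for a full understanding of spine computations.

We can model a single spine with two compartments: the spine head, where the synapses are located, and the spine neck, which connects the spine head to the dendrite [52]. Theory predicts that a very thin spine neck (0.05 μm in diameter) electrically isolates the spine head from its parent dendrite, thereby compartmentalizing the signals generated in the spine head [53]. However, a detailed model with fully distributed spines on the dendrites (a "full-spine model") is computationally very expensive. A common compromise is to scale the membrane capacitance and resistance by a spine factor F_spine [54] instead of modeling all spines explicitly. The F_spine factor aims to approximate the effect of spines on the biophysical properties of the cell membrane [54].

Inspired by the previous work of Eyal et al. [51], we investigated how different spatial patterns of excitatory inputs onto dendritic spines shape neuronal activity in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Notably, Eyal et al. used the F_spine factor to incorporate spines into the dendrites, and only a few activated spines were explicitly attached to the dendrites (the "few-spine model" in Fig. 5a). The value of F_spine in their model was computed from the dendritic area and spine area in the reconstructed data.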
Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model consistent with Eyal's few-spine model. With the spine density set to 1.3 μm⁻¹, the pyramidal neuron model contained about 25,000 spines, without altering the model's original morphological and biophysical properties. We then repeated the previous experimental protocols with both the full-spine and few-spine models, using the same synaptic input as in Eyal's work but adding background noise to each sample. Comparing the somatic traces (Fig. 5b, c) and spike probability (Fig. 5d) of the full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by activating clustered spines was more nonlinear in the full-spine model (the solid blue line in Fig. 5d) than in the few-spine model (the blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on dendritic excitability and nonlinearity.

Fig. 5: (a) Experimental setup. We study two major types of models: few-spine models and full-spine models. The few-spine models (the two on the left) incorporate the spine area into the dendrites and explicitly attach only the individual spines that carry activated synapses. In the full-spine models (the two on the right), all spines are explicitly attached over the whole dendrite. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine and full-spine models. (b) Somatic voltages recorded for the cases in (a). The colors of the voltage curves correspond to (a). Scale bar: 20 ms, 20 mV. (c) Color-coded voltages at specific moments during the simulation in (b); colors indicate voltage magnitude. (d) Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases in (a), with background noise attached. (e) Run time of the experiments in (d) with different simulation methods. NEURON: the conventional NEURON simulator running on a single CPU core. CoreNEURON: the CoreNEURON simulator running on a single GPU. DeepDendrite: DeepDendrite running on a single GPU.

On the DeepDendrite platform, both the full-spine and few-spine models achieved an 8× speedup over CoreNEURON on the GPU platform and a 100× speedup over serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1), while producing identical simulation results (Supplementary Figs. 4 and 8). Therefore, the DHS method enables exploration of dendritic excitability under more realistic anatomical conditions.
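The spine numbers used here can be checked with simple geometry. The sketch below uses the spine dimensions given in Methods (neck 1.35 μm long and 0.25 μm in diameter; head 0.944 μm long and 0.944 μm in diameter) and treats each part as an open cylinder with lateral area π·d·L, which reproduces the stated 2.8 μm² head area. It also estimates an area-based factor, F = (dendritic area + spine area) / dendritic area, for a dendrite of hypothetical diameter; this simplified formula is an illustrative assumption and not how Eyal et al. derived F_spine = 1.9 from reconstructed data.

```python
import math


def cylinder_lateral_area(diameter_um: float, length_um: float) -> float:
    """Lateral membrane area of an open cylinder, pi * d * L (um^2)."""
    return math.pi * diameter_um * length_um


# Spine geometry from Methods (um).
head_area = cylinder_lateral_area(0.944, 0.944)  # about 2.80 um^2
neck_area = cylinder_lateral_area(0.25, 1.35)    # about 1.06 um^2
spine_area = head_area + neck_area

SPINE_DENSITY = 1.3  # spines per um of dendrite
N_SPINES = 24994     # spines on dendrites more than 60 um from the soma

# Spiny dendritic length implied by the spine count and density.
implied_length_um = N_SPINES / SPINE_DENSITY


def area_f_factor(dend_diameter_um: float,
                  density_per_um: float = SPINE_DENSITY,
                  area_per_spine: float = spine_area) -> float:
    """Simplified F = (dendrite + spine area) / dendrite area, per unit length.

    The dendrite diameter is a free, hypothetical input; real models derive
    F from reconstructed surface areas instead.
    """
    dend_area_per_um = math.pi * dend_diameter_um
    return 1.0 + density_per_um * area_per_spine / dend_area_per_um


print(f"head {head_area:.2f} um^2, whole spine {spine_area:.2f} um^2")
print(f"implied spiny dendrite length {implied_length_um:.0f} um")
print(f"F for a hypothetical 1 um dendrite: {area_f_factor(1.0):.2f}")
```

The factor shrinks as the assumed dendrite gets thicker, which is one reason a single global F_spine can only approximate explicitly attached spines.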
Discussion

In this work, we propose the DHS method to parallelize the computation of the Hines method [55]. We then implemented DHS on a GPU hardware platform and improved it with GPU memory boosting (Fig. 3). When simulating large numbers of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup (Supplementary Table 1) over the GPU method used in CoreNEURON and up to a 1,500-fold speedup over the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3 and Supplementary Table 1). Furthermore, we developed the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, as a demonstration of DeepDendrite's capabilities, we presented a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Later in this section, we describe how we extended the DeepDendrite framework to train biophysically detailed neural networks efficiently. Motivated by the hypothesis that dendrites may confer robustness against adversarial attacks [56], we train our network on typical image classification tasks. We show that DeepDendrite supports both neuroscience simulations and AI-related detailed neural network tasks with unprecedented speed, thereby significantly promoting detailed neuroscience simulations and potentially future AI explorations.

Decades of effort have been invested in accelerating the Hines method with parallel methods. Early work focused mainly on network-level parallelization. In network simulations, each cell independently solves its own linear equations with the Hines method [57, 58]. With network-level methods, detailed networks can be simulated on clusters or supercomputers [59]. In recent years, GPUs have been used for detailed network simulation. Because a GPU contains massive numbers of computing units, one thread is usually assigned to one cell rather than to a group of cells [35, 60, 61]. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, computation inside each cell remains serial in network-level methods, so they still cannot cope when the "Hines matrix" of each cell becomes large.

Cellular-level parallel methods further parallelize the computation inside each cell. Their main idea is to split each cell into several sub-blocks and compute these sub-blocks in parallel [27, 28]. However, typical cellular-level methods (e.g., the "multi-split" method [28]) pay less attention to the parallelization strategy, and the lack of a fine parallelization strategy results in unsatisfactory performance. To achieve higher efficiency, some studies obtain finer-grained parallelization by introducing extra computational operations [29, 38, 62] or by approximating some crucial compartments while solving the linear equations [63, 64].
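These finer-grained parallelization strategies achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the "how to parallelize" question as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy.

Dendritic spines are the most abundant microstructures in the brain for projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. Because spines receive most of the excitatory inputs in the central nervous system, electrical signals generated by spines are the main driving force of large-scale neuronal activity in the forebrain and cerebellum [10, 11]. The structure of the spine, with an enlarged spine head and a very thin spine neck, produces a surprisingly high input impedance at the spine head, up to 500 MΩ according to combined experimental data and detailed compartmental modeling [48, 65]. Because of this high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level [48, 66], thereby boosting NMDA currents and ion channel currents in the spine [11]. However, in classic detailed compartmental models, all spines are replaced by the F coefficient, which modifies the dendritic cable geometry [54]. This approach may compensate for the leak and capacitive currents of spines, but it cannot reproduce the high input impedance at the spine head. This may weaken excitatory synaptic inputs, particularly NMDA currents, and thereby reduce the nonlinearity of the neuron's input-output curve. Our modeling results are consistent with this interpretation.

On the other hand, the spine's electrical compartmentalization is always accompanied by biochemical compartmentalization [8, 52, 67], leading to a sharp increase of intracellular [Ca2+] within the spine and a cascade of molecular processes of synaptic plasticity that are crucial for learning and memory. Intriguingly, the biochemical processes triggered by learning in turn remodel the spine's morphology, enlarging (or shrinking) the spine head or elongating (or shortening) the spine neck, which significantly alters the spine's electrical capacity [67, 68, 69, 70]. Such experience-dependent changes in spine morphology, also known as "structural plasticity", have been widely observed in the visual cortex [71, 72], somatosensory cortex [73, 74], motor cortex [75], hippocampus [9], and basal ganglia [76]. However, because of the computational cost, nearly all detailed network models use the "F-factor" approach in place of actual spines and are therefore unable to explore spine function at the system level. With our framework and a GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU while maintaining a ~100× speedup over the traditional serial method on a single CPU (Fig. 5e). This makes it possible to explore structural plasticity in large-scale circuit models across diverse brain regions.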
Another critical issue is how to link dendrites to brain functions at the systems/network level. It is well established that dendrites can perform comprehensive computations on synaptic inputs owing to enriched ion channels and local biophysical membrane properties [5, 6, 7]. For example, cortical pyramidal neurons can carry out sublinear synaptic integration at proximal dendrites but progressively shift to supralinear integration at distal dendrites [77]. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials [6, 78]. Such dendritic events are widely observed in mouse [6] and even human cortical neurons [79] in vitro, and they may support various logical operations [6, 79] or gating functions [80, 81]. Recently, in vivo recordings in awake or behaving mice have provided strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex [82], sensory-motor integration in the whisker system [83, 84], and spatial navigation in the hippocampal CA1 region [85].

Large-scale biophysically detailed neural circuit models are a powerful computational tool for establishing the causal link between dendrites and the behavior of animals, including humans. However, running a large-scale detailed circuit model with 10,000 or more neurons usually requires the computing power of a supercomputer. Optimizing such models against in vivo data is even more challenging because it requires iterative simulation of the model. Several state-of-the-art large-scale circuit models [86, 87, 88] were initially developed based on NEURON. Moreover, with our framework, a single GPU card such as the Tesla A100 can easily support detailed circuit models of up to 10,000 neurons, providing carbon-efficient and affordable options for ordinary labs to develop and optimize their own large-scale detailed models.

Recent work on unraveling the roles of dendrites in task-specific learning has achieved remarkable results in two directions: solving challenging tasks such as the ImageNet image classification dataset with simplified dendritic networks [20], and exploring the full learning potential of more realistic neurons [21, 22]. However, there is a trade-off between model size and biological detail, as increases in network scale often come at the expense of neuron-level complexity [19, 20, 89]. Moreover, more detailed neuron models are less mathematically tractable and computationally more expensive [21].
There has also been progress on the role of active dendrites in ANNs for computer vision tasks. Iyer et al. [90] proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording [91] used a binary tree to approximate dendritic branching and provided valuable insights into how tree structure influences the computational capacity of single neurons. Bird et al. [92] proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they rely primarily on abstractions derived from spatially extended neurons and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of more realistic neuron models for understanding the mechanisms shared by brain computation and deep learning.

In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to significantly reduce computational costs and incorporates an I/O module and a learning module to handle large datasets. With DeepDendrite, we implemented a three-layer hybrid neural network, the Human Pyramidal Cell Network (HPC-Net) (Fig. 6a, b). This network trains efficiently on image classification tasks, achieving an approximately 25-fold speedup over training on a traditional CPU-based platform (Fig. 6f; Supplementary Table 1).

Fig. 6: (a) Illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed into spike trains and fed into the network model. Learning is triggered by error signals propagated from the soma to the dendrites. (b) Training with mini-batches. Multiple networks are simulated simultaneously with different images as inputs. The total weight update ΔW is computed as the average of the ΔWi from each network.
(c) Comparison of HPC-Net before and after training. Left: visualization of hidden-neuron responses to a specific input before (top) and after (bottom) training. (d) Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained with clean images. (e) Prediction accuracy of each model on adversarial samples after 30 training epochs on the MNIST (left) and Fashion-MNIST (right) datasets. (f) Run time of training and testing HPC-Net with a batch size of 16. Left: run time of training for one epoch. Right: run time of testing. Parallel NEURON + Python: training and testing on a single CPU, using parallel NEURON with 40 processes to simulate HPC-Net and additional Python code to support mini-batch training. DeepDendrite: training and testing HPC-Net on a single GPU with DeepDendrite.

It is also widely recognized that the performance of ANNs can be compromised by adversarial attacks [93], i.e., intentionally engineered perturbations devised to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks [56]. Our experimental results with HPC-Net support this hypothesis: networks with detailed dendritic structures showed increased resistance to transfer adversarial attacks [94] compared with standard ANNs on the MNIST [95] and Fashion-MNIST [96] datasets (Fig. 6d, e). This evidence implies that the inherent biophysical properties of dendrites could be pivotal in improving the robustness of ANNs against adversarial interference. Nonetheless, further studies are needed to validate these findings on more challenging datasets such as ImageNet [97].

In conclusion, DeepDendrite has shown remarkable potential in image classification tasks and opens up many exciting future directions. To further advance DeepDendrite and the application of biologically detailed dendritic models to AI tasks, we may focus on developing multi-GPU systems and on exploring other domains such as natural language processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language.
Challenges include testing scalability on larger problems, understanding performance across tasks and domains, and handling the computational complexity introduced by novel biological principles such as active dendrites. By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI.

Methods

Simulation with DHS

The CoreNEURON simulator [35] (https://github.com/BlueBrain/CoreNeuron) is an optimized compute engine for NEURON [25]. We implemented our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on the GPU with CoreNEURON can also be simulated with DHS by executing the following command:

coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu

The usage options are listed in Table 1.

Accuracy of the simulation using cellular-level parallel computation

To ensure simulation accuracy, we first need to define correctness for a cellular-level parallel algorithm, i.e., whether it produces the same solution as a proven-correct serial method such as the Hines method used in the NEURON simulation platform. According to the theory of parallel computing [34], a parallel algorithm yields the same result as its corresponding serial algorithm if and only if the data-processing order in the parallel algorithm is consistent with the data dependency in the serial method. The Hines method has two symmetric phases: triangularization and back-substitution. By analyzing the serial Hines method [55], we find that its data dependency can be expressed as a tree structure whose nodes represent the compartments of the detailed neuron model. During triangularization, the value of each node depends on its child nodes; during back-substitution, the value of each node depends on its parent node (Fig. 1d). Thus, nodes on different branches can be computed in parallel because their values do not depend on each other.
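The two phases and their tree-shaped data dependency can be made concrete with a short serial solver. This is a minimal Python sketch under an array layout of our own (a hypothetical convention, not NEURON's internal data structures): compartments are numbered in Hines order so that every parent index is smaller than its children's, `d[i]` is the diagonal entry of row i, `a[i]` couples node i to its parent within row i, and `b[i]` couples the parent to node i within the parent's row. Triangularization walks from the highest index down, eliminating each non-root node into its parent (n - 1 steps for n compartments); back-substitution walks upward from the root. Both loops are O(n).

```python
from typing import List


def hines_solve(parent: List[int], d: List[float], a: List[float],
                b: List[float], rhs: List[float]) -> List[float]:
    """Serial Hines solve of A x = rhs for a tree-structured matrix.

    Hypothetical layout (ours, not NEURON's internals):
      parent[0] == -1 (root/soma) and parent[i] < i for i > 0
      d[i]: A[i][i]    a[i]: A[i][parent[i]]    b[i]: A[parent[i]][i]
    """
    n = len(d)
    d, rhs = list(d), list(rhs)  # work on copies
    # Triangularization: eliminate each non-root node into its parent.
    # Children have larger indices, so they are finished before the parent.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = b[i] / d[i]
        d[p] -= factor * a[i]
        rhs[p] -= factor * rhs[i]
    # Back-substitution: the root first, then every node after its parent.
    x = [0.0] * n
    x[0] = rhs[0] / d[0]
    for i in range(1, n):
        x[i] = (rhs[i] - a[i] * x[parent[i]]) / d[i]
    return x


def tree_matvec(parent, d, a, b, x):
    """Compute A x with the same layout, to check the solver."""
    y = [d[i] * x[i] for i in range(len(d))]
    for i in range(1, len(d)):
        y[i] += a[i] * x[parent[i]]
        y[parent[i]] += b[i] * x[i]
    return y


# Toy 5-compartment tree: soma 0 -> 1 -> {2, 3}, and soma 0 -> 4.
parent = [-1, 0, 1, 1, 0]
d = [4.0] * 5
a = [0.0, -1.0, -0.5, -2.0, -1.0]
b = [0.0, -0.5, -1.0, -1.0, -2.0]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
x = hines_solve(parent, d, a, b, tree_matvec(parent, d, a, b, x_true))
print([round(v, 10) for v in x])  # recovers x_true up to rounding
```

The triangularization loop only requires that every child be eliminated before its parent, which is exactly the dependency that the parallel schedule in the following sections must respect.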
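Based on the data dependency of the serial Hines method, we propose three conditions that ensure a parallel method yields the same solution as the serial Hines method: (1) the tree morphology and the initial values of all nodes are identical to those in the serial Hines method; (2) in the triangularization phase, a node can be processed if and only if all its child nodes have already been processed; (3) in the back-substitution phase, a node can be processed only if its parent node has already been processed.

Computational cost of cellular-level parallel computing methods

To theoretically evaluate the run time, i.e., the efficiency, of serial and parallel computing methods, we introduce and formulate the concept of computational cost as follows. Given a tree $T$ and $k$ threads (basic computational units) to perform triangularization, parallel triangularization amounts to dividing the node set $V$ of $T$ into $n$ subsets, $V = \{V_1, V_2, \dots, V_n\}$, where each subset satisfies $|V_i| \le k$, i.e., at most $k$ nodes can be processed at each step because there are only $k$ threads. The triangularization phase proceeds in the order $V_1 \rightarrow V_2 \rightarrow \dots \rightarrow V_n$, and nodes in the same subset $V_i$ can be processed in parallel. We therefore define $|V|$, the number of subsets ($n$ here), as the computational cost of the parallel method. In short, the computational cost of a parallel method is the number of steps it takes in the triangularization phase. Because back-substitution is symmetric to triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase.

Mathematical scheduling problem

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem. Given a tree $T = \{V, E\}$ and a positive integer $k$, where $V$ is the node set and $E$ is the edge set, define a partition $P(V) = \{V_1, V_2, \dots, V_n\}$ with $|V_i| \le k$ for $1 \le i \le n$, where $|V_i|$ is the cardinality of $V_i$, i.e., the number of nodes in $V_i$, such that for each node $v \in V_i$, all its child nodes $\{c \mid c \in \mathrm{children}(v)\}$ lie in a previous subset $V_j$ with $1 \le j < i$. Our goal is to find an optimal partition $P^*(V)$ whose computational cost $|P^*(V)|$ is minimal.

Here the subset $V_i$ consists of all nodes computed at the $i$-th step (Fig. 2e), so $|V_i| \le k$ means that at most $k$ nodes can be computed at each step, because $k$ threads are available. The restriction "for each node $v \in V_i$, all its child nodes $\{c \mid c \in \mathrm{children}(v)\}$ must lie in a previous subset $V_j$ with $1 \le j < i$" means that node $v$ can be processed only after all its child nodes have been processed.

DHS implementation

We aim to find an optimal way to parallelize the solution of the linear equations for each neuron model by solving the scheduling problem above. To obtain the optimal partition, DHS first analyzes the topology and calculates the depth $d(v)$ of every node $v \in V$.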
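Then the following two steps are executed iteratively until every node $v \in V$ has been assigned to a subset: (1) find all candidate nodes and put them into the candidate set $Q$; a node is a candidate only if all its child nodes have been processed or it has no child nodes. (2) If $|Q| \le k$, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from $Q$ and put them into $V_i$; otherwise, remove the $k$ deepest nodes from $Q$ and add them to $V_i$. Label these nodes as processed (Fig. 2d). After filling subset $V_i$, return to step (1) to fill the next subset $V_{i+1}$.

Correctness proof for DHS

Applying DHS to a neural tree $T = \{V, E\}$ yields a partition $P(V) = \{V_1, V_2, \dots, V_n\}$ with $|V_i| \le k$ for $1 \le i \le n$. Nodes in the same subset $V_i$ are computed in parallel, so triangularization and back-substitution each take $n$ steps. We now show that the reordering of computation in DHS yields a result identical to that of the serial Hines method.

The partition $P(V)$ obtained from DHS determines the computation order of all nodes in the neural tree. We show that this order satisfies the accuracy conditions. (1) $P(V)$ is obtained from the given neural tree $T$. The operations in DHS modify neither the tree topology nor the values of the tree nodes (the corresponding values in the linear equations), so the tree morphology and the initial values of all nodes are unchanged. This satisfies condition 1: the tree morphology and initial values of all nodes are identical to those in the serial Hines method. (2) As described in the DHS implementation, all nodes in subset $V_i$ are selected from the candidate set $Q$, and a node enters $Q$ only after all its child nodes have been processed. Hence the child nodes of every node in $V_i$ lie in $\{V_1, V_2, \dots, V_{i-1}\}$, so a node is computed only after all its children have been processed. This satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes have already been processed. (3) Back-substitution processes the subsets in the reverse order, from $V_n$ to $V_1$. As shown above, the child nodes of all nodes in $V_i$ lie in $\{V_1, V_2, \dots, V_{i-1}\}$, so the parent nodes of nodes in $V_i$ lie in $\{V_{i+1}, V_{i+2}, \dots, V_n\}$. This satisfies condition 3: in back-substitution, a node can be processed only if its parent node has already been processed.

Optimality proof for DHS

The idea of the proof is that any other optimal solution can be transformed into the DHS solution without increasing the number of steps the algorithm requires, which shows that the DHS solution is optimal.

For each subset $V_i$ in $P(V)$, DHS moves the $k$ deepest nodes ($k$ being the number of threads) from the corresponding candidate set $Q_i$ into $V_i$; if $Q_i$ contains fewer than $k$ nodes, all of them are moved. For simplicity, let $D_i$ denote the sum of the depths of the $k$ deepest nodes in $Q_i$ (or of all nodes in $Q_i$ when $|Q_i| < k$). All subsets in $P(V)$ satisfy the max-depth criterion (Supplementary Fig. 6a): $\sum_{v \in V_i} d(v) = D_i$. We then prove that selecting the deepest nodes in each iteration makes $P(V)$ an optimal partition. If an optimal partition $P^*(V) = \{V^*_1, V^*_2, \dots\}$ contains subsets that do not satisfy the max-depth criterion, we can modify the subsets of $P^*(V)$ so that all of them consist of the deepest nodes from $Q$, while the number of subsets $|P^*(V)|$ remains the same.

Without loss of generality, we start from the first subset $V^*_i$ that does not satisfy the criterion. Two cases can make $V^*_i$ fail the max-depth criterion: (1) $|V^*_i| < k$, and some valid nodes in $Q_i$ are not put into $V^*_i$; (2) $|V^*_i| = k$, but the nodes in $V^*_i$ are not the $k$ deepest nodes in $Q_i$.

In case (1), because some candidate nodes are not put into $V^*_i$, they must be in subsequent subsets. Since $|V^*_i| < k$, we can move these nodes from the subsequent subsets into $V^*_i$, which does not increase the number of subsets and makes $V^*_i$ satisfy the criterion (Supplementary Fig. 6b, top). In case (2), $|V^*_i| = k$, and the deeper nodes that were not moved from the candidate set $Q_i$ into $V^*_i$ must have been added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from the subsequent subsets into $V^*_i$ as follows. Suppose that after $V^*_i$ is filled, a node $v$ has been selected while one of the $k$ deepest nodes, $v'$, is still in $Q_i$, so $v'$ will be put into a later subset $V^*_j$ ($j > i$). We first move $v$ from $V^*_i$ to $V^*_{i+1}$ and then adjust $V^*_{i+1}$: if $|V^*_{i+1}| \le k$ and no node in $V^*_{i+1}$ is the parent of $v$, we stop modifying the later subsets. Otherwise, we modify $V^*_{i+1}$ as follows (Supplementary Fig. 6c): if the parent of $v$ is in $V^*_{i+1}$, move this parent node to $V^*_{i+2}$; otherwise, move the node with the minimum depth from $V^*_{i+1}$ to $V^*_{i+2}$. After adjusting $V^*_{i+1}$, we modify the subsequent subsets $V^*_{i+2}, V^*_{i+3}, \dots, V^*_{j-1}$ with the same strategy. Finally, we move $v'$ from $V^*_j$ to $V^*_i$.

With this modification strategy, all shallower nodes in $V^*_i$ can be replaced by the $k$ deepest nodes in $Q_i$ while the number of subsets $|P^*(V)|$ stays the same. The same strategy applies to every subset of $P^*(V)$ that does not consist of the deepest nodes. Eventually, all subsets $V^*_i \in P^*(V)$ satisfy the max-depth criterion, and $|P^*(V)|$ is unchanged by the modifications.

In conclusion, DHS generates a partition $P(V)$ in which every subset $V_i \in P(V)$ satisfies the max-depth condition $\sum_{v \in V_i} d(v) = D_i$.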
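For any other optimal partition $P^*(V)$, we can modify its subsets so that its structure matches that of $P(V)$, i.e., each subset consists of the deepest nodes in the candidate set, while keeping $|P^*(V)|$ unchanged. Therefore, the partition $P(V)$ obtained from DHS is one of the optimal partitions.

GPU implementation and memory boosting

To achieve high memory throughput, the GPU uses a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have low capacity but high throughput. GPUs use the SIMT (single-instruction, multiple-thread) architecture, and warps are the basic execution units of a GPU (a warp is a group of 32 parallel threads) [46]. Correctly ordering the nodes is essential for this batched computation within warps, to ensure that DHS obtains results identical to the serial Hines method. When implementing DHS on the GPU, we first group all cells into multiple warps according to their morphologies, so that cells with similar morphologies share a warp. We then apply DHS to all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads of a given neuron reside in the same warp, and the intrinsic synchronization within a warp keeps the computation order consistent with the data dependency of the serial Hines method. Finally, the threads in each warp are aligned and rearranged according to the number of compartments assigned to them.

When a warp loads pre-aligned, successively stored data from global memory, it can make full use of the cache, resulting in high memory throughput, whereas accessing scattered data reduces memory throughput. After compartment assignment and thread rearrangement, we permute the data in global memory to match the computation order, so that warps load successively stored data when executing the program.

Full-spine and few-spine biophysical models

We used the published human pyramidal neuron model [51]. The membrane capacitance c_m = 0.44 μF cm⁻², membrane resistance r_m = 48,300 Ω cm², and axial resistivity r_a = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables, while the soma was active. The leak reversal potential E_l = -83.1 mV. Ion channels such as Na⁺ and K⁺ channels were inserted on the soma and initial axon, with reversal potentials E_Na = 67.6 mV and E_K = -102 mV, respectively. All these parameters were set as in the model of Eyal et al. [51]; for details, please refer to the published model (ModelDB, accession number 238347).

In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables more than 60 μm from the soma were multiplied by a spine factor F_spine to approximate dendritic spines. In this model, F_spine was set to 1.9. Only the spines that receive synaptic inputs were explicitly attached to the dendrites.

In the full-spine model, all spines were explicitly attached to the dendrites. We calculated the spine density from the reconstructed neuron of Eyal et al. [51]. The spine density was set to 1.3 μm⁻¹, and each cell contained 24,994 spines on dendrites more than 60 μm from the soma.

The morphologies and biophysical mechanisms of the spines were the same in the few-spine and full-spine models.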
The spine neck had a length L_neck = 1.35 μm and a diameter D_neck = 0.25 μm, whereas the spine head had a length and diameter of 0.944 μm, giving a spine-head area of 2.8 μm². Both the spine neck and the spine head were modeled as passive cables with reversal potential E_l = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those of the dendrites.

Synaptic inputs

We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the tip of the spine head. For distributed inputs, the activated synapses were randomly distributed over all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses uniformly distributed on a single, randomly selected compartment. All synapses were activated simultaneously during the simulation.

AMPA- and NMDA-mediated synaptic currents were simulated as in the work of Eyal et al. The AMPA conductance was modeled as a double-exponential function and the NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, τ_rise and τ_decay were set to 0.3 and 1.8 ms, respectively. For the NMDA model, τ_rise and τ_decay were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS, respectively.

Background noise

We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at t_start = 10 ms and lasted until the end of the simulation. We generated 400 noise spike trains for each cell and attached them to randomly selected synapses. The noise synapses followed the model described in Synaptic inputs, except that the maximum NMDA conductance was uniformly distributed from 1.57 to 3.275, resulting in a higher AMPA to NMDA ratio.
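The synaptic kinetics and background noise above can be sketched as follows. The time constants, peak conductances, 1 Hz rate, 10 ms start, and 400 trains per cell come from the text. Three things are assumptions, since the exact equations used by Eyal et al. are not given here: normalizing the double exponential so that it peaks at the stated maximum, the magnesium-block expression for the NMDA voltage dependence (the Jahr-Stevens form with 1 mM Mg²⁺, a common choice), and the 1,000 ms end time used for the noise trains.

```python
import math
import random
from typing import List

# Kinetic parameters from the text (times in ms, conductances in nS).
AMPA = {"g_max": 0.73, "tau_rise": 0.3, "tau_decay": 1.8}
NMDA = {"g_max": 1.31, "tau_rise": 8.019, "tau_decay": 34.9884}


def peak_time(tau_rise: float, tau_decay: float) -> float:
    """Time at which exp(-t/tau_decay) - exp(-t/tau_rise) is maximal."""
    return (tau_rise * tau_decay / (tau_decay - tau_rise)) * math.log(tau_decay / tau_rise)


def double_exp(t: float, g_max: float, tau_rise: float, tau_decay: float) -> float:
    """Double-exponential conductance after a spike at t = 0, peak-normalized to g_max."""
    if t < 0:
        return 0.0
    tp = peak_time(tau_rise, tau_decay)
    norm = 1.0 / (math.exp(-tp / tau_decay) - math.exp(-tp / tau_rise))
    return g_max * norm * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))


def mg_block(v_mv: float, mg_mM: float = 1.0) -> float:
    """ASSUMED Jahr-Stevens Mg2+ block; the source does not state the exact form."""
    return 1.0 / (1.0 + (mg_mM / 3.57) * math.exp(-0.062 * v_mv))


def nmda_conductance(t: float, v_mv: float) -> float:
    """Voltage-dependent NMDA conductance (nS) at time t after a presynaptic spike."""
    return double_exp(t, **NMDA) * mg_block(v_mv)


def poisson_train(rate_hz: float, t_start: float, t_end: float,
                  rng: random.Random) -> List[float]:
    """Spike times (ms) of a homogeneous Poisson process between t_start and t_end."""
    times, t = [], t_start
    while True:
        t += rng.expovariate(rate_hz / 1000.0)  # mean ISI = 1000 / rate_hz ms
        if t >= t_end:
            return times
        times.append(t)


T_END_MS = 1000.0  # hypothetical; the simulation length is not stated here
rng = random.Random(0)
noise = [poisson_train(1.0, 10.0, T_END_MS, rng) for _ in range(400)]
print(round(double_exp(peak_time(0.3, 1.8), **AMPA), 6))  # 0.73
```

Normalizing by the peak keeps the stated 0.73 nS and 1.31 nS as true maxima regardless of the rise and decay constants, which makes the AMPA and NMDA settings directly comparable.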
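Exploring neuronal excitability

For distributed inputs, we tested 14 cases, from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters, with each cluster consisting of 20 synapses. For each case of distributed and clustered inputs, we computed the spike probability from 50 random samples. The spike probability was defined as the ratio of the number of samples in which the neuron fired to the total number of samples. On our DeepDendrite platform, all 1,150 samples were simulated simultaneously, reducing the simulation time from days to minutes.

Performing AI tasks with the DeepDendrite platform

Conventional detailed neuron simulators lack two functionalities that are important for modern AI tasks: (1) alternating between simulations and weight updates without heavy reinitialization, and (2) processing multiple stimulus samples simultaneously in a batch-like manner. Here we present the DeepDendrite platform, which supports both biophysical simulation and deep learning tasks with detailed dendritic models.

DeepDendrite consists of three modules (Supplementary Fig. 5): (1) an I/O module; (2) a DHS-based simulation module; and (3) a learning module. To train a biophysically detailed model on a learning task, users first define the learning rule and then feed all training samples to the detailed model. At each training step, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from the training samples and attaches the stimulus to the network model. The DHS-based simulation module then initializes the model and starts the simulation. After the simulation, the learning module updates all synaptic weights according to the difference between the model's responses and the teacher signals. After training, the learned model can achieve performance similar to that of ANNs. The testing phase is similar to training, except that all synaptic weights are kept fixed.

HPC-Net model

Image classification is a typical task in AI: a model must learn to recognize the content of a given image and output the corresponding label. Here we present HPC-Net, a network of detailed human pyramidal neuron models that learns to perform image classification on the DeepDendrite platform.

HPC-Net has three layers: an input layer, a hidden layer, and an output layer. Neurons in the input layer receive spike trains converted from images as their input. Hidden-layer neurons receive the outputs of the input-layer neurons and pass their responses to the neurons in the output layer. The responses of the output-layer neurons are taken as the final output of HPC-Net. Neurons in adjacent layers are fully connected.

For each image stimulus, we first convert each normalized pixel into a homogeneous spike train.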
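A regular spike train of the kind described here can be generated as below. The 50 ms stimulus window and the start time of 9 ms plus one inter-spike interval are from the text. The pixel-to-ISI mapping of Eq. (1) is not reproduced in this section, so `hypothetical_isi` is only a placeholder assumption (brighter pixels fire faster, zero-valued pixels stay silent) that should be replaced with the actual Eq. (1).

```python
from typing import List, Optional

SIM_END_MS = 50.0  # each stimulus is simulated for 50 ms
OFFSET_MS = 9.0    # trains start at 9 ms + one ISI


def hypothetical_isi(pixel: float, isi_min: float = 2.0,
                     isi_max: float = 40.0) -> Optional[float]:
    """PLACEHOLDER for Eq. (1): linear map from a normalized pixel to an ISI (ms)."""
    if pixel <= 0.0:
        return None  # assumed: black pixels produce no spikes
    return isi_max - (isi_max - isi_min) * pixel


def regular_spike_train(isi_ms: Optional[float]) -> List[float]:
    """Spike times OFFSET_MS + n * isi_ms for n = 1, 2, ... that fall before SIM_END_MS."""
    if isi_ms is None or isi_ms <= 0.0:
        return []
    times, n = [], 1
    while OFFSET_MS + n * isi_ms < SIM_END_MS:
        times.append(OFFSET_MS + n * isi_ms)
        n += 1
    return times


def image_to_trains(image: List[List[float]]) -> List[List[float]]:
    """One train per pixel in row-major order (one-to-one with input neurons)."""
    return [regular_spike_train(hypothetical_isi(p)) for row in image for p in row]


print(regular_spike_train(10.0))  # [19.0, 29.0, 39.0, 49.0]
```

For a 28 × 28 MNIST image, `image_to_trains` would return 784 trains, matching the 784 input neurons of HPC-Net.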
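For the pixel with coordinates (x, y) in the image, the corresponding spike train has a constant inter-spike interval τ(x, y) (in ms), which is determined by the pixel value p(x, y) as shown in Eq. (1).

In our experiments, the simulation of each stimulus lasted 50 ms. All spike trains started at 9 + τ(x, y) ms and lasted until the end of the simulation. We then attached the spike trains to the input-layer neurons in a one-to-one manner. The synaptic current triggered by a spike arriving at time t0 is given by Eq. (2), where v is the postsynaptic voltage, the reversal potential E_syn = 1 mV, the maximum synaptic conductance g_max = 0.05 μS, and the time constant τ = 0.5 ms.

Neurons in the input layer were modeled as passive single-compartment models with the following parameters: membrane capacitance c_m = 1.0 μF cm⁻², membrane resistance r_m = 10⁴ Ω cm², axial resistivity r_a = 100 Ω cm, and reversal potential of the passive compartment E_l = 0 mV.

The hidden layer contains a group of human pyramidal neuron models that receive the somatic voltages of the input-layer neurons. These are based on the human pyramidal neuron model of Eyal et al. [51], with all neurons modeled as passive cables: specific membrane capacitance c_m = 1.5 μF cm⁻², membrane resistance r_m = 48,300 Ω cm², axial resistivity r_a = 261.97 Ω cm, and reversal potential of all passive cables E_l = 0 mV. Input neurons could make multiple connections to randomly selected locations on the dendrites of hidden neurons. The synaptic current activated by the k-th synapse of input neuron i on the dendrites of neuron j is determined by its synaptic conductance g_ijk (Eq. 4), where W_ijk is the synaptic weight, f is a ReLU-like somatic activation function, and v_i(t) is the somatic voltage of the i-th input neuron at time t.

Neurons in the output layer were also modeled as passive single-compartment models, and each hidden neuron made only one synaptic connection to each output neuron (Eq. 4).

Image classification with HPC-Net

For each input image stimulus, we first normalized all pixel values to the range 0.0-1.0. We then converted the normalized pixels into spike trains and attached them to the input neurons. The somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where p_i is the probability of the i-th class predicted by HPC-Net, v̄_i is the average somatic voltage of the i-th output neuron from 20 ms to 50 ms, and C is the number of classes. In this paper, we built HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons.

Synaptic plasticity rules for HPC-Net

Inspired by previous work [36], we use a gradient-based learning rule to train HPC-Net on image classification. The loss is given in Eq. (7), where p_i is the predicted probability of class i and y_i indicates the actual class of the stimulus image: y_i = 1 if the input image belongs to class i, and 0 otherwise.

When training HPC-Net, we compute the update for the weight W_ijk (the synaptic weight of the k-th synapse connecting neuron i to neuron j) at each time step.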
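The readout and loss can be sketched as follows. The 20-50 ms averaging window and the one-hot definition of y_i are from the text. Treating Eq. (6) as a softmax over the mean voltages and Eq. (7) as the matching cross-entropy loss is an assumption: it is consistent with the definitions of p_i, v̄_i, C, and y_i, but the equations themselves are not reproduced here. The voltage traces are hypothetical inputs sampled at a fixed step `dt_ms`.

```python
import math
from typing import List


def mean_voltage(trace: List[float], dt_ms: float,
                 t_start: float = 20.0, t_end: float = 50.0) -> float:
    """Average of the samples taken at times i * dt_ms within [t_start, t_end)."""
    window = [v for i, v in enumerate(trace) if t_start <= i * dt_ms < t_end]
    return sum(window) / len(window)


def class_probabilities(mean_voltages: List[float]) -> List[float]:
    """ASSUMED softmax form of Eq. (6); subtracting the max keeps exp() stable."""
    m = max(mean_voltages)
    exps = [math.exp(v - m) for v in mean_voltages]
    total = sum(exps)
    return [e / total for e in exps]


def loss(probs: List[float], label: int) -> float:
    """ASSUMED cross-entropy for Eq. (7) with one-hot y: -sum(y_i * log p_i)."""
    y = [1.0 if i == label else 0.0 for i in range(len(probs))]
    return -sum(yi * math.log(pi) for yi, pi in zip(y, probs) if yi > 0)


# Hypothetical 10-class readout sampled at 1 ms: only output neuron 3 is depolarized.
traces = [[0.0] * 51 for _ in range(10)]
for i in range(20, 50):
    traces[3][i] = 2.0
v_bar = [mean_voltage(tr, 1.0) for tr in traces]
p = class_probabilities(v_bar)
print(max(range(10), key=p.__getitem__), round(loss(p, 3), 4))
```

Only the window average enters the readout, so voltage fluctuations before 20 ms do not affect the predicted class under this sketch.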
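After the simulation of each image stimulus, $W_{ijk}$ is updated as shown in Eq. (8). Here $\eta$ is the learning rate, $\Delta W_{ijk}(t)$ is the update value at time $t$, $v_j$ and $v_i$ are the somatic voltages of neurons $j$ and $i$, respectively, $I_{ijk}$ is the synaptic current activated by the $k$-th synapse of neuron $i$ on neuron $j$, $g_{ijk}$ is its synaptic conductance, $r_{ijk}$ is the transfer resistance between the $k$-th connection site of neuron $i$ on the dendrite of neuron $j$ and the soma of neuron $j$, and $t_s = 30$ ms and $t_e = 50$ ms are the start and end times of learning, respectively. For output neurons, the error term can be computed as shown in Eq. (10). For hidden neurons, the error term is computed from the error terms in the output layer (Eq. (11)). Since all output neurons are single-compartment, $r_{ijk}$ equals the input resistance of the corresponding compartment. Transfer and input resistances are computed with NEURON.

Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and faster convergence. DeepDendrite also supports mini-batch training. When training HPC-Net with mini-batch size $N_{\mathrm{batch}}$, we make $N_{\mathrm{batch}}$ copies of HPC-Net. During training, each copy is fed a different training sample from the batch. DeepDendrite first computes the weight update for each copy separately. After all copies in the current training batch have finished, the average weight update is calculated, and the weights in all copies are updated by this same amount.

Robustness of HPC-Net against adversarial attacks

To demonstrate the robustness of HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation; for a fair comparison, in our HPC-Net each input neuron made only one synaptic connection to each hidden neuron). We first trained HPC-Net and the ANN on the original training set (original clean images). We then added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. We used Foolbox [98, 99] to generate adversarial noise with the FGSM method [93]. The ANN was trained with PyTorch [100], and HPC-Net was trained with our DeepDendrite. For fairness, we generated the adversarial noise on a significantly different network model, a 20-layer ResNet [101]. The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST [95] and Fashion-MNIST [96]. The results show that the prediction accuracy of HPC-Net is 19% and 16.72% higher than that of the analogous ANN on MNIST and Fashion-MNIST, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available within the paper, its Supplementary Information, and the Source Data files provided with this paper. The source code and data used to reproduce the results in Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is available at http://yann.lecun.com/exdb/mnist. The Fashion-MNIST dataset is available at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite, as well as the models and code used to reproduce Figs. 3–6 in this study, are available at https://github.com/pkuzyc/DeepDendrite.

References

1. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity.
2. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
3. Poirazi, P., Brannon, T. & Mel, B. W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell.
4. London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
5. Branco, T. & Häusser, M. The single dendritic branch as a fundamental functional unit in the nervous system.
6. Stuart, G. J. & Spruston, N. Dendritic integration: 60 years of progress. Nat. Neurosci. 18, 1713–1721 (2015).
7. Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
8. Yuste, R. & Denk, W. Dendritic spines as basic functional units of neuronal integration. Nature 375, 682–684 (1995).
9. Engert, F. & Bonhoeffer, T. Dendritic spine changes associated with hippocampal long-term synaptic plasticity. Nature 399, 66–70 (1999).
10. Yuste, R. Dendritic spines and distributed circuits. Neuron 71, 772–781 (2011).
11. Yuste, R. Electrical compartmentalization in dendritic spines. Annu. Rev. Neurosci. 36, 429–449 (2013).
12. Rall, W. Branching dendritic trees and motoneuron membrane resistivity. Exp. Neurol. 1, 491–527 (1959).
13. Segev, I. & Rall, W. Computational study of an excitable dendritic spine. J. Neurophysiol. 60, 499–523 (1988).
14. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
15. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
16. McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem.
17. French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
18. Naud, R. & Sprekeler, H. Sparse bursts optimize information transmission in a multiplexed neural code.
19. Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
20. Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits.
21. Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
22. Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent.
23. Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
24. Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
25. Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
26. Bower, J. M. & Beeman, D. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
27. Hines, M. L., Eichner, H. & Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.
28. Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
29. Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphical processing unit.
30. Tsuyuki, T., Yamamoto, Y. & Yamazaki, T. Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, Akira et al.) 279–285 (Springer International Publishing, 2016).
31. Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines matrix solver in neuron simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
32. Huber, F. Efficient tree solver for Hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
33. Korte, B. & Vygen, J. Combinatorial Optimization: Theory and Algorithms 6th edn (Springer, 2018).
34. Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
35. Kumbhar, P. et al. CoreNEURON: an optimized compute engine for the NEURON simulator.
36. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
37. Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
38. Mascagni, M.
39. McDougal, R. A. et al. Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience. J. Comput. Neurosci. 42, 1–10 (2017).
40. Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons.
41. Hemond, P. et al. Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b. Hippocampus 18, 411–424 (2008).
42. Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties.
43. Masoli, S., Solinas, S. & D'Angelo, E. Action potential processing in a detailed Purkinje cell model reveals a critical role for axonal compartmentalization.
44. Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales - simulations of direct pathway MSNs investigate the rapid onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
45. Migliore, M. et al. Synaptic clusters function as odor operators in the olfactory bulb.
46. NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
47. NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
48. Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
49. Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
50. Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
51. Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models.
52. Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization. J. Neurosci. 13, 413–422 (1993).
53. Koch, C. Dendritic spines. In Biophysics of Computation (Oxford University Press, 1999).
54. Rapp, M., Yarom, Y. & Segev, I. The impact of parallel fiber background activity on the cable properties of cerebellar Purkinje cells. Neural Comput. 4, 518–533 (1992).
55. Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
56. Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint at https://arxiv.org/abs/1703.09202 (2017).
57. Goddard, N. H. & Hood, G. Large-scale simulation using parallel GENESIS. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, James M. & Beeman, David) 349–379 (Springer New York, 1998).
58. Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON. J. Comput. Neurosci. 21, 119 (2006).
59. Lytton, W. W. et al. Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON. Neural Comput. 28, 2063–2090 (2016).
60. Valero-Lara, P. et al. cuHinesBatch: solving multiple Hines systems on GPUs Human Brain Project. In Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
61. Akar, N. A. et al. Arbor - a morphologically-detailed neural network simulation library for contemporary high-performance computing architectures.
62. Ben-Shalom, R. et al. NeuroGPU: accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. J. Neurosci. Methods 366, 109400 (2022).
63. Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
64. Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform.
65. Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
66. Palmer, L. M. & Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input.
67. Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
68. Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity. Annu. Rev. Neurosci. 24, 1071–1089 (2001).
69. Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).
70. Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
71. Keck, T. et al. Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
72. Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits.
73. Trachtenberg, J. T. et al. Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature 420, 788–794 (2002).
74. Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
75. Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories.
76. Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex.
77. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
78. Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites.
79. Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
80. Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons.
81. Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci. USA 114, E7612–E7621 (2017).
82. Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
83. Xu, N.-l. et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task.
84. Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
85. Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties.
86. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
87. Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
88. Hjorth, J. J. J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
89. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, e22901 (2017).
90. Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments.
91. Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree? Neural Comput. 33, 1554–1571 (2021).
92. Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks.
93. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR) (ICLR, 2015).
94. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint at https://arxiv.org/abs/1605.07277 (2016).
95. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition.
96. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
97. Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
98. Rauber, J., Brendel, W. & Bethge, M. Foolbox: a Python toolbox to benchmark the robustness of machine learning models.
99. Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox Native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5, 2607 (2020).
100. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (NeurIPS, 2019).
101. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

The authors sincerely thank Rita Zhang, Daochen Shi and the members of NVIDIA for their valuable technical support with GPU computing. This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., the National Natural Science Foundation of China (No. 61825101) to Y.T., the National Key R&D Program of China (No. 2022ZD01163005) to L.M., the Key-Area Research and Development Program of Guangdong Province (No. 2018B030338001) to T.H., the Swedish Research Council (No. VR-M-2020-01652), the Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 No. 945539 (HBP SGA3), and Digital Futures to J.H.K., J.H., PDIC and S

This paper is available under the CC BY 4.0 Deed (Attribution 4.0 International) license.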