Researchers Build a GPU Engine That Speeds Up Brain-Cell Simulations by up to 1,500x

Authors: Zhang, Gan He, Ma, Xiaofei Liu, J. J. Johannes Hjorth, Alexander Kozlov, Yutao He, Shenjian Zhang, Jeanette Hellgren Kotaleski, Tian, Huang

Abstract

Biophysically detailed multi-compartment models are powerful tools for exploring the computational principles of the brain, and they also serve as a theoretical framework for generating algorithms for artificial intelligence (AI) systems. However, their expensive computational cost severely limits applications in both neuroscience and AI. The major bottleneck in simulating detailed compartment models is the simulator's ability to solve large systems of linear equations. Here, we present a novel Dendritic Hierarchical Scheduling (DHS) method that markedly accelerates this process. We theoretically prove that the DHS implementation is computationally optimal and accurate. This GPU-based method runs 2-3 orders of magnitude faster than the classic serial Hines method. We build the DeepDendrite framework, which integrates the DHS method with the GPU computing engine of the NEURON simulator, and demonstrate applications of DeepDendrite in neuroscience tasks. We investigate how spatial patterns of spine inputs affect neuronal excitability in a detailed human pyramidal neuron model with 25,000 spines.

Introduction

Deciphering the coding and computational principles of neurons is essential to neuroscience. Mammalian brains are composed of thousands of different types of neurons with unique morphological and biophysical properties. Although it is no longer conceptually accurate, the "point-neuron" doctrine [1], in which neurons are regarded as simple summing units, is still widely applied in neural computation, especially in neural network analysis. In recent years, modern AI has adopted this principle and developed powerful tools such as artificial neural networks (ANNs) [2].
However, in addition to comprehensive computation at the single-neuron level, subcellular compartments such as neuronal dendrites can also carry out nonlinear operations as independent computational units [3-7]. Furthermore, dendritic spines, the small protrusions that densely cover dendrites in spiny neurons, can compartmentalize synaptic signals, allowing them to be separated from their parent dendrites ex vivo and in vivo [8-11].

Simulations using biologically detailed neurons provide a theoretical framework for linking biological details to computational principles. The core of the biophysically detailed multi-compartment model framework [12,13] allows us to model neurons with realistic dendritic morphologies, intrinsic ionic conductances, and extrinsic synaptic inputs. The backbone of the detailed multi-compartment model, i.e., the dendrites, is built upon classical cable theory [12], which models the biophysical membrane properties of dendrites as passive cables and provides a mathematical description of how electrical signals invade and propagate throughout complex neuronal processes. By combining cable theory with active biophysical mechanisms such as ion channels and excitatory and inhibitory synaptic currents, a detailed multi-compartment model can capture cellular and subcellular neuronal computations beyond experimental limitations [4,7].

Beyond their profound impact on neuroscience, biologically detailed neuron models have recently been used to bridge the gap between neuronal structural and biophysical details and AI. The prevailing technique in modern AI is the ANN, which consists of point neurons and is an analog of biological neural networks. Although ANNs trained with the "backpropagation-of-error" (backprop) algorithm have achieved remarkable performance in specialized applications, even beating top human professional players at Go and chess [14,15], the human brain still outperforms ANNs in domains that involve dynamic and noisy environments [16,17].
Recent theoretical studies suggest that dendritic integration is crucial for generating efficient learning algorithms that may exceed backprop in information processing [18-20]. Furthermore, a single detailed multi-compartment model can learn network-level nonlinear computations for point neurons by adjusting only the synaptic strengths [21,22]. It is therefore a high priority to expand brain-like AI paradigms from single detailed neuron models to large-scale biologically detailed networks.

One long-standing challenge of the detailed simulation approach is its exceedingly high computational cost, which has severely limited its application in neuroscience and AI. The major bottleneck of the simulation is solving the linear equations that follow from the foundational theories of detailed modeling [12,23,24]. To improve efficiency, the classic Hines method reduces the time complexity of solving the equations from O(n^3) to O(n), and it is widely employed as the core algorithm in popular simulators such as NEURON [25] and GENESIS [26]. However, this method is serial: it processes the compartments one after another. When a simulation involves many biophysically detailed dendrites with dendritic spines, the linear equation matrix (the "Hines matrix") grows with the number of dendrites or spines (Fig. 1e), and the Hines method is no longer practical because it places a very heavy burden on the entire simulation.

Fig. 1. (a) A reconstructed layer-5 pyramidal neuron model and the mathematical formula used with detailed neuron models. (b) Workflow when numerically simulating detailed neuron models. The equation-solving phase is the bottleneck of the simulation. (c) An example of the linear equations in the simulation. (d) Data dependency of the Hines method when solving the linear equations in (c). (e) The size of the Hines matrix scales with model complexity. The number of linear equations to be solved increases significantly as models become more detailed. (f) Computational cost (steps taken in the equation-solving phase) of the serial Hines method on different types of neuron models. (g) Illustration of different solving methods. In the parallel methods (middle, right), different parts of a neuron are assigned to multiple processing units, shown in different colors. In the serial method (left), all compartments are computed by a single unit. (h) Computational cost of the three methods in (g) when solving the equations of a pyramidal model with spines. (i) Run time of the different methods when solving equations for 500 pyramidal models with spines. The run time is the time taken for a 1 s simulation (solving the equations 40,000 times with a time step of 0.025 ms). p-Hines: parallel method in CoreNEURON (on GPU); Branch-based: branch-based parallel method (on GPU); DHS: Dendritic Hierarchical Scheduling method (on GPU).

In recent decades, tremendous progress has been made in speeding up the Hines method with cellular-level parallel methods, which parallelize the computation of different parts of each cell [27-32]. However, current cellular-level parallel methods often lack an efficient parallelization strategy, or lack sufficient numerical accuracy compared with the original Hines method.

Here, we develop a fully automatic, numerically accurate, and optimized simulation tool that significantly accelerates computation and reduces computational cost. In addition, this simulation tool can be readily adopted for building and testing neural networks with biological details for machine learning and AI applications. Critically, we formulate the parallel computation of the Hines method as a mathematical scheduling problem and derive the Dendritic Hierarchical Scheduling (DHS) method from combinatorial optimization [33] and parallel computing theory [34]. We show that our algorithm provides optimal scheduling without any loss of precision. Furthermore, we optimize DHS for the most advanced current GPU chips by exploiting the GPU memory hierarchy and memory access mechanisms. Together, DHS speeds up computation 60-1,500 times (Supplementary Table 1) compared with the classic simulator NEURON [25] while maintaining identical accuracy.
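To make the serial bottleneck concrete, the following is a minimal sketch of an O(n) Hines-style solve for a tree-structured system. It is an illustration under stated assumptions, not the NEURON implementation: the function name `hines_solve`, the array names, and the five-compartment example values are hypothetical, and compartments are assumed to be numbered so that every parent index is smaller than its child's index.

```python
def hines_solve(parent, diag, upper, lower, rhs):
    """Solve a tree-structured ("Hines") linear system in O(n).

    Assumptions (hypothetical layout for this sketch):
      parent[i] is the parent of compartment i, parent[0] == -1 (root),
      and parent[i] < i for every i > 0.
      upper[i] is the matrix entry in row parent[i], column i.
      lower[i] is the matrix entry in row i, column parent[i].
    """
    n = len(parent)
    d = list(diag)   # working copy of the diagonal
    b = list(rhs)    # working copy of the right-hand side

    # Triangularization: sweep from the leaves toward the root.
    # Each compartment is handled only after all of its children,
    # which is the data dependency that forces serial execution.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = upper[i] / d[i]
        d[p] -= factor * lower[i]
        b[p] -= factor * b[i]

    # Back-substitution: sweep from the root toward the leaves.
    x = [0.0] * n
    x[0] = b[0] / d[0]
    for i in range(1, n):
        x[i] = (b[i] - lower[i] * x[parent[i]]) / d[i]
    return x


# Hypothetical five-compartment tree: 0 is the soma, 1 and 4 attach to it,
# 2 and 3 attach to 1.
parent = [-1, 0, 1, 1, 0]
diag = [4.0, 4.0, 4.0, 4.0, 4.0]
upper = [0.0, -1.0, -1.0, -1.0, -1.0]
lower = [0.0, -1.0, -1.0, -1.0, -1.0]
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
x = hines_solve(parent, diag, upper, lower, rhs)
```

Each loop touches every compartment once, which gives the O(n) cost; the leaf-to-root ordering in the first loop is the dependency that a cellular-level parallel method has to respect.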
To enable detailed dendritic simulations for use in AI, we build the DeepDendrite framework by integrating the DHS-embedded CoreNEURON platform (an optimized compute engine for NEURON) [35] as the simulation engine, together with two auxiliary modules (an I/O module and a learning module) that support dendritic learning algorithms during simulation. DeepDendrite runs on the GPU hardware platform and supports both regular simulation tasks in neuroscience and learning tasks in AI.

Last but not least, we present several applications of DeepDendrite that target critical challenges in neuroscience and AI: (1) We demonstrate how the spatial patterns of dendritic spine inputs affect neuronal activity in neurons that carry spines throughout their dendritic trees (full-spine models). DeepDendrite allows us to explore neuronal computation in a human pyramidal neuron model with ~25,000 dendritic spines. (2) In the Discussion, we also consider the potential of DeepDendrite in the context of AI, specifically for morphologically detailed models.

All source code for DeepDendrite, the full-spine models, and the detailed dendritic network model is publicly available online (see Code Availability). Our open-source learning framework can be readily integrated with other dendritic learning rules, such as learning rules for nonlinear (fully active) dendrites [21], burst-dependent synaptic plasticity [20], and learning with spike prediction [36]. Overall, our study provides a complete set of tools with the potential to change the current ecosystem of the computational neuroscience community. By harnessing the power of GPU computing, we envision that these tools will facilitate system-level exploration of the computational principles of the brain's fine structures, and promote the exchange between neuroscience and modern AI.
Results

Dendritic Hierarchical Scheduling (DHS) method

Computing ionic currents and solving linear equations are the two critical phases in simulating biophysically detailed neurons; both are time-consuming and impose a severe computational burden. Fortunately, computing the ionic currents of each compartment is a fully independent process, so it can be naturally parallelized on devices with massive numbers of parallel computing units, such as GPUs [37]. As a consequence, solving the linear equations becomes the remaining bottleneck for parallelization (Fig. 1a-f).

To tackle this bottleneck, cellular-level parallel methods have been developed, which accelerate single-cell computation by "splitting" a single cell into several parts that can be computed in parallel [27,28,38]. However, such methods rely heavily on prior knowledge to generate practical strategies for splitting a single neuron (Fig. 1g; Supplementary Fig. 1). They therefore become less efficient for neurons with asymmetric morphologies, e.g., pyramidal neurons and Purkinje neurons.

We aim to develop an efficient and precise parallel method for simulating biologically detailed neural networks. First, we establish accuracy criteria for a cellular-level parallel method. Based on parallel computing theory [34], we propose three conditions which ensure that a parallel method yields solutions identical to those of the serial Hines method, given the data dependency in the Hines method (see Methods). Then, to evaluate theoretically the run time, i.e., the efficiency, of serial and parallel methods, we introduce the concept of computational cost, defined as the number of steps a method takes to solve the equations (see Methods).

Based on simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem (see Methods). In simple terms, we view a single neuron as a tree with many nodes (compartments).
With k parallel threads, we can compute at most k nodes at each step, but we must ensure that a node is computed only after all of its child nodes have been processed. Our goal is to find a strategy that completes the whole procedure in the minimum number of steps.

To generate an optimal partition, we propose the Dendritic Hierarchical Scheduling (DHS) method (the theoretical proof is given in the Methods). The key idea of DHS is to prioritize deep nodes (Fig. 2a). The DHS method consists of two steps, analyzing the dendritic topology and finding the best partition:

(1) Given a detailed model, we first obtain its corresponding dependency tree and calculate the depth of each node on the tree (the depth of a node is the number of its ancestor nodes) (Fig. 2b, c).

(2) After the topology analysis, we search the candidates and pick at most the k deepest candidate nodes (a node is a candidate only if all of its child nodes have been processed). This procedure repeats until all nodes have been processed (Fig. 2d).

Fig. 2. (a) DHS workflow. DHS processes the k deepest candidate nodes in each iteration. (b) Illustration of calculating the node depths of a compartmental model. The model is first converted to a tree structure, then the depth of each node is computed. Colors indicate different depth values. (c) Topology analysis of different neuron models. Six neurons with distinct morphologies are shown. For each model, the soma is selected as the root of the tree, so node depth increases from the soma (0) to the distal dendrites. (d) Illustration of performing DHS on the model in (b) with four threads. Candidates: nodes whose child nodes have all been processed. Selected candidates: the k deepest candidates, picked by DHS. Processed nodes: nodes that have already been processed. (e) Parallelization strategy obtained by DHS after the process in (d). DHS reduces the number of node-processing steps from 14 in the serial case to 5 by distributing the nodes across multiple threads. (f) Relative cost, i.e., the ratio of the computational cost of DHS to that of the serial Hines method, for different thread counts and model types.

Take a simple model with 15 compartments as an example. The serial Hines method needs 14 steps to process all nodes, whereas DHS with four parallel units partitions the nodes into five subsets (Fig. 2d): {{9,10,12,14}, {1,7,11,13}, {2,3,4,8}, {6}, {5}}. Because the nodes in the same subset can be processed in parallel, DHS needs only five steps to process all nodes (Fig. 2e).

Next, we apply the DHS method to six representative detailed neuron models (selected from ModelDB [39]) with different numbers of threads (Fig. 2f), including cortical and hippocampal pyramidal neurons [40-42], cerebellar Purkinje neurons [43], striatal projection neurons (SPN) [44], and olfactory bulb mitral cells [45], covering the principal neurons in sensory, cortical, and subcortical areas. We then measure the computational cost. The relative computational cost is defined here as the ratio of the computational cost of DHS to that of the serial Hines method. The computational cost, i.e., the number of steps taken to solve the equations, drops dramatically as the number of threads increases. For example, with 16 threads, the computational cost of DHS is 7%-10% of that of the serial Hines method. Intriguingly, DHS reaches the lower bound of the computational cost for the presented neurons with 16 or even 8 parallel threads (Fig. 2f), which indicates that adding more threads does not improve performance further because of the dependencies between compartments.

Together, these results show that DHS automatically analyzes the dendritic topology and finds the optimal partition for parallel computing. Importantly, DHS finds the optimal partition before the simulation starts, so no extra computation is needed when solving the equations.

Speeding up DHS by GPU memory boosting

DHS computes each neuron with multiple threads, which consumes a large number of threads when running neural network simulations.
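To make concrete how the multiple threads of one neuron get their work, the sketch below implements the two DHS steps described in the previous subsection: computing node depths, then repeatedly picking at most the k deepest candidates. It is an illustration under stated assumptions, not the DeepDendrite code: the function names, the use of a heap, and the 15-node example tree are hypothetical (the tree is not the one shown in Fig. 2), and nodes are assumed to be numbered so that each parent index is smaller than its child's index.

```python
import heapq


def node_depths(parent):
    """Depth of each node = number of its ancestors (root 0 has depth 0)."""
    depth = [0] * len(parent)
    for i in range(1, len(parent)):
        depth[i] = depth[parent[i]] + 1  # valid because parent[i] < i
    return depth


def dhs_schedule(parent, k):
    """Return a list of steps; each step lists the nodes processed in parallel.

    A node is a candidate once all of its children have been processed.
    At every step the k deepest candidates are selected. The root is not
    scheduled because it has no parent to be eliminated into.
    """
    n = len(parent)
    depth = node_depths(parent)
    pending = [0] * n                  # unprocessed children per node
    for i in range(1, n):
        pending[parent[i]] += 1

    # Max-heap on depth (negated for heapq); the leaves are the first candidates.
    heap = [(-depth[i], i) for i in range(1, n) if pending[i] == 0]
    heapq.heapify(heap)

    steps = []
    while heap:
        chosen = [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]
        steps.append(sorted(chosen))
        # Parents become candidates only for the following steps.
        for i in chosen:
            p = parent[i]
            pending[p] -= 1
            if pending[p] == 0 and p != 0:
                heapq.heappush(heap, (-depth[p], p))
    return steps


# Hypothetical 15-compartment tree (node 0 is the soma).
parent = [-1, 0, 1, 1, 0, 4, 5, 6, 6, 5, 9, 9, 4, 12, 12]
serial_steps = dhs_schedule(parent, k=1)
parallel_steps = dhs_schedule(parent, k=4)
print(len(serial_steps), len(parallel_steps))
print(parallel_steps)
```

With one thread the schedule degenerates to the serial case, one node per step. The schedule depends only on the tree topology, so it can be computed once before the simulation starts, in line with the remark above that no extra computation is needed during equation solving.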
Graphics Processing Units (GPUs) consist of massive numbers of processing units (i.e., streaming processors, SPs; Fig. 3a, b) for parallel computing [46]. In theory, the many SPs on a GPU should support efficient simulation of large-scale neural networks (Fig. 3c). However, we consistently observed that the efficiency of DHS decreased significantly as the network size grew, which may result from scattered data storage or from the extra memory accesses caused by loading and writing intermediate results (Fig. 3d, left).

Fig. 3. (a) GPU architecture and its memory hierarchy. Each GPU contains massive numbers of processing units (streaming processors). Different types of memory have different throughput. (b) Architecture of Streaming Multiprocessors (SMs). Each SM contains multiple streaming processors, registers, and an L1 cache. (c) Applying DHS to two neurons, each with four threads. During simulation, each thread executes on one streaming processor. (d) Memory optimization strategy on the GPU. Top panel: thread assignment and data storage of DHS before (left) and after (right) memory boosting. Bottom panel: an example of a single triangularization step when simulating two neurons. The processors send a data request to load the data for each thread from global memory. Without memory boosting (left), it takes seven transactions to load all of the requested data, plus some extra transactions for intermediate results. With memory boosting (right), it takes only two transactions to load all of the requested data, and registers are used for intermediate results, which further improves memory throughput. (e) Run time of DHS (32 threads per cell) with and without memory boosting on multiple layer-5 pyramidal models with spines. (f) Speedup from memory boosting on multiple layer-5 pyramidal models with spines. Memory boosting brings a 1.6-2 times speedup.

We solve this problem by GPU memory boosting, a method that increases memory throughput by exploiting the GPU's memory hierarchy and access mechanism.
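The transaction counts quoted in the Fig. 3 caption can be illustrated with a toy model. The sketch below is a deliberately simplified picture, not a description of real GPU hardware: it assumes that one memory transaction fetches one aligned segment of `segment_size` consecutive elements, and the node indices and helper names (`count_transactions`, `layout_by_step`) are hypothetical.

```python
def count_transactions(addresses, segment_size=4):
    """Toy model: one transaction per distinct aligned segment touched."""
    return len({a // segment_size for a in addresses})


def layout_by_step(steps):
    """Assign storage slots so that nodes processed in the same step
    occupy consecutive positions in memory."""
    slot = {}
    for step in steps:
        for node in step:
            slot[node] = len(slot)
    return slot


# Hypothetical schedule for two neurons with four threads each: the first
# step processes eight nodes whose original storage indices are scattered.
steps = [[0, 5, 9, 13, 17, 22, 26, 27], [1, 6, 10, 14]]

before = count_transactions(steps[0])                   # original layout
slot = layout_by_step(steps)
after = count_transactions([slot[n] for n in steps[0]])  # permuted layout
print(before, after)
```

In this toy setting, storing the data in processing order reduces the first step from seven transactions to two, the same counts as in the caption's example. Actual savings depend on the hardware's transaction size and alignment rules.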
Because of the GPU's memory loading mechanism, successive threads that load aligned and contiguously stored data achieve high memory throughput, whereas accessing scattered data reduces memory throughput [46,47]. To achieve high throughput, we first align the computing order of the nodes and rearrange the threads according to the number of nodes assigned to them. We then permute the data storage in global memory to match the computing order, so that nodes processed at the same step are stored contiguously in global memory. Moreover, we use GPU registers to store intermediate results, which further improves memory throughput. In the example, memory boosting needs only two memory transactions to load eight requested data items (Fig. 3d, right). Furthermore, experiments on multiple pyramidal neurons with spines and on the typical neuron models (Fig. 3e, f; Supplementary Fig. 2) show that memory boosting achieves a 1.2-3.8 times speedup compared with naive DHS.

To test the performance of DHS with GPU memory boosting comprehensively, we select six typical neuron models and evaluate the run time of solving the cable equations for large numbers of each model (Fig. 4). We examine DHS with four threads (DHS-4) and with sixteen threads (DHS-16) per neuron. Compared with the GPU method in CoreNEURON, DHS-4 and DHS-16 speed up the computation about 5 and 15 times, respectively (Fig. 4a). Moreover, compared with the conventional serial Hines method in NEURON running on a single CPU thread, DHS speeds up the simulation by 2-3 orders of magnitude (Supplementary Fig. 3), while retaining identical numerical accuracy in the presence of dense spines (Supplementary Figs. 4 and 8), active dendrites (Supplementary Fig. 7), and different segmentation strategies (Supplementary Fig. 7).

Fig. 4. (a) Run time of solving the equations for a 1 s simulation on the GPU (dt = 0.025 ms, 40,000 iterations in total). CoreNEURON: the parallel method used in CoreNEURON; DHS-4: DHS with four threads per neuron; DHS-16: DHS with 16 threads per neuron. (b, c) Visualization of the partitions produced by DHS-4 and DHS-16; each color indicates a single thread. During computation, each thread switches among different branches.

DHS creates cell-type-specific optimal partitions

To gain insight into the working mechanism of the DHS method, we visualize the partitioning process by mapping compartments to threads (each color represents a single thread in Fig. 4b, c). The visualization shows that a single thread frequently switches among different branches (Fig. 4b, c). Interestingly, DHS generates aligned partitions in morphologically symmetric neurons such as the striatal projection neuron (SPN) and the mitral cell (Fig. 4b, c). By contrast, it generates fragmented partitions in morphologically asymmetric neurons such as pyramidal neurons and Purkinje cells (Fig. 4b, c), which indicates that DHS splits the neuronal tree at the scale of individual compartments (i.e., tree nodes) rather than at the scale of branches. This cell-type-specific, fine-grained partition allows DHS to fully exploit all available threads.

In summary, DHS and memory boosting together provide a theoretically proven optimal solution for solving the linear equations in parallel with unprecedented efficiency. Using this principle, we built the open-access DeepDendrite platform, which neuroscientists can use to implement models without any specific knowledge of GPU programming. Below, we demonstrate how DeepDendrite can be used in neuroscience tasks. We also discuss the potential of the DeepDendrite framework for AI-related tasks in the Discussion section.

DHS enables spine-level modeling

As dendritic spines receive most of the excitatory input to cortical and hippocampal pyramidal neurons, striatal projection neurons, etc., their morphologies and plasticity are crucial for regulating neuronal excitability [10,48-51]. However, spines are too small (~1 μm in length) for their voltage-dependent processes to be measured directly in experiments. Theoretical work is therefore critical for a full understanding of spine computations.

We can model a single spine with two compartments: the spine head, where the synapses are located, and the spine neck, which links the spine head to the dendrite [52].
Theory predicts that the very thin spine neck (0.1-0.5 μm in diameter) electrically isolates the spine head from its parent dendrite, thereby compartmentalizing the signals generated at the spine head [53]. However, the detailed model with fully distributed spines on dendrites ("full-spine model") is computationally very expensive. A common compromise is to modify the capacitance and resistance of the membrane by a spine factor F [54], instead of modeling all spines explicitly. Here, the spine factor F aims to approximate the effect of spines on the biophysical properties of the cell membrane [54].

Inspired by the previous work of Eyal et al. [51], we investigated how different spatial patterns of excitatory inputs formed on dendritic spines shape neuronal activities in a human pyramidal neuron model with explicitly modeled spines (Fig. 5a). Notably, Eyal et al. employed the spine factor F to incorporate spines into dendrites, while only a few activated spines were explicitly attached to dendrites ("few-spine model" in Fig. 5a). The value of the spine factor F in their model was computed from the dendritic area and the spine area in the reconstructed data. Accordingly, we calculated the spine density from their reconstructed data to make our full-spine model more consistent with Eyal's few-spine model. With the spine density set to 1.3 μm^-1, the pyramidal neuron model contained about 25,000 spines without altering the model's original morphological and biophysical properties. Further, we repeated the previous experimental protocols with both full-spine and few-spine models. We used the same synaptic input as in Eyal's work but added extra background noise to each sample. By comparing the somatic traces (Fig. 5b, c) and the spike probability (Fig. 5d) in full-spine and few-spine models, we found that the full-spine model is much leakier than the few-spine model. In addition, the spike probability triggered by the activation of clustered spines appeared to be more nonlinear in the full-spine model (the solid blue line in Fig. 5d) than in the few-spine model (the dashed blue line in Fig. 5d). These results indicate that the conventional F-factor method may underestimate the impact of dense spines on the computations of dendritic excitability and nonlinearity.

Fig. 5. (a) Experiment setup. We examine two major types of models: few-spine models and full-spine models. Few-spine models (two on the left) incorporate the spine area globally into the dendrites and attach individual spines only together with activated synapses. In full-spine models (two on the right), all spines are explicitly attached over the whole dendritic tree. We explore the effects of clustered and randomly distributed synaptic inputs on the few-spine models and the full-spine models, respectively. (b) Somatic voltages recorded for the cases in (a). Colors of the voltage curves correspond to (a); scale bar: 20 ms, 20 mV. (c) Color-coded voltages during the simulation in (b) at specific times. Colors indicate the magnitude of the voltage. (d) Somatic spike probability as a function of the number of simultaneously activated synapses (as in Eyal et al.'s work) for the four cases in (a). Background noise is attached. (e) Run time of the experiments in (d) with different simulation methods. NEURON: conventional NEURON simulator running on a single CPU core. CoreNEURON: CoreNEURON simulator on a single GPU. DeepDendrite: DeepDendrite on a single GPU.

On the DeepDendrite platform, both full-spine and few-spine models achieved an 8 times speedup compared with CoreNEURON on the GPU platform and a 100 times speedup compared with serial NEURON on the CPU platform (Fig. 5e; Supplementary Table 1), while producing identical simulation results (Supplementary Figs. 4 and 8). Therefore, the DHS method enables explorations of dendritic excitability under more realistic anatomical conditions.
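The two ways of accounting for spines can be compared with a short numerical sketch. It rests on assumptions that the text does not spell out: the spine factor is taken as the common convention F = (dendritic area + spine area) / dendritic area, with membrane capacitance multiplied by F and membrane resistance divided by F. The function names, the areas, the membrane parameters, and the total dendritic length are hypothetical values chosen for illustration; only the density of 1.3 spines per μm comes from the text.

```python
def spine_factor(dend_area_um2, spine_area_um2):
    """Assumed convention: F = (dendritic area + spine area) / dendritic area."""
    return (dend_area_um2 + spine_area_um2) / dend_area_um2


def apply_spine_factor(cm, rm, factor):
    """Fold spines into the dendrite: more capacitance, less resistance.
    The membrane time constant cm * rm is left unchanged."""
    return cm * factor, rm / factor


def spine_count(density_per_um, dend_length_um):
    """Number of explicit spines for a given linear density and cable length."""
    return round(density_per_um * dend_length_um)


# Hypothetical membrane areas (um^2) and passive parameters.
F = spine_factor(10000.0, 9000.0)
cm_eff, rm_eff = apply_spine_factor(cm=1.0, rm=20000.0, factor=F)

# Density from the text; the total dendritic length is a hypothetical value
# chosen so that the count lands near the ~25,000 spines reported.
n_spines = spine_count(1.3, 19231.0)
print(F, cm_eff, rm_eff, n_spines)
```

The F-factor route adds no compartments, whereas the full-spine route adds two compartments per spine (head and neck), i.e., on the order of 50,000 extra compartments for a neuron with about 25,000 spines. This is why the full-spine model is so much more expensive to solve.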
Discussion

In this work, we propose the DHS method to parallelize the computation of the Hines method [55], and we demonstrate mathematically that DHS provides an optimal solution without any loss of precision. Next, we implement DHS on the GPU hardware platform and use GPU memory boosting techniques to refine it (Fig. 3). When simulating a large number of neurons with complex morphologies, DHS with memory boosting achieves a 15-fold speedup (Supplementary Table 1) compared with the GPU method used in CoreNEURON and up to a 1,500-fold speedup compared with the serial Hines method on the CPU platform (Fig. 4; Supplementary Fig. 3 and Supplementary Table 1). Furthermore, we develop the GPU-based DeepDendrite framework by integrating DHS into CoreNEURON. Finally, as a demonstration of the capacity of DeepDendrite, we present a representative application: examining spine computations in a detailed pyramidal neuron model with 25,000 spines. Further in this section, we elaborate on how we have expanded the DeepDendrite framework to enable efficient training of biophysically detailed neural networks. To explore the hypothesis that dendrites improve robustness against adversarial attacks [56], we train our network on typical image classification tasks. We show that DeepDendrite can support both neuroscience simulations and AI-related detailed neural network tasks with unprecedented speed, thereby significantly advancing detailed neuroscience simulations and potentially future AI explorations.

Decades of effort have been invested in speeding up the Hines method with parallel methods. Early work mainly focused on network-level parallelization. In network simulations, each cell independently solves its corresponding linear equations with the Hines method. Network-level parallel methods distribute a network over multiple threads and parallelize the computation of each cell group with each thread [57,58].
With network-level methods, we can simulate detailed networks on clusters or supercomputers [59]. In recent years, GPUs have been used for detailed network simulation. Because a GPU contains massive numbers of computing units, one thread is usually assigned one cell rather than a cell group [35,60,61]. With further optimization, GPU-based methods achieve much higher efficiency in network simulation. However, the computation inside each cell is still serial in network-level methods, so they cannot cope when the "Hines matrix" of each cell becomes large.

Cellular-level parallel methods further parallelize the computation inside each cell. Their main idea is to split each cell into several sub-blocks and parallelize the computation of those sub-blocks [27,28]. However, typical cellular-level methods (e.g., the "multi-split" method [28]) pay less attention to the parallelization strategy, and the lack of a fine-grained strategy results in unsatisfactory performance. To achieve higher efficiency, some studies obtain finer-grained parallelization by introducing extra computation operations [29,38,62] or by making approximations on some crucial compartments while solving the linear equations [63,64]. These finer-grained strategies achieve higher efficiency but lack the numerical accuracy of the original Hines method.

Unlike previous methods, DHS adopts the finest-grained parallelization strategy, i.e., compartment-level parallelization. By modeling the problem of "how to parallelize" as a combinatorial optimization problem, DHS provides an optimal compartment-level parallelization strategy. Moreover, DHS does not introduce any extra operations or value approximations, so it achieves the lowest computational cost while retaining the numerical accuracy of the original Hines method.
Dendritic spines are the most abundant microstructures in the brain for projection neurons in the cortex, hippocampus, cerebellum, and basal ganglia. As spines receive most of the excitatory inputs in the central nervous system, electrical signals generated by spines are the main driving force for large-scale neuronal activities in the forebrain and cerebellum [10,11]. The structure of the spine, with an enlarged spine head and a very thin spine neck, leads to a surprisingly high input impedance at the spine head, which could be up to 500 MΩ according to a combination of experimental data and detailed compartmental modeling [48,65]. Because of this high input impedance, a single synaptic input can evoke a "gigantic" EPSP (~20 mV) at the spine-head level [48,66], thereby boosting NMDA currents and ion channel currents in the spine [11]. However, in classic detailed compartment models, all spines are replaced by the F coefficient, which modifies the dendritic cable geometries [54]. This approach may compensate for the leak currents and capacitive currents of spines. Still, it cannot reproduce the high input impedance at the spine head, which may weaken excitatory synaptic inputs, particularly NMDA currents, and thereby reduce the nonlinearity in the neuron's input-output curve. Our modeling results are in line with this interpretation.

On the other hand, the electrical compartmentalization of the spine is always accompanied by biochemical compartmentalization [8,52,67], resulting in a drastic increase of internal [Ca2+] within the spine and a cascade of molecular processes involving synaptic plasticity that is important for learning and memory. Intriguingly, the biochemical process triggered by learning, in turn, remodels the spine's morphology, enlarging (or shrinking) the spine head or elongating (or shortening) the spine neck, which significantly alters the spine's electrical capacity [67-70].
Such experience-dependent changes in spine morphology, also referred to as "structural plasticity", have been widely observed in the visual cortex [71,72], somatosensory cortex [73,74], motor cortex [75], hippocampus [9], and the basal ganglia [76] in vivo. They play a critical role in motor and spatial learning as well as memory formation. However, because of the computational cost, nearly all detailed network models use the "F-factor" approach to replace actual spines, and are thus unable to explore spine functions at the system level. By taking advantage of our framework and the GPU platform, we can run a few thousand detailed neuron models, each with tens of thousands of spines, on a single GPU, while remaining ~100 times faster than the traditional serial method on a single CPU (Fig. 5e). This enables us to explore structural plasticity in large-scale circuit models across diverse brain regions.

Another critical issue is how to link dendrites to brain functions at the systems/network level. It is well established that dendrites can perform comprehensive computations on synaptic inputs thanks to their enriched ion channels and local biophysical membrane properties [5-7]. For example, cortical pyramidal neurons can carry out sublinear synaptic integration at the proximal dendrite but progressively shift to supralinear integration at the distal dendrite [77]. Moreover, distal dendrites can produce regenerative events such as dendritic sodium spikes, calcium spikes, and NMDA spikes/plateau potentials [6,78]. Such dendritic events are widely observed in mouse [6] and even human [79] cortical neurons in vitro, and they may implement various logical operations [6,79] or gating functions [80,81].
Recently, in vivo recordings in awake or behaving mice provide strong evidence that dendritic spikes/plateau potentials are crucial for orientation selectivity in the visual cortex , sensory-motor integration in the whisker system , , and spatial navigation in the hippocampal CA1 region . 5 6 7 77 6 78 6 79 6 79 80 81 82 83 84 85 To establish the causal link between dendrites and animal (including human) patterns of behavior, large-scale biophysically detailed neural circuit models are a powerful computational tool to realize this mission. However, running a large-scale detailed circuit model of 10,000-100,000 neurons generally requires the computing power of supercomputers. It is even more challenging to optimize such models for in vivo data, as it needs iterative simulations of the models. The DeepDendrite framework can directly support many state-of-the-art large-scale circuit models , , , which were initially developed based on NEURON. Moreover, using our framework, a single GPU card such as Tesla A100 could easily support the operation of detailed circuit models of up to 10,000 neurons, thereby providing carbon-efficient and affordable plans for ordinary labs to develop and optimize their own large-scale detailed models. 86 87 88 Recent works on unraveling the dendritic roles in task-specific learning have achieved remarkable results in two directions, i.e., solving challenging tasks such as image classification dataset ImageNet with simplified dendritic networks , and exploring full learning potentials on more realistic neuron , . However, there lies a trade-off between model size and biological detail, as the increase in network scale is often sacrificed for neuron-level complexity , , . Moreover, more detailed neuron models are less mathematically tractable and computationally expensive . 20 21 22 19 20 89 21 There has also been progress in the role of active dendrites in ANNs for computer vision tasks. Iyer et al. . 
proposed a novel ANN architecture with active dendrites, demonstrating competitive results in multi-task and continual learning. Jones and Kording [91] used a binary tree to approximate dendrite branching and provided valuable insights into the influence of tree structure on single neurons’ computational capacity. Bird et al. [92] proposed a dendritic normalization rule based on biophysical behavior, offering an interesting perspective on the contribution of dendritic arbor structure to computation. While these studies offer valuable insights, they primarily rely on abstractions derived from spatially extended neurons, and do not fully exploit the detailed biological properties and spatial information of dendrites. Further investigation is needed to unveil the potential of leveraging more realistic neuron models for understanding the shared mechanisms underlying brain computation and deep learning.

In response to these challenges, we developed DeepDendrite, a tool that uses the Dendritic Hierarchical Scheduling (DHS) method to substantially reduce computational costs and that incorporates an I/O module and a learning module to handle large datasets. With DeepDendrite, we successfully implemented a three-layer hybrid neural network, the Human Pyramidal Cell Network (HPC-Net) (Fig. 6a, b). This network demonstrated efficient training capabilities in image classification tasks, achieving approximately 25 times speedup compared to training on a traditional CPU-based platform (Fig. 6f; Supplementary Table 1).

Fig. 6. a The illustration of the Human Pyramidal Cell Network (HPC-Net) for image classification. Images are transformed to spike trains and fed into the network model. Learning is triggered by error signals propagated from soma to dendrites. b Training with mini-batch. Multiple networks are simulated simultaneously with different images as inputs. The total weight updates ΔW are computed as the average of ΔWi from each network. c Comparison of the HPC-Net before and after training.
Left, the visualization of hidden neuron responses to a specific input before (top) and after (bottom) training. Right, hidden layer weights (from input to hidden layer) distribution before (top) and after (bottom) training. d Workflow of the transfer adversarial attack experiment. We first generate adversarial samples of the test set on a 20-layer ResNet, then use these adversarial samples (noisy images) to test the classification accuracy of models trained with clean images. e Prediction accuracy of each model on adversarial samples after training 30 epochs on MNIST (left) and Fashion-MNIST (right) datasets. f Run time of training and testing for the HPC-Net. The batch size is set to 16. Left, run time of training one epoch. Right, run time of testing. Parallel NEURON + Python: training and testing on a single CPU with multiple cores, using 40-process-parallel NEURON to simulate the HPC-Net and extra Python code to support mini-batch training. DeepDendrite: training and testing the HPC-Net on a single GPU with DeepDendrite.

Additionally, it is widely recognized that the performance of Artificial Neural Networks (ANNs) can be undermined by adversarial attacks [93], i.e., intentionally engineered perturbations devised to mislead ANNs. Intriguingly, an existing hypothesis suggests that dendrites and synapses may innately defend against such attacks [56]. Our experimental results with HPC-Net lend support to this hypothesis: networks endowed with detailed dendritic structures showed somewhat increased resilience to transfer adversarial attacks [94] compared to standard ANNs, as shown on the MNIST [95] and Fashion-MNIST [96] datasets (Fig. 6d, e). This evidence implies that the biophysical properties of dendrites could be important for improving the robustness of ANNs against adversarial attacks. Nonetheless, further studies are needed to validate these findings on more challenging datasets such as ImageNet [97].
In conclusion, DeepDendrite has shown remarkable potential in image classification tasks and opens up several future directions. To further advance DeepDendrite and the application of biologically detailed dendritic models in AI tasks, we may focus on developing multi-GPU systems and exploring applications in other domains, such as Natural Language Processing (NLP), where dendritic filtering properties align well with the inherently noisy and ambiguous nature of human language. Challenges include testing scalability in larger-scale problems, understanding performance across various tasks and domains, and addressing the computational complexity introduced by novel biological principles, such as active dendrites. By overcoming these limitations, we can further advance the understanding and capabilities of biophysically detailed dendritic neural networks, potentially uncovering new advantages, enhancing their robustness against adversarial attacks and noisy inputs, and ultimately bridging the gap between neuroscience and modern AI.

Methods

Simulation with DHS

The CoreNEURON simulator [35] (https://github.com/BlueBrain/CoreNeuron) uses the NEURON [25] architecture and is optimized for both memory usage and computational speed. We implement our Dendritic Hierarchical Scheduling (DHS) method in the CoreNEURON environment by modifying its source code. All models that can be simulated on GPU with CoreNEURON can also be simulated with DHS by executing the following command:

    coreneuron_exec -d /path/to/models -e time --cell-permute 3 --cell-nthread 16 --gpu

The usage options are as in Table 1.

Accuracy of the simulation using cellular-level parallel computation

To ensure the accuracy of the simulation, we first need to define the correctness of a cellular-level parallel algorithm, i.e., to judge whether it will generate solutions identical to those of proven-correct serial methods, such as the Hines method used in the NEURON simulation platform.
Based on the theory of parallel computing [34], a parallel algorithm will yield a result identical to that of its corresponding serial algorithm if and only if the order in which data are processed in the parallel algorithm is consistent with the data dependency in the serial method. The Hines method has two symmetrical phases: triangularization and back-substitution. By analyzing the serial Hines method [55], we find that its data dependency can be formulated as a tree structure, where the nodes of the tree represent the compartments of the detailed neuron model. In the triangularization process, the value of each node depends on its children nodes. In contrast, during the back-substitution process, the value of each node depends on its parent node (Fig. 1d). Thus, we can compute nodes on different branches in parallel, as their values do not depend on one another.

Based on the data dependency of the serial Hines method, we propose three conditions to ensure that a parallel method yields solutions identical to those of the serial Hines method: (1) the tree morphology and initial values of all nodes are identical to those in the serial Hines method; (2) in the triangularization phase, a node can be processed if and only if all its children nodes are already processed; (3) in the back-substitution phase, a node can be processed only if its parent node is already processed. Once a parallel computing method satisfies these three conditions, it will produce solutions identical to those of the serial computing method.
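The two phases and their data dependencies can be made concrete with a small serial solver. The following is a minimal sketch, not the NEURON implementation: the array names (`parent`, `d`, `a`, `b`, `rhs`) are hypothetical, and it assumes nodes are numbered so that every parent has a smaller index than its children, with node 0 as the root. Under that assumption, sweeping indices downward visits children before parents (condition 2), and sweeping upward visits parents before children (condition 3).

```python
def hines_solve(parent, d, a, b, rhs):
    """Solve a tree-structured linear system (illustrative sketch).

    Row i reads: d[i]*x[i] + b[i]*x[parent[i]] + sum(a[c]*x[c] for children c) = rhs[i].
    Assumes parent[i] < i for i > 0 and parent[0] == -1 (hypothetical layout).
    """
    n = len(parent)
    d, rhs = list(d), list(rhs)  # work on copies, keep the inputs intact

    # Triangularization: each node updates its parent, children first.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        factor = a[i] / d[i]
        d[p] -= factor * b[i]
        rhs[p] -= factor * rhs[i]

    # Back-substitution: each node reads its parent's value, parents first.
    x = [0.0] * n
    x[0] = rhs[0] / d[0]
    for i in range(1, n):
        x[i] = (rhs[i] - b[i] * x[parent[i]]) / d[i]
    return x


def residual(parent, d, a, b, rhs, x):
    """Per-row residual of the original system, used to check a solution."""
    res = [d[i] * x[i] - rhs[i] for i in range(len(parent))]
    for i, p in enumerate(parent):
        if p >= 0:
            res[i] += b[i] * x[p]  # coupling of node i to its parent
            res[p] += a[i] * x[i]  # coupling of the parent to node i
    return res


# A five-compartment tree: node 0 is the root, nodes 3 and 4 hang off node 1.
parent = [-1, 0, 0, 1, 1]
d = [4.0, 4.0, 4.0, 4.0, 4.0]
a = [0.0, -1.0, -1.0, -1.0, -1.0]
b = [0.0, -1.0, -1.0, -1.0, -1.0]
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
x = hines_solve(parent, d, a, b, rhs)
print(max(abs(r) for r in residual(parent, d, a, b, rhs, x)))
```

In this sketch, nodes 3 and 4 never read each other's values in either phase, which is the property that allows different branches to be processed in parallel.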
Computational cost of cellular-level parallel computing method

To theoretically evaluate the run time, i.e., efficiency, of the serial and parallel computing methods, we introduce and formulate the concept of computational cost as follows: given a tree $T$ and $k$ threads (basic computational units) to perform triangularization, parallel triangularization is equivalent to dividing the node set $V$ of $T$ into $n$ subsets, i.e., $V = \{V_1, V_2, \ldots, V_n\}$, where the size of each subset $|V_i| \le k$, i.e., at most $k$ nodes can be processed in each step since there are only $k$ threads. The triangularization phase follows the order $V_1 \to V_2 \to \ldots \to V_n$, and nodes in the same subset $V_i$ can be processed in parallel. So, we define $|V|$ (the size of set $V$, i.e., $n$ here) as the computational cost of the parallel computing method. In short, we define the computational cost of a parallel method as the number of steps it takes in the triangularization phase. Because back-substitution is symmetrical with triangularization, the total cost of the entire equation-solving phase is twice that of the triangularization phase.

Mathematical scheduling problem

Based on the simulation accuracy and computational cost, we formulate the parallelization problem as a mathematical scheduling problem:

Given a tree $T = \{V, E\}$ and a positive integer $k$, where $V$ is the node set and $E$ is the edge set. Define partition $P(V) = \{V_1, V_2, \ldots, V_n\}$, $|V_i| \le k$, $1 \le i \le n$, where $|V_i|$ indicates the cardinality of subset $V_i$, i.e., the number of nodes in $V_i$, and for each node $v \in V_i$, all its children nodes $\{c \mid c \in \mathrm{children}(v)\}$ must be in a previous subset $V_j$, where $1 \le j < i$. Our goal is to find an optimal partition $P^*(V)$ whose computational cost $|P^*(V)|$ is minimal.

Here subset $V_i$ consists of all nodes that will be computed at the $i$-th step (Fig. 2e), so $|V_i| \le k$ indicates that we can compute at most $k$ nodes in each step because the number of available threads is $k$.
The restriction “for each node $v \in V_i$, all its children nodes $\{c \mid c \in \mathrm{children}(v)\}$ must be in a previous subset $V_j$, where $1 \le j < i$” indicates that node $v$ can be processed only if all its child nodes are processed.

DHS implementation

We aim to find an optimal way to parallelize the computation of solving linear equations for each neuron model by solving the mathematical scheduling problem above. To get the optimal partition, DHS first analyzes the topology and calculates the depth $d(v)$ for all nodes $v \in V$. Then, the following two steps are executed iteratively until every node $v \in V$ is assigned to a subset: (1) find all candidate nodes and put them into the candidate set $Q$. A node is a candidate only if all its child nodes have been processed or it has no child nodes. (2) If $|Q| \le k$, i.e., the number of candidate nodes is smaller than or equal to the number of available threads, remove all nodes from $Q$ and put them into $V_i$; otherwise, remove the $k$ deepest nodes from $Q$ and add them to subset $V_i$. Label these nodes as processed (Fig. 2d). After filling subset $V_i$, go to step (1) to fill the next subset $V_{i+1}$.

Correctness proof for DHS

After applying DHS to a neural tree $T = \{V, E\}$, we get a partition $P(V) = \{V_1, V_2, \ldots, V_n\}$, $|V_i| \le k$, $1 \le i \le n$. Nodes in the same subset $V_i$ will be computed in parallel, taking $n$ steps to perform triangularization and back-substitution, respectively. We then demonstrate that the reordering of the computation in DHS yields a result identical to that of the serial Hines method.

The partition $P(V)$ obtained from DHS decides the computation order of all nodes in a neural tree. Below we demonstrate that the computation order determined by $P(V)$ satisfies the correctness conditions. $P(V)$ is obtained from the given neural tree $T$.
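As a reference point for the argument that follows, the two-step DHS procedure described under "DHS implementation" can be sketched in a few lines. This is an illustrative sketch rather than the CoreNEURON code: the function names and the `parent`-array tree encoding are assumptions, depth is measured from the root, and ties between equally deep candidates are broken arbitrarily. The helper `is_valid_partition` checks the two constraints of the scheduling problem (at most `k` nodes per subset, and every node scheduled strictly before its parent).

```python
from collections import deque


def dhs_partition(parent, k):
    """Deepest-first schedule: returns subsets V_1..V_n with at most k nodes each.

    parent[v] is the parent of node v, or -1 for a root (hypothetical encoding).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        if p < 0:
            roots.append(v)
        else:
            children[p].append(v)

    # Depth d(v) of every node, measured from the root.
    depth = [0] * n
    queue = deque(roots)
    while queue:
        v = queue.popleft()
        for c in children[v]:
            depth[c] = depth[v] + 1
            queue.append(c)

    pending = [len(c) for c in children]  # unprocessed children per node
    candidates = [v for v in range(n) if pending[v] == 0]  # step (1): leaves
    partition = []
    while candidates:
        # Step (2): take all candidates if they fit, else the k deepest ones.
        candidates.sort(key=lambda v: depth[v], reverse=True)
        subset, candidates = candidates[:k], candidates[k:]
        partition.append(subset)
        # A parent becomes a candidate once its last child is processed.
        for v in subset:
            p = parent[v]
            if p >= 0:
                pending[p] -= 1
                if pending[p] == 0:
                    candidates.append(p)
    return partition


def is_valid_partition(partition, parent, k):
    """Check |V_i| <= k and that each node is scheduled before its parent."""
    step = {v: i for i, subset in enumerate(partition) for v in subset}
    if len(step) != len(parent) or sum(len(s) for s in partition) != len(parent):
        return False
    if any(len(s) > k for s in partition):
        return False
    return all(p < 0 or step[v] < step[p] for v, p in enumerate(parent))


# Complete binary tree with 7 nodes, scheduled on 2 threads.
tree = [-1, 0, 0, 1, 1, 2, 2]
schedule = dhs_partition(tree, 2)
print(len(schedule), is_valid_partition(schedule, tree, 2))
```

Because a parent is added to the candidate list only after the current subset has been filled, it can never land in the same subset as one of its children, which is what the validity check confirms.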
Operations in DHS do not modify the tree topology or the values of tree nodes (the corresponding values in the linear equations), so the tree morphology and initial values of all nodes are unchanged, which satisfies condition 1: the tree morphology and initial values of all nodes are identical to those in the serial Hines method. In triangularization, nodes are processed from subset $V_1$ to $V_n$. As shown in the implementation of DHS, all nodes in subset $V_i$ are selected from the candidate set $Q$, and a node can be put into $Q$ only if all its child nodes have been processed. Thus the child nodes of all nodes in $V_i$ are in $\{V_1, V_2, \ldots, V_{i-1}\}$, meaning that a node is only computed after all its children have been processed, which satisfies condition 2: in triangularization, a node can be processed if and only if all its child nodes are already processed. In back-substitution, the computation order is the opposite of that in triangularization, i.e., from $V_n$ to $V_1$. As shown before, the child nodes of all nodes in $V_i$ are in $\{V_1, V_2, \ldots, V_{i-1}\}$, so the parent nodes of nodes in $V_i$ are in $\{V_{i+1}, V_{i+2}, \ldots, V_n\}$, which satisfies condition 3: in back-substitution, a node can be processed only if its parent node is already processed.

Optimality proof for DHS

The idea of the proof is that if there is another optimal solution, it can be transformed into our DHS solution without increasing the number of steps the algorithm requires, thus indicating that the DHS solution is optimal.

For each subset $V_i$ in $P(V)$, DHS moves the $k$ (thread number) deepest nodes from the corresponding candidate set $Q_i$ to $V_i$. If the number of nodes in $Q_i$ is smaller than $k$, it moves all nodes from $Q_i$ to $V_i$. To simplify, we introduce $D_i$, the depth sum of the $k$ deepest nodes in $Q_i$. All subsets in $P(V)$ satisfy the max-depth criteria (Supplementary Fig. 6a): $\sum_{v \in V_i} d(v) = D_i$. We then prove that selecting the deepest nodes in each iteration makes $P(V)$ an optimal partition. If there exists an optimal partition $P^*(V) = \{V^*_1, V^*_2, \ldots, V^*_s\}$
containing subsets that do not satisfy the max-depth criteria, we can modify the subsets in $P^*(V)$ so that all subsets consist of the deepest nodes from $Q$ and the number of subsets ($|P^*(V)|$) remains the same after modification.

Without loss of generality, we start from the first subset $V^*_i$ not satisfying the criteria, i.e., $\sum_{v \in V^*_i} d(v) < D_i$. There are two possible cases in which $V^*_i$ does not satisfy the max-depth criteria: (1) $|V^*_i| < k$ and there exist some valid nodes in $Q_i$ that are not put into $V^*_i$; (2) $|V^*_i| = k$ but the nodes in $V^*_i$ are not the $k$ deepest nodes in $Q_i$.

For case (1), because some candidate nodes are not put into $V^*_i$, these nodes must be in the subsequent subsets. As $|V^*_i| < k$, we can move the corresponding nodes from the subsequent subsets to $V^*_i$, which will not increase the number of subsets and makes $V^*_i$ satisfy the criteria (Supplementary Fig. 6b, top). For case (2), $|V^*_i| = k$, and the deeper nodes that are not moved from the candidate set $Q_i$ into $V^*_i$ must be added to subsequent subsets (Supplementary Fig. 6b, bottom). These deeper nodes can be moved from subsequent subsets to $V^*_i$ through the following method. Assume that after filling $V^*_i$, $v$ is picked and one of the $k$-th deepest nodes $v'$ is still in $Q_i$, so $v'$ will be put into a subsequent subset $V^*_j$ ($j > i$). We first move $v$ from $V^*_i$ to $V^*_{i+1}$, then modify subset $V^*_{i+1}$ as follows: if $|V^*_{i+1}| \le k$ and none of the nodes in $V^*_{i+1}$ is the parent of node $v$, stop modifying the latter subsets. Otherwise, modify $V^*_{i+1}$ as follows (Supplementary Fig. 6c): if the parent node of $v$ is in $V^*_{i+1}$, move this parent node to $V^*_{i+2}$; else move the node with minimum depth from $V^*_{i+1}$ to $V^*_{i+2}$. After adjusting $V^*_i$, modify the subsequent subsets $V^*_{i+1}, V^*_{i+2}, \ldots, V^*_{j-1}$ with the same strategy. Finally, move $v'$ from $V^*_j$ to $V^*_i$.

With the modification strategy described above, we can replace all shallower nodes in $V^*_i$
with the $k$-th deepest node in $Q_i$ and keep the number of subsets, i.e., $|P^*(V)|$, the same after modification. We can modify the nodes with the same strategy for all subsets in $P^*(V)$ that do not contain the deepest nodes. Finally, all subsets $V^*_i \in P^*(V)$ satisfy the max-depth criteria, and $|P^*(V)|$ does not change after modification.

In conclusion, DHS generates a partition $P(V)$, and all subsets $V_i \in P(V)$ satisfy the max-depth condition: $\sum_{v \in V_i} d(v) = D_i$. For any other optimal partition $P^*(V)$, we can modify its subsets to make its structure the same as $P(V)$, i.e., each subset consists of the deepest nodes in the candidate set, while keeping $|P^*(V)|$ the same after modification. So, the partition $P(V)$ obtained from DHS is one of the optimal partitions.

GPU implementation and memory boosting

To achieve high memory throughput, the GPU uses a memory hierarchy of (1) global memory, (2) cache, and (3) registers, where global memory has large capacity but low throughput, while registers have low capacity but high throughput. We aim to boost memory throughput by leveraging this memory hierarchy.

The GPU employs the SIMT (Single-Instruction, Multiple-Thread) architecture. Warps are the basic scheduling units on the GPU (a warp is a group of 32 parallel threads). A warp executes the same instruction with different data for different threads [46]. Correctly ordering the nodes is essential for this batching of computation in warps, to make sure DHS obtains results identical to those of the serial Hines method. When implementing DHS on GPU, we first group all cells into multiple warps based on their morphologies. Cells with similar morphologies are grouped in the same warp. We then apply DHS on all neurons, assigning the compartments of each neuron to multiple threads. Because neurons are grouped into warps, the threads for the same neuron are in the same warp. Therefore, the intrinsic synchronization in warps keeps the computation order consistent with the data dependency of the serial Hines method.
Finally, threads in each warp are aligned and rearranged according to the number of compartments.

When a warp loads pre-aligned and successively-stored data from global memory, it can make full use of the cache, which leads to high memory throughput, while accessing scatter-stored data would reduce memory throughput. After compartment assignment and thread rearrangement, we permute data in global memory to make it consistent with the computing order, so that warps can load successively-stored data when executing the program. Moreover, we put the necessary temporary variables into registers rather than global memory. Registers have the highest memory throughput, so the use of registers further accelerates DHS.

Full-spine and few-spine biophysical models

We used the published human pyramidal neuron [51]. The membrane capacitance $c_m$ = 0.44 μF cm⁻², membrane resistance $r_m$ = 48,300 Ω cm², and axial resistivity $r_a$ = 261.97 Ω cm. In this model, all dendrites were modeled as passive cables while somas were active. The leak reversal potential $E_l$ = -83.1 mV. Ion channels such as Na+ and K+ were inserted on the soma and initial axon, and their reversal potentials were $E_{Na}$ = 67.6 mV and $E_K$ = -102 mV, respectively. All these specific parameters were set the same as in the model of Eyal, et al. [51]; for more details please refer to the published model (ModelDB, access No. 238347).

In the few-spine model, the membrane capacitance and maximum leak conductance of the dendritic cables 60 μm away from the soma were multiplied by a factor $F_{spine}$ to approximate dendritic spines. In this model, $F_{spine}$ was set to 1.9. Only the spines that receive synaptic inputs were explicitly attached to dendrites.

In the full-spine model, all spines were explicitly attached to dendrites. We calculated the spine density with the reconstructed neuron in Eyal, et al. [51]. The spine density was set to 1.3 μm⁻¹, and each cell contained 24994 spines on dendrites 60 μm away from the soma.
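The few-spine approximation and the full-spine count can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the leak conductance is taken as the reciprocal of the membrane resistance (an assumption for a purely passive membrane), and the implied dendritic length assumes the stated spine density applies uniformly to all spiny dendrite.

```python
def apply_spine_factor(c_m, r_m, f_spine):
    """Scale passive membrane parameters by the spine factor (few-spine model).

    c_m in uF/cm^2, r_m in Ohm*cm^2. Returns (scaled capacitance in uF/cm^2,
    scaled leak conductance in S/cm^2). Assumes g_leak = 1 / r_m.
    """
    g_leak = 1.0 / r_m
    return c_m * f_spine, g_leak * f_spine


def implied_spiny_length(n_spines, density_per_um):
    """Dendritic length (um) implied by a spine count at a uniform density."""
    return n_spines / density_per_um


# Values quoted in the text for the human pyramidal neuron model.
scaled_c_m, scaled_g_leak = apply_spine_factor(c_m=0.44, r_m=48300.0, f_spine=1.9)
length_um = implied_spiny_length(n_spines=24994, density_per_um=1.3)
print(scaled_c_m, scaled_g_leak, length_um)
```

The design difference between the two models is visible here: the few-spine model folds the extra spine membrane into two scaled parameters, whereas the full-spine model adds tens of thousands of explicit compartments per cell, which is where the computational cost comes from.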
The morphologies and biophysical mechanisms of spines were the same in the few-spine and full-spine models. The length of the spine neck $L_{neck}$ = 1.35 μm and its diameter $D_{neck}$ = 0.25 μm, whereas the length and diameter of the spine head were 0.944 μm, i.e., the spine head area was set to 2.8 μm². Both spine neck and spine head were modeled as passive cables, with the reversal potential $E_l$ = -86 mV. The specific membrane capacitance, membrane resistance, and axial resistivity were the same as those for dendrites.

Synaptic inputs

We investigated neuronal excitability for both distributed and clustered synaptic inputs. All activated synapses were attached to the terminal of the spine head. For distributed inputs, all activated synapses were randomly distributed on all dendrites. For clustered inputs, each cluster consisted of 20 activated synapses that were uniformly distributed on a single randomly-selected compartment. All synapses were activated simultaneously during the simulation.

AMPA-based and NMDA-based synaptic currents were simulated as in Eyal et al.’s work. AMPA conductance was modeled as a double-exponential function and NMDA conductance as a voltage-dependent double-exponential function. For the AMPA model, the specific $\tau_{rise}$ and $\tau_{decay}$ were set to 0.3 and 1.8 ms. For the NMDA model, $\tau_{rise}$ and $\tau_{decay}$ were set to 8.019 and 34.9884 ms, respectively. The maximum conductances of AMPA and NMDA were 0.73 nS and 1.31 nS.

Background noise

We attached background noise to each cell to simulate a more realistic environment. Noise patterns were implemented as Poisson spike trains with a constant rate of 1.0 Hz. Each pattern started at $t_{start}$ = 10 ms and lasted until the end of the simulation. We generated 400 noise spike trains for each cell and attached them to randomly-selected synapses.
The model and specific parameters of synaptic currents were the same as described in Synaptic inputs, except that the maximum conductance of NMDA was uniformly distributed from 1.57 to 3.275, resulting in a higher AMPA to NMDA ratio.

Exploring neuronal excitability

We investigated the spike probability when multiple synapses were activated simultaneously. For distributed inputs, we tested 14 cases, from 0 to 240 activated synapses. For clustered inputs, we tested 9 cases in total, activating from 0 to 12 clusters. Each cluster consisted of 20 synapses. For each case of both distributed and clustered inputs, we calculated the spike probability from 50 random samples. Spike probability was defined as the ratio of the number of neurons that fired to the total number of samples. All 1150 samples were simulated simultaneously on our DeepDendrite platform.

Performing AI tasks with the DeepDendrite platform

Conventional detailed neuron simulators lack two functionalities important to modern AI tasks: (1) alternately performing simulations and weight updates without heavy reinitialization and (2) simultaneously processing multiple stimuli samples in a batch-like manner. Here we present the DeepDendrite platform, which supports both biophysical simulation and deep learning tasks with detailed dendritic models.

DeepDendrite consists of three modules (Supplementary Fig. 5): (1) an I/O module; (2) a DHS-based simulating module; (3) a learning module. When training a biophysically detailed model to perform learning tasks, users first define the learning rule, then feed all training samples to the detailed model for learning. In each step during training, the I/O module picks a specific stimulus and its corresponding teacher signal (if necessary) from all training samples and attaches the stimulus to the network model. Then, the DHS-based simulating module initializes the model and starts the simulation.
After simulation, the learning module updates all synaptic weights according to the difference between model responses and teacher signals. After training, the learned model can achieve performance comparable to that of an ANN. The testing phase is similar to training, except that all synaptic weights are fixed.

HPC-Net model

Image classification is a typical task in the field of AI. In this task, a model should learn to recognize the content of a given image and output the corresponding label. Here we present the HPC-Net, a network consisting of detailed human pyramidal neuron models that can learn to perform image classification tasks using the DeepDendrite platform.

HPC-Net has three layers, i.e., an input layer, a hidden layer, and an output layer. The neurons in the input layer receive spike trains converted from images as their input. Hidden layer neurons receive the output of input layer neurons and deliver responses to neurons in the output layer. The responses of the output layer neurons are taken as the final output of HPC-Net. Neurons between adjacent layers are fully connected.

For each image stimulus, we first convert each normalized pixel to a homogeneous spike train. For the pixel with coordinates $(x, y)$ in the image, the corresponding spike train has a constant interspike interval $\tau_{ISI}(x, y)$ (in ms), which is determined by the pixel value $p(x, y)$ as shown in Eq. (1).

In our experiment, the simulation for each stimulus lasted 50 ms. All spike trains started at 9 + $\tau_{ISI}$ ms and lasted until the end of the simulation. Then we attached all spike trains to the input layer neurons in a one-to-one manner. The synaptic current triggered by a spike arriving at time $t_0$ is a function of the post-synaptic voltage $v$, with reversal potential $E_{syn}$ = 1 mV, maximum synaptic conductance $g_{max}$ = 0.05 μS, and time constant $\tau$ = 0.5 ms.

Neurons in the input layer were modeled with a passive single-compartment model.
The specific parameters were set as follows: membrane capacitance $c_m$ = 1.0 μF cm⁻², membrane resistance $r_m$ = $10^4$ Ω cm², axial resistivity $r_a$ = 100 Ω cm, and reversal potential of the passive compartment $E_l$ = 0 mV.

The hidden layer contains a group of human pyramidal neuron models, receiving the somatic voltages of input layer neurons. The morphology was from Eyal, et al. [51], and all neurons were modeled with passive cables. The specific membrane capacitance $c_m$ = 1.5 μF cm⁻², membrane resistance $r_m$ = 48,300 Ω cm², axial resistivity $r_a$ = 261.97 Ω cm, and the reversal potential of all passive cables $E_l$ = 0 mV. Input neurons could make multiple connections to randomly-selected locations on the dendrites of hidden neurons. The synaptic current activated by the $k$-th synapse of the $i$-th input neuron on neuron $j$’s dendrite is defined in Eq. (4), which involves the synaptic conductance $g_{ijk}$, the synaptic weight $W_{ijk}$, a ReLU-like somatic activation function, and the somatic voltage of the $i$-th input neuron at time $t$.

Neurons in the output layer were also modeled with a passive single-compartment model, and each hidden neuron made only one synaptic connection to each output neuron. All specific parameters were set the same as those of the input neurons. Synaptic currents activated by hidden neurons are also in the form of Eq. (4).

Image classification with HPC-Net

For each input image stimulus, we first normalized all pixel values to 0.0-1.0. Then we converted normalized pixels to spike trains and attached them to input neurons. Somatic voltages of the output neurons are used to compute the predicted probability of each class, as shown in Eq. (6), where $p_i$ is the probability of the $i$-th class predicted by the HPC-Net, $\bar{v}_i$ is the average somatic voltage from 20 ms to 50 ms of the $i$-th output neuron, and $C$ indicates the number of classes, which equals the number of output neurons. The class with the maximum predicted probability is the final classification result.
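The readout step can be illustrated with a short sketch. Equation (6) itself is not reproduced in this text, so the normalization used below is an assumption: a softmax over the average somatic voltages, which is a common way to turn real-valued outputs into class probabilities. The function names, the list-of-lists trace format, and the uniform sampling step `dt_ms` are likewise hypothetical.

```python
import math


def class_probabilities(traces, dt_ms, t_start_ms=20.0, t_end_ms=50.0):
    """Turn output-neuron voltage traces into class probabilities.

    traces[i] holds the somatic voltage of output neuron i, sampled every
    dt_ms starting at t = 0. The softmax normalization is an assumption.
    """
    first = int(round(t_start_ms / dt_ms))
    last = int(round(t_end_ms / dt_ms))
    count = last + 1 - first
    # Average somatic voltage of each output neuron over the readout window.
    means = [sum(trace[first:last + 1]) / count for trace in traces]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(means)
    exps = [math.exp(m - shift) for m in means]
    total = sum(exps)
    return [e / total for e in exps]


def predict(traces, dt_ms):
    """Index of the class with the maximum predicted probability."""
    probs = class_probabilities(traces, dt_ms)
    return max(range(len(probs)), key=probs.__getitem__)


# Three output neurons with constant voltages, sampled at 1 ms for 50 ms.
example = [[0.0] * 51, [1.0] * 51, [2.0] * 51]
print(class_probabilities(example, dt_ms=1.0), predict(example, dt_ms=1.0))
```

Whatever the exact normalization in Eq. (6), the final decision rule stated in the text depends only on which class receives the largest probability.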
In this paper, we built the HPC-Net with 784 input neurons, 64 hidden neurons, and 10 output neurons.

Synaptic plasticity rules for HPC-Net

Inspired by previous work [36], we use a gradient-based learning rule to train our HPC-Net to perform the image classification task. The loss function we use here is cross-entropy, given in Eq. (7), where $p_i$ is the predicted probability for class $i$ and $y_i$ indicates the actual class the stimulus image belongs to: $y_i$ = 1 if the input image belongs to class $i$, and $y_i$ = 0 if not.

When training HPC-Net, we compute the update for weight $W_{ijk}$ (the synaptic weight of the $k$-th synapse connecting neuron $i$ to neuron $j$) at each time step. After the simulation of each image stimulus, $W_{ijk}$ is updated as shown in Eq. (8). The update depends on the learning rate, the update value at time $t$, the somatic voltages $v_i$ and $v_j$ of neurons $i$ and $j$, the $k$-th synaptic current $I_{ijk}$ activated by neuron $i$ on neuron $j$ together with its synaptic conductance $g_{ijk}$, and the transfer resistance between the $k$-th connected compartment of neuron $i$ on neuron $j$’s dendrite and neuron $j$’s soma; $t_s$ = 30 ms and $t_e$ = 50 ms are the start time and end time for learning, respectively. For output neurons, the error term can be computed as shown in Eq. (10). For hidden neurons, the error term is calculated from the error terms in the output layer, given in Eq. (11).

Since all output neurons are single-compartment, the transfer resistance equals the input resistance of the corresponding compartment. Transfer and input resistances are computed by NEURON.

Mini-batch training is a typical method in deep learning for achieving higher prediction accuracy and accelerating convergence. DeepDendrite also supports mini-batch training. When training HPC-Net with mini-batch size $N_{batch}$, we make $N_{batch}$ copies of HPC-Net. During training, each copy is fed with a different training sample from the batch. DeepDendrite first computes the weight update for each copy separately.
After all copies in the current training batch are done, the average weight update is calculated and the weights in all copies are updated by this same amount.

Robustness against adversarial attack with HPC-Net

To demonstrate the robustness of HPC-Net, we tested its prediction accuracy on adversarial samples and compared it with an analogous ANN (one with the same 784-64-10 structure and ReLU activation; for a fair comparison, each input neuron in our HPC-Net made only one synaptic connection to each hidden neuron). We first trained HPC-Net and the ANN with the original training set (original clean images). Then we added adversarial noise to the test set and measured their prediction accuracy on the noisy test set. We used Foolbox [98, 99] to generate adversarial noise with the FGSM method [93]. The ANN was trained with PyTorch [100], and HPC-Net was trained with our DeepDendrite. For fairness, we generated adversarial noise on a significantly different network model, a 20-layer ResNet [101]. The noise level ranged from 0.02 to 0.2. We experimented on two typical datasets, MNIST [95] and Fashion-MNIST [96]. Results show that the prediction accuracy of HPC-Net is 19% and 16.72% higher than that of the analogous ANN, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are available within the paper, the Supplementary Information, and the Source Data files provided with this paper. The source code and data used to reproduce the results in Figs. 3–6 are available at https://github.com/pkuzyc/DeepDendrite. The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist. The Fashion-MNIST dataset is publicly available at https://github.com/zalandoresearch/fashion-mnist. Source data are provided with this paper.

Code availability

The source code of DeepDendrite as well as the models and code used to reproduce Figs. 3–6 in this study are available at https://github.com/pkuzyc/DeepDendrite.
3 6 https://github.com/pkuzyc/DeepDendrite References McCulloch، W. S. & Pitts، W. د عصبي فعالیت کې مخکښ افکارونو منطقي محاسبه. Bull. Math. Biophys. 5, 115-133 (1943). LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. , 436–444 (2015). Nature 521 Poirazi، P.، Brannon، T. & Mel، B. W. د يو نمونوي CA1 پیرامیډال سیل کې د زیربنا synaptic summation ارامیک. Neuron 37, 977–987 (2003). London, M. & Häusser, M. Dendritic computation. , 503–532 (2005). Annu. Rev. Neurosci. 28 برانکو، T. & Häusser، M. په عصري سیستم کې د بنسټیز فعالیت واحد په توګه د واحد dendritic زون. Curr. Opin. Neurobiol. 20, 494–502 (2010). Stuart، G. J. & Spruston، N. Dendritic انډول: 60 کاله پرمختګ. Nat. Neurosci. 18، 1713-1721 (2015). Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. , 303–321 (2020). Nat. Rev. Neurosci. 21 Yuste، R. & Denk، W. Dendritic سپینونو په توګه د عصري انټرنټ بنسټیز فعالیت واحدونه. Nature 375, 682–684 (1995). Engert، F. & Bonhoeffer، T. د hippocampal د اوږدې مودې synaptic plasticity سره تړاو لري. طبیعت 399, 66-70 (1999). Yuste, R. Dendritic spines and distributed circuits. , 772–781 (2011). Neuron 71 Yuste, R. Electrical compartmentalization in dendritic spines. , 429–449 (2013). Annu. Rev. Neurosci. 36 Rall، W. د dendritic ټوټه او motoneuron membrane مقاومت. Exp. Neurol. 1, 491-527 (1959). Segev, I. & Rall, W. Computational study of an excitable dendritic spine. , 499–523 (1988). J. Neurophysiol. 60 Silver, D. et al. Mastering the game of go with deep neural networks and tree search. , 484–489 (2016). Nature 529 سیلور، D. et al. د عمومي قوی کولو زده کړې algorithm چې د شطرنج، shogi، او د ځان د لوبو له لارې ترسره کوي. Science 362, 1140-1144 (2018). McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. , 109–165 (1989). Psychol. Learn. Motiv. 24 د فرانسې، R. M. په اړیکو شبکې کې د تشناب د تشناب. ټینډز Cogn. Sci. 3، 128-135 (1999). Naud, R. & Sprekeler, H. 
Sparse bursts optimize information transmission in a multiplexed neural code. Proc. Natl Acad. Sci. USA 115, E6329–E6338 (2018).
19. Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
20. Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A. & Naud, R. Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nat. Neurosci. 24, 1010–1019 (2021).
21. Bicknell, B. A. & Häusser, M. A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109, 4001–4017 (2021).
22. Moldwin, T., Kalmenson, M. & Segev, I. The gradient clusteron: a model neuron that learns to solve classification tasks via dendritic nonlinearities, structural plasticity, and gradient descent. PLoS Comput. Biol. 17, e1009015 (2021).
23. Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952).
24. Rall, W. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96, 1071–1092 (1962).
25. Hines, M. L. & Carnevale, N. T. The NEURON simulation environment. Neural Comput. 9, 1179–1209 (1997).
26. Bower, J. M. & Beeman, D. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 17–27 (Springer New York, 1998).
27. Hines, M. L., Eichner, H. & Schürmann, F. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors. J. Comput. Neurosci. 25, 203–210 (2008).
28. Hines, M. L., Markram, H. & Schürmann, F. Fully implicit parallel simulation of single neurons. J. Comput. Neurosci. 25, 439–448 (2008).
29. Ben-Shalom, R., Liberman, G. & Korngreen, A. Accelerating compartmental modeling on a graphical processing unit. Front. Neuroinform. 7, 4 (2013).
30. Tsuyuki, T., Yamamoto, Y. & Yamazaki, T.
Efficient numerical simulation of neuron models with spatial structure on graphics processing units. In Proc. 2016 International Conference on Neural Information Processing (eds Hirose, A. et al.) 279–285 (Springer International Publishing, 2016).
31. Vooturi, D. T., Kothapalli, K. & Bhalla, U. S. Parallelizing Hines Matrix Solver in Neuron Simulations on GPU. In Proc. IEEE 24th International Conference on High Performance Computing (HiPC) 388–397 (IEEE, 2017).
32. Huber, F. Efficient tree solver for hines matrices on the GPU. Preprint at https://arxiv.org/abs/1810.12742 (2018).
33. Korte, B. & Vygen, J. Combinatorial Optimization Theory and Algorithms 6 edn (Springer, 2018).
34. Gebali, F. Algorithms and Parallel Computing (Wiley, 2011).
35. Kumbhar, P. et al. CoreNEURON: An optimized compute engine for the NEURON simulator. Front. Neuroinform. 13, 63 (2019).
36. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).
37. Ben-Shalom, R., Aviv, A., Razon, B. & Korngreen, A. Optimizing ion channel models using a parallel genetic algorithm on graphical processors. J. Neurosci. Methods 206, 183–194 (2012).
38. Mascagni, M. A parallelizing algorithm for computing solutions to arbitrarily branched cable neuron models. J. Neurosci. Methods 36, 105–114 (1991).
39. McDougal, R. A. et al. Twenty years of ModelDB and beyond: building essential modeling tools for the future of neuroscience. J. Comput. Neurosci. 42, 1–10 (2017).
40. Migliore, M., Messineo, L. & Ferrante, M. Dendritic Ih selectively blocks temporal summation of unsynchronized distal inputs in CA1 pyramidal neurons. J. Comput. Neurosci. 16, 5–13 (2004).
41. Hemond, P. et al. Distinct classes of pyramidal cells exhibit mutually exclusive firing patterns in hippocampal area CA3b. Hippocampus 18, 411–424 (2008).
42. Hay, E., Hill, S., Schürmann, F., Markram, H. & Segev, I. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput.
Biol. 7, e1002107 (2011).
43. Masoli, S., Solinas, S. & D'Angelo, E. Action potential processing in a detailed purkinje cell model reveals a critical role for axonal compartmentalization. Front. Cell. Neurosci. 9, 47 (2015).
44. Lindroos, R. et al. Basal ganglia neuromodulation over multiple temporal and structural scales: simulations of direct pathway MSNs investigate the fast onset of dopaminergic effects and predict the role of Kv4.2. Front. Neural Circuits 12, 3 (2018).
45. Migliore, M. et al. Synaptic clusters function as odor operators in the olfactory bulb. Proc. Natl Acad. Sci. USA 112, 8499–8504 (2015).
46. NVIDIA. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (2021).
47. NVIDIA. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html (2021).
48. Harnett, M. T., Makara, J. K., Spruston, N., Kath, W. L. & Magee, J. C. Synaptic amplification by dendritic spines enhances input cooperativity. Nature 491, 599–602 (2012).
49. Chiu, C. Q. et al. Compartmentalization of GABAergic inhibition by dendritic spines. Science 340, 759–762 (2013).
50. Tønnesen, J., Katona, G., Rózsa, B. & Nägerl, U. V. Spine neck plasticity regulates compartmentalization of synapses. Nat. Neurosci. 17, 678–685 (2014).
51. Eyal, G. et al. Human cortical pyramidal neurons: from spines to spikes via models. Front. Cell. Neurosci. 12, 181 (2018).
52. Koch, C. & Zador, A. The function of dendritic spines: devices subserving biochemical rather than electrical compartmentalization. J. Neurosci. 13, 413–422 (1993).
53. Koch, C. Dendritic spines. In Biophysics of Computation (Oxford University Press, 1999).
54. Rapp, M., Yarom, Y. & Segev, I. The impact of parallel fiber background activity on the cable properties of cerebellar purkinje cells. Neural Comput. 4, 518–533 (1992).
55. Hines, M. Efficient computation of branched nerve equations. Int. J. Bio-Med. Comput. 15, 69–76 (1984).
56. Nayebi, A. & Ganguli, S. Biologically inspired protection of deep networks from adversarial attacks. Preprint (2017) at
https://arxiv.org/abs/1703.09202
57. Goddard, N. H. & Hood, G. Large-Scale Simulation Using Parallel GENESIS. In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System (eds Bower, J. M. & Beeman, D.) 349–379 (Springer New York, 1998).
58. Migliore, M., Cannia, C., Lytton, W. W., Markram, H. & Hines, M. L. Parallel network simulations with NEURON. J. Comput. Neurosci. 21, 119 (2006).
59. Lytton, W. W. et al. Simulation neurotechnologies for advancing brain research: parallelizing large networks in NEURON. Neural Comput. 28, 2063–2090 (2016).
60. Valero-Lara, P. et al. cuHinesBatch: Solving multiple Hines systems on GPUs human brain project. In Proc. 2017 International Conference on Computational Science 566–575 (IEEE, 2017).
61. Akar, N. A. et al. Arbor: A morphologically-detailed neural network simulation library for contemporary high-performance computing architectures. In Proc. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 274–282 (IEEE, 2019).
62. Ben-Shalom, R. et al. NeuroGPU: Accelerating multi-compartment, biophysically detailed neuron simulations on GPUs. J. Neurosci. Methods 366, 109400 (2022).
63. Rempe, M. J. & Chopp, D. L. A predictor-corrector algorithm for reaction-diffusion equations associated with neural activity on branched structures. SIAM J. Sci. Comput. 28, 2139–2161 (2006).
64. Kozloski, J. & Wagner, J. An ultrascalable solution to large-scale neural tissue simulation. Front. Neuroinform. 5, 15 (2011).
65. Jayant, K. et al. Targeted intracellular voltage recordings from dendritic spines using quantum-dot-coated nanopipettes. Nat. Nanotechnol. 12, 335–342 (2017).
66. Palmer, L. M. & Stuart, G. J. Membrane potential changes in dendritic spines during action potentials and synaptic input. J. Neurosci. 29, 6897–6903 (2009).
67. Nishiyama, J. & Yasuda, R. Biochemical computation for spine structural plasticity. Neuron 87, 63–75 (2015).
68. Yuste, R. & Bonhoeffer, T. Morphological changes in dendritic spines associated with long-term synaptic plasticity. Annu. Rev. Neurosci.
24, 1071–1089 (2001).
69. Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).
70. Caroni, P., Donato, F. & Muller, D. Structural plasticity upon learning: regulation and functions. Nat. Rev. Neurosci. 13, 478–490 (2012).
71. Keck, T. et al. Massive restructuring of neuronal circuits during functional reorganization of adult visual cortex. Nat. Neurosci. 11, 1162 (2008).
72. Hofer, S. B., Mrsic-Flogel, T. D., Bonhoeffer, T. & Hübener, M. Experience leaves a lasting structural trace in cortical circuits. Nature 457, 313–317 (2009).
73. Trachtenberg, J. T. et al. Long-term in vivo imaging of experience-dependent synaptic plasticity in adult cortex. Nature 420, 788–794 (2002).
74. Marik, S. A., Yamahachi, H., McManus, J. N., Szabo, G. & Gilbert, C. D. Axonal dynamics of excitatory and inhibitory neurons in somatosensory cortex. PLoS Biol. 8, e1000395 (2010).
75. Xu, T. et al. Rapid formation and selective stabilization of synapses for enduring motor memories. Nature 462, 915–919 (2009).
76. Albarran, E., Raissi, A., Jáidar, O., Shatz, C. J. & Ding, J. B. Enhancing motor learning by increasing the stability of newly formed dendritic spines in the motor cortex. Neuron 109, 3298–3311 (2021).
77. Branco, T. & Häusser, M. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69, 885–892 (2011).
78. Major, G., Larkum, M. E. & Schiller, J. Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci. 36, 1–24 (2013).
79. Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
80. Doron, M., Chindemi, G., Muller, E., Markram, H. & Segev, I. Timed synaptic inhibition shapes NMDA spikes, influencing local dendritic processing and global I/O properties of cortical neurons. Cell Rep. 21, 1550–1561 (2017).
81. Du, K. et al. Cell-type-specific inhibition of the dendritic plateau potential in striatal spiny projection neurons. Proc. Natl Acad. Sci.
USA 114, E7612–E7621 (2017).
82. Smith, S. L., Smith, I. T., Branco, T. & Häusser, M. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature 503, 115–120 (2013).
83. Xu, N.-l et al. Nonlinear dendritic integration of sensory and motor input during an active sensing task. Nature 492, 247–251 (2012).
84. Takahashi, N., Oertner, T. G., Hegemann, P. & Larkum, M. E. Active cortical dendrites modulate perception. Science 354, 1587–1590 (2016).
85. Sheffield, M. E. & Dombeck, D. A. Calcium transient prevalence across the dendritic arbour predicts place field properties. Nature 517, 200–204 (2015).
86. Markram, H. et al. Reconstruction and simulation of neocortical microcircuitry. Cell 163, 456–492 (2015).
87. Billeh, Y. N. et al. Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106, 388–403 (2020).
88. Hjorth, J. et al. The microcircuits of striatum in silico. Proc. Natl Acad. Sci. USA 117, 202000671 (2020).
89. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. elife 6, e22901 (2017).
90. Iyer, A. et al. Avoiding catastrophe: active dendrites enable multi-task learning in dynamic environments. Front. Neurorobot. 16, 846219 (2022).
91. Jones, I. S. & Kording, K. P. Might a single neuron solve interesting machine learning problems through successive computations on its dendritic tree? Neural Comput. 33, 1554–1571 (2021).
92. Bird, A. D., Jedlicka, P. & Cuntz, H. Dendritic normalisation improves learning in sparsely connected artificial neural networks. PLoS Comput. Biol. 17, e1009202 (2021).
93. Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations (ICLR) (ICLR, 2015).
94. Papernot, N., McDaniel, P. & Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. Preprint (2016) at
https://arxiv.org/abs/1605.07277
95. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
96. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at http://arxiv.org/abs/1708.07747 (2017).
97. Bartunov, S. et al. Assessing the scalability of biologically-motivated deep learning algorithms and architectures. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (NeurIPS, 2018).
98. Rauber, J., Brendel, W. & Bethge, M. Foolbox: A Python toolbox to benchmark the robustness of machine learning models. In Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017).
99. Rauber, J., Zimmermann, R., Bethge, M. & Brendel, W. Foolbox native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5, 2607 (2020).
100. Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (NeurIPS, 2019).
101. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2020AAA0130400) to K.D. and T.H., the National Natural Science Foundation of China (No. 61825101) to Y.T.T., the National Key R&D Program of China (No. 2022ZD01163005) to L.M., the Key Area R&D Program of Guangdong Province (No. 2018B030338001) to T.H., the National Natural Science Foundation of China (No. 61825101) to Y.T., the Swedish Research Council (VR-M-2020-01652), the Swedish e-Science Research Centre (SeRC), EU/Horizon 2020 9455…

This paper is available on Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.