A Guide to the Brain-like Dragon Hatchling (BDH)

Neural networks can impress us, answer our questions, and even help with programming - yet they all share the same limitation: they stop learning once deployed.

A few days ago, a group of engineers and researchers - Adrian Kosowski, Przemysław Uznanski, Jan Chorowski, Zuzanna Stamirowska, and Michał Bartoszkiewicz - published a paper that brings a fresh idea to machine learning and neural architectures. In essence, they introduce a new type of artificial neural network.

https://arxiv.org/abs/2509.26507

The paper is genuinely dense - packed with math, formulas, and graphs - but also packed with fascinating ideas. I want to approach it differently: give a popular-science overview, with a few metaphors and simplifications of my own.

Picture a little dragon that has just hatched from its egg. It can already flap its wings and puff out sparks, but only barely. It learns not from documents but from experience, from its own body, noticing which moves worked and which didn't.

That is the spirit of BDH — the Brain-like Dragon Hatchling: a new neural architecture that combines classic pretraining (as in ordinary networks) with instant, self-directed learning during inference.

A neural network is a system of neurons connected by "weights" that are tuned with gradient descent to reduce errors - like a student going over mistakes after a test. The catch: once the model is deployed, the test is over and the studying stops. All the learning happened earlier, before the test.

That is how today's models such as GPT work: they learn inside the egg - and then stop.

What Makes the Dragon Hatchling Different?

BDH is built differently. It has two kinds of memory:

Permanent memory, as in an ordinary neural network - formed ahead of time, during pretraining.
Fast, temporary memory - short-lived connections between neurons.

When BDH processes information, it forms new connections: if two neurons fire together, the link between them grows stronger. This is known as the Hebbian learning rule:

“Neurons that fire together, wire together.”

These connections are stored in a separate matrix σ, which acts as a map of the model's recent experience. If a similar situation shows up later, BDH recognizes it: "Ah, I've seen this before - and here's what mattered."

What Does BDH Change?

BDH changes the learning process itself. It learns while it works, without running backpropagation. It can absorb new information on the go, without retraining and without massive GPU compute.

In short - BDH is a network that learns to live, not just to repeat.

Learning to Stand, to Fly, and to Breathe Fire

Every living creature grows through stages. A dragon hatchling first learns to stand, then to stretch its wings, and finally to breathe fire. The BDH model follows a similar path - each stage of its "life" corresponds to a particular kind of learning.

Stage 1: Standing (Classic Pretraining)

This is where BDH learns, like any traditional neural network. It’s trained on data, adjusts weights via gradient descent, and minimizes loss — the familiar supervised learning phase.
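In symbols, this stage is plain gradient descent on the slow weights, nothing BDH-specific yet (notation mine):

θ ← θ − lr · ∇θ L(θ)

where θ are the permanent weights, lr is the learning rate, and L is the training loss.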
Think of it as the dragon strengthening its legs before taking the first flight.

At this stage, the model is trained on a large dataset — text corpora, translations, and other examples. It uses standard offline backpropagation, an optimizer like AdamW, and a loss function that rewards predicting the next token.

During this process, BDH develops its permanent weights, referred to as “G” in the paper (the fixed ruleset). These correspond to what, in a transformer, would be parameters like Wq, Wk, Wv, W1, W2, and so on.

Stage 2: Flying (Online Adaptation)

Once training ends, most networks stop changing. But BDH keeps learning in real time. It has a Hebbian memory — a fast-acting connection map that updates itself during inference. If certain neurons activate together, their connection grows stronger; if not, it weakens. This is how BDH adapts to new situations mid-flight, without retraining.

During inference — when BDH reads or generates text — it updates its temporary internal states, denoted σ(i, j), or “synaptic weights.”

This process isn’t gradient descent. Instead, it follows a local learning rule:

If neuron i and neuron j fire together → strengthen their connection σ(i, j).

This simple rule implements Hebbian learning — often summarized as “neurons that fire together, wire together.”

These updates are short-lived: they exist only while a dialogue or reasoning session is active. Once σ is reset, the model returns to its original “hatched” knowledge — the way it was trained before flight.

Stage 3: Breathing Fire (Self-regulation)

BDH doesn’t just strengthen all connections — it keeps them balanced. The model uses sparsity thresholds and normalization to prevent runaway feedback loops. It learns to "breathe fire" carefully — powerful, but controlled. Too much activation would lead to instability; too little would make it unresponsive. The balance between those extremes is what gives BDH its “life”.

The paper briefly mentions an intriguing idea: if the Hebbian updates (σ) are preserved and averaged over time, BDH could develop something resembling long-term memory — a mechanism akin to slowly updating its core weights. However, the authors haven’t yet formalized the exact algorithm for this process.

They suggest that:

Fast memory (σ) operates on short timescales — minutes or a few hundred tokens.
Slow memory (G) evolves over much longer periods — days or across model updates.

This opens the door to lifelong learning — systems that can continuously acquire new knowledge without erasing what they already know. Unlike classic transformers, which suffer from catastrophic forgetting, BDH hints at a future where models can remember their past while growing into the future.
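To make the two timescales concrete, here is the shape of the fast update as it is implemented in the toy example later in this article (a simplified sketch in that demo's notation; the paper's formulation is more general):

σ ← clamp( (1 − u) · σ + (hebb_lr / B) · y · xᵀ, −smax, smax )

where y · xᵀ is the outer product of post- and pre-synaptic activity accumulated over a batch of size B, u is the decay rate, and smax caps connection strength; near-zero entries are then pruned. The slow weights G, by contrast, change only through gradient descent during pretraining.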
Why I Believe BDH Is an Evolution, Not Just Another Model

The Brain-like Dragon Hatchling (BDH) is not merely another architecture - it is a new direction in AI architecture, and it brings real, tangible advantages.

Interpretable and Transparent AI

One of the biggest pain points in modern LLMs is opacity: we usually cannot tell why a model produced a given answer. BDH changes that: its "concepts" map directly onto neuron activations, and specific synapses strengthen as the model "thinks" about a particular idea. Its activations are sparse and positive (much like in the brain), which makes it possible to debug and even audit reasoning processes.

➡️ This opens the door to AI in critical domains - medicine, finance, law - where understanding how the model reached its answer matters as much as the answer itself.

On-the-Fly Learning (Inference-Time Adaptation)

BDH applies Hebbian learning without retraining. It adapts to the user or the task in real time, forming a kind of short-term memory that "remembers" useful associations across tokens and paragraphs.

➡️ This moves LLMs toward lifelong learning - models that improve mid-conversation, the way people do, without an extra training run.

Stable and Scalable Reasoning over Time

Transformers degrade once you step outside the context window they were trained on; their internal representations fall apart. BDH, by contrast, is built as a scale-free system - its behavior stays consistent as the reasoning horizon and the number of neurons grow, which makes long-range reasoning more stable.

➡️ This suggests we could build agentic systems that run for hours or even days - planning, monitoring, or reasoning - without their internal state degrading.

Merging Models Without Catastrophic Forgetting

BDH exhibits an unusual property called model merging: two models can be "combined" simply by joining their graphs. Unlike with transformers, this neither breaks the models nor requires fresh training.

➡️ You could merge specialized models (say, a medical one and a legal one) without fine-tuning.

➡️ This opens the way to modular AI, where reusable "neural plugins" are shared like software components.

Efficiency and Practicality

BDH-GPU works as a state-space system, which means it can be trained efficiently with PyTorch on ordinary GPUs. Its parameter and compute costs grow linearly - they do not blow up the way huge transformers do.

➡️ This makes it realistic to build models in the 10M–1B parameter range, putting BDH within reach of modest hardware and small startups.

A Bridge to Neuromorphic Computing

Because BDH is formulated natively in terms of neurons and synapses, it is a natural fit for neuromorphic hardware - chips like Loihi or TrueNorth that emulate biological networks directly in silicon.

➡️ This opens the possibility of running large-scale reasoning models on energy-efficient edge devices, robotics platforms, or bio-inspired systems.

A Step Toward “Axiomatic AI”

The authors introduce the idea of Axiomatic AI - systems whose behavior is not just observed empirically but can be formally predicted over time. It is a bit like studying the "thermodynamics of intelligence": provable scaling laws and stable reasoning dynamics.

➡️ That points toward certifiable and safe AI architectures for autonomous, high-stakes environments - from finance and healthcare to transport.

Building a Simple Neural Network

To really understand how BDH works, I decided to build a tiny proof of concept - a minimal “tiny-BDH” in Rust, trained on the classic XOR problem. It uses autograd via tch-rs (a Rust wrapper around libtorch, the C++ core of PyTorch). This little project was inspired by the famous “A Neural Network in 11 Lines of Python”, but my goal is not brevity - it is clarity. I want to show, step by step, how BDH's ideas can be made to work in practice.
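Before diving into the code, here is the entire forward pass of the demo written out as equations. This is just my summary of the code that follows, using the variable names from src/main.rs, not notation from the paper:

x_neu = x · R_in
y1 = ReLU((x_neu · E) · Dxᵀ)
y2 = y1 + x_neu · σᵀ
z = ReLU((y2 · E) · Dyᵀ)
logits = z · W_read

The only unusual ingredient is the σ term added into y2 - that is where the fast Hebbian memory feeds back into the computation.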
The complete source code is available in the GitHub repo ZhukMax/tiny_bdh_xor. Below, I walk through the implementation step by step. It may feel verbose, but that is deliberate - the goal here is maximum transparency and accessibility, so that anyone can follow the BDH internals.

Setting Up the Project

Since this example is written in Rust, we start with a Cargo.toml file - the manifest that describes the project and its dependencies.

The key dependency here is tch, a safe Rust wrapper around the C++ libtorch library that powers PyTorch. It gives us access to tensors, autograd, and other deep-learning primitives directly from Rust. Because BDH is built from familiar building blocks, libtorch already provides the abstractions we need out of the box, so we can focus on the learning logic itself rather than on low-level plumbing.

Here’s the relevant snippet from Cargo.toml:

[package]
name = "tiny_bdh_xor"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0.100"
tch = { version = "0.22", features = ["download-libtorch"] }

💡 The download-libtorch feature tells Cargo to automatically fetch and link the correct libtorch binaries for your OS and architecture. Without it, you’d need to manually install PyTorch and set the LIBTORCH environment variable. With it, everything “just works” — Cargo downloads and links the library during build.

(Note: the exact tch version may differ depending on your setup.)

src/main.rs — The Core of Our Tiny BDH

In Rust projects, all source files live inside the src directory. Since this is a minimal example, we keep everything in a single file, main.rs. We start by pulling in the required dependencies and setting up the entry point:

use anyhow::Result;
use tch::{nn, Device, Kind, Reduction, Tensor};
use tch::nn::{Init, OptimizerConfig};

fn main() -> Result<()> {
    let dev = if tch::Cuda::is_available() { Device::Cuda(0) } else { Device::Cpu };
    Ok(())
}

Choosing the Device (CPU or GPU)

On line 6, we decide where the computations will run — on the GPU or the CPU:

tch::Cuda::is_available() checks whether CUDA is installed and an NVIDIA GPU is visible.
If CUDA is available, the code picks the first GPU: Device::Cuda(0).
If CUDA is absent (e.g. on a Mac or a CPU-only server), it falls back to Device::Cpu.

The resulting dev is then passed to other components, such as VarStore::new(dev), so that all tensors are created and kept on the same device.

Building the Training Data

Next, we define the input and output tensors for our tiny XOR neural network — its training set:

let x = Tensor::from_slice(&[
    0f32,0.,1.,
    0.,1.,1.,
    1.,0.,1.,
    1.,1.,1.
]).reshape([4,3]).to_device(dev);
let y = Tensor::from_slice(&[0f32,1.,1.,0.]).reshape([4,1]).to_device(dev);

We start with a flat array of 12 values (4 × 3) that encodes the four XOR examples.
Each triplet of numbers is one example:

[0, 0, 1]
[0, 1, 1]
[1, 0, 1]
[1, 1, 1]

The first two values are the binary inputs (X₁ and X₂), and the third is a constant bias (always 1), which helps the model separate the data linearly.

Then .reshape([4,3]) converts the flat array into a 4×3 matrix — four examples, each with three input features. Finally, .to_device(dev) moves the tensor to the selected device (GPU or CPU), so that all computation happens in one place.

The second tensor, y, contains the expected outputs for each example: [0], [1], [1], [0].

These correspond to the XOR truth table:

X₁  X₂  Y
0   0   0
0   1   1
1   0   1
1   1   0

Network Hyperparameters

let n: i64 = 64;
let d: i64 = 16;
let u: f64 = 0.20;
let hebb_lr: f64 = 0.01;
let smax: f64 = 1.0;
let sparsity_thresh: f64 = 5e-3;
let lr: f64 = 5e-3;
let steps = 3000;

n = 64 — the size of the neural field (number of neurons in the layer).
d = 16 — the rank of the low-rank matrices E and D, which determines how strongly the data is compressed and re-expanded.
u = 0.20 — the decay rate of the fast memory σ; higher values make it "forget" faster.
hebb_lr = 0.01 — the learning rate of the Hebbian updates, controlling how strongly new co-activations are written into σ.

In BDH, the fast memory is represented by a single connection matrix — the Hebbian memory σ (sigma), a set of temporary "synaptic" weights. It doesn’t store the model’s learned weights (those are handled by gradient descent). Instead, it remembers which neurons were active together — a kind of "working memory" that lives only for the duration of a session.

The remaining settings keep it stable:

smax = 1.0 — caps the maximum strength of a σ connection, preventing runaway values.
sparsity_thresh = 5e-3 — zeroes out very small σ elements, keeping the memory sparse and stable.
lr = 5e-3 — the learning rate of the Adam optimizer that updates the model's regular parameters (E, D, R_in, W_read).
steps = 3000 — the number of training iterations (how many times the model sees the data).

Initializing Parameters and the “Neural Field”

After defining our hyperparameters, we create a parameter store — a container that holds all trainable weights and biases of the network. Then we add the model’s learnable parameters — its “weights,” which will be updated during training:

let vs = nn::VarStore::new(dev);
let root = &vs.root();
let e = root.var("E", &[n,d], Init::Randn { mean: 0.0, stdev: 0.05 });
let dx = root.var("Dx", &[n,d], Init::Randn { mean: 0.0, stdev: 0.05 });
let dy = root.var("Dy", &[n,d], Init::Randn { mean: 0.0, stdev: 0.05 });
let r_in = root.var("R_in", &[3,n], Init::Randn { mean: 0.0, stdev: 0.20 });
let w_read = root.var("W_read", &[n,1], Init::Randn { mean: 0.0, stdev: 0.20 });

Each variable represents a part of the BDH model:

r_in — the projection of the input into the neural field.
E, Dx, Dy — the internal transformations, analogous to the weights of a hidden layer. Note, though, that BDH doesn't really have layers — it is closer to a single interconnected field of neurons.
w_read — the output projection, used to read the network’s final activations.
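For a sense of scale, here is a quick tally of what we just allocated (my own arithmetic from the shapes above, not a figure from the paper or the repo):

E, Dx, Dy: 3 · (64 · 16) = 3072 parameters
R_in: 3 · 64 = 192
W_read: 64 · 1 = 64
Total: 3328 trainable parameters, plus the 64 · 64 = 4096 entries of the fast memory σ, which the optimizer never touches.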
The Optimizer and the Fast Memory

Next, we create the Adam optimizer, a popular variant of gradient descent that adapts the learning rate for each parameter. We also create the σ tensor, an [n × n] matrix initialized with zeros. This is BDH's fast Hebbian memory — the connections between neurons, refreshed on every update step.

let mut opt = nn::Adam::default().build(&vs, lr)?;
let mut sigma = Tensor::zeros(&[n, n], (Kind::Float, dev));

for step in 0..steps {
    ...
}

Inside this training loop goes the code that teaches our Dragon Hatchling while it is still inside the egg — that is, during offline pretraining.

The Forward Pass — the Dragon's First Steps

The next code block performs the forward pass, a single inference step in which the input is transformed into the output logits:

let x_neu = x.matmul(&r_in);
let y1 = relu_lowrank_forward(&x_neu, &e, &dx);
let a = x_neu.matmul(&sigma.transpose(-1, -2));
let y2 = y1 + a;
let z = relu_lowrank_forward(&y2, &e, &dy);
let logits = z.matmul(&w_read);

Let's unpack it step by step:

x_neu = x.matmul(&r_in) — the input data is projected into the neural field.
y1 = relu_lowrank_forward(...) — the signal is compressed, expanded back, and passed through a ReLU activation.
a = x_neu.matmul(&sigma.T) — adds an extra signal from the Hebbian memory σ, based on which neurons were recently co-active.
y2 = y1 + a — the "current" signal is combined with short-term memory; this is the heart of the BDH idea.
z and logits — the final transformation and the output readout, collapsing the neural field into the model's prediction.

The output logits are the raw predictions before the sigmoid activation — the dragon’s unrefined thoughts before taking shape.

The Low-Rank + ReLU Helper

As promised, here’s the ReLU helper we use in the forward pass:

/// y = ReLU( (x E) D^T )
fn relu_lowrank_forward(x: &Tensor, e: &Tensor, d: &Tensor) -> Tensor {
    let h = x.matmul(e);                     // [B,n]·[n,d] = [B,d]
    h.matmul(&d.transpose(-1, -2)).relu()    // [B,d]·[d,n] = [B,n]
}

This is a low-rank linear layer with ReLU. Instead of one big dense matrix W ∈ R^{n×n}, we factor it as W ≈ E · Dᵀ with E ∈ R^{n×d}, D ∈ R^{n×d}, and d ≪ n.

The intuition: we don’t need all possible synapses explicitly, so we project into a compact latent space of size d. For a tiny demo like XOR this hardly matters, but for GPT-scale models the memory savings would be massive (terabytes at scale).

Line 3 compresses the high-dimensional “neural field” (n features) into a latent space of size d. The next line expands it back to n as a linear combination of decoder patterns from D. Together this behaves like a single multiplication by W ≈ E · Dᵀ, but uses 2·n·d parameters instead of n² (with n = 64 and d = 16, that is 2,048 instead of 4,096).

Loss, Backprop, Step

Now let’s add the standard training step — compute the loss, run backprop, and update the weights:

let loss = logits
    .binary_cross_entropy_with_logits::<Tensor>(&y, None, None, Reduction::Mean);
opt.zero_grad();
loss.backward();
opt.step();

These four lines are the heart of the training loop: measure the error, work out how the model should change, and apply the update. After each iteration, the network gets a little better at the task.
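For reference, binary_cross_entropy_with_logits with Reduction::Mean computes the standard binary cross-entropy averaged over the batch (written here in plain notation; "sigmoid" is the logistic function, not our memory matrix σ):

L = −(1/B) · Σ_b [ y_b · log(p_b) + (1 − y_b) · log(1 − p_b) ],  with p_b = sigmoid(logit_b)

Working on logits directly, rather than on probabilities, keeps this computation numerically stable.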
The Hebbian Fast-Memory Update (σ)

The final piece — and arguably the core BDH twist — is the Hebbian fast-memory update, performed outside autograd:

tch::no_grad(|| {
    let bsz = x.size()[0] as f64;

    // 1) Build co-activation map: outer = y2ᵀ @ x_neu
    let outer = y2
        .detach()                                // detach from autograd
        .transpose(-1, -2)                       // [B,n]ᵀ → [n,B]
        .matmul(&x_neu.detach())                 // [n,B] @ [B,n] → [n,n]
        .to_kind(Kind::Float) * (hebb_lr / bsz); // scale by batch size and Hebb LR

    // 2) Work on a shallow copy to avoid move/borrow issues
    let zeros = Tensor::zeros_like(&sigma);
    let mut s = sigma.shallow_clone();

    // 3) Exponential forgetting + add fresh co-activations
    s *= 1.0 - u;  // older σ fades out
    s += &outer;   // Hebbian boost for co-firing neurons

    // 4) Safety rails: clamp to prevent blow-ups
    //    (I originally skipped this and hit runtime errors during training)
    s = s.clamp(-smax, smax);

    // 5) Sparsify: zero-out tiny values (efficiency + stability)
    let keep = s.abs().ge(sparsity_thresh);
    s = s.where_self(&keep, &zeros);

    // 6) Row-wise normalization: stabilize the energy of σ @ x
    let row_norm = s.square().sum_dim_intlist([1].as_ref(), true, Kind::Float).sqrt();
    s = &s / &row_norm.clamp_min(1.0);

    // 7) Write back into σ without changing ownership
    sigma.copy_(&s);
});

Think of this as BDH's working memory: it reacts instantly to the current context (the Hebbian term), gradually forgets (the decay factor u), stays compact (sparsity), and remains numerically stable (clamping + normalization).

What We Built

We have built a network with two of the learning mechanisms described in the paper:

Slow learning — classic backprop that shapes the permanent weights (E, D, R_in, W_read).
Fast learning — Hebbian updates of the σ matrix during inference/training.

The third mechanism — transferring fast memory into long-term weights — we deliberately leave out, because, as the researchers themselves note, it hasn't been fully formalized. Getting it right is non-trivial and beyond the scope of this article; even the paper treats it only at a high level.

How to Run It

# 1) Create the project and add the files
cargo new tiny_bdh_xor && cd tiny_bdh_xor
# (replace Cargo.toml and src/main.rs with the code above)

# 2) Build & run
cargo run --release

Once built, the network converges after a few hundred iterations (loss ↓, acc → 1.0) and predicts XOR correctly.

Logging to the Console

To make the training dynamics and results easy to inspect, let’s add some lightweight logging.

1) Progress every 300 steps — loss and accuracy during training:

if step % 300 == 0 {
    let y_hat = logits.sigmoid();
    let acc = y_hat.gt(0.5)
        .eq_tensor(&y.gt(0.5))
        .to_kind(Kind::Float)
        .mean(Kind::Float)
        .double_value(&[]);
    println!("step {:4} loss {:.4} acc {:.2}", step, loss.double_value(&[]), acc);
}

2) Final predictions — after training, print what the model outputs:

let x_neu = x.matmul(&r_in);
let y1 = relu_lowrank_forward(&x_neu, &e, &dx);
let a = x_neu.matmul(&sigma.transpose(-1, -2));
let y2 = y1 + a;
let z = relu_lowrank_forward(&y2, &e, &dy);
let preds = z.matmul(&w_read).sigmoid().gt(0.5).to_kind(Kind::Int64);
println!("\nPred:\n{:?}", preds);

3) With vs. without fast memory (σ)

Compare the predictions when the Hebbian memory is on versus off:

// σ = on
let probs = z.matmul(&w_read).sigmoid();
println!("\nProbs (σ=on):");
probs.print();
println!("Preds (σ=on):");
preds.print();

// σ = off
let y1_nos = relu_lowrank_forward(&x_neu, &e, &dx);
let y2_nos = y1_nos; // no 'a' term from σ
let z_nos = relu_lowrank_forward(&y2_nos, &e, &dy);
let preds_nos = z_nos.matmul(&w_read).sigmoid().gt(0.5).to_kind(Kind::Int64);
println!("\nPreds (σ=off):");
preds_nos.print();
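One more optional experiment: to watch the fast memory itself, you can count how many σ entries survive the sparsity threshold and print that next to loss and accuracy. This is a small addition of mine (it is not in the repository); the lines below go inside the step % 300 == 0 logging block from step 1:

// Count σ entries whose magnitude exceeds the sparsity threshold (my addition)
let nnz = sigma.abs()
    .gt(sparsity_thresh)
    .to_kind(Kind::Float)
    .sum(Kind::Float)
    .double_value(&[]);
println!("step {:4} nnz(sigma) = {}", step, nnz);

Watching this number grow as co-activations accumulate and shrink as they decay gives a direct view of the working memory in action, and it pairs naturally with the σ ablation ideas later in the article.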
For the full working code, see the repository: https://github.com/ZhukMax/tiny_bdh_xor

Building, Running, and Checking the Results

Training converges quickly, and in the output you can see that:

Probs (σ = on) are sharply separated: [~0, 1, 1, ~0].
Preds (σ = off) match — which is expected for XOR: it’s a static task solvable by the “slow” weights without fast memory.

Running `target/debug/tiny_bdh_xor`
step    0 loss 0.6931 acc 0.50
step  300 loss 0.0000 acc 1.00
step  600 loss 0.0000 acc 1.00
step  900 loss 0.0000 acc 1.00
step 1200 loss 0.0000 acc 1.00
step 1500 loss 0.0000 acc 1.00
step 1800 loss 0.0000 acc 1.00
step 2100 loss 0.0000 acc 1.00
step 2400 loss 0.0000 acc 1.00
step 2700 loss 0.0000 acc 1.00

Pred:
Tensor[[4, 1], Int64]

Probs (σ=on):
 7.4008e-09
 1.0000e+00
 1.0000e+00
 6.6654e-17
[ CPUFloatType{4,1} ]
Preds (σ=on):
 0
 1
 1
 0
[ CPULongType{4,1} ]

Preds (σ=off):
 0
 1
 1
 0
[ CPULongType{4,1} ]

Why σ Is Not “Needed” for XOR

XOR is a simple Boolean function that the network can learn with its slow weights alone (E/Dx/Dy/R_in/W_read). The Hebbian layer σ shines when there is context over time — sequences, dependencies, a “recent past” — not a single static sample.

Where You Would See σ Pay Off

Sequences (context memory): predict the second symbol of a pair that appeared earlier in the same sequence (copy / associative recall).
Long-range dependencies: balanced-parentheses tasks — tracking state across 20–100 steps.
On-the-fly adaptation: mid-inference, “teach a new fact” (a token pair) and check that the model uses it without any gradient updates.
σ ablations: compare convergence speed/quality with σ on/off on harder prediction tasks. Log nnz(σ) and watch how connections strengthen and decay over time.

The AI Incubator Is Near (Conclusions)

BDH is not just “another alternative to transformers.” It is a glimpse into the next era of neural architectures — those that learn not on a schedule, but in the moment of action. Instead of waiting for a retraining run or churning through terabytes of data, BDH adapts in real time, during reasoning.

If transformers are like “students” who completed a course and earned their diploma, then BDH is a dragon hatchling — freshly hatched, exploring the world, making mistakes, adapting, and remembering everything it learns along the way.

It brings AI closer to its original spirit: not just computing probabilities, but learning to think within context and experience.