The Dragon Hatchling Learns to Fly: Inside AI's Next Learning Revolution

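A Friendly Guide to the Brain-like Dragon Hatchling (BDH)

Modern neural networks can recognize faces, write stories, and even pass programming interviews, but they all share the same limitation: they stop learning once deployed.

A few weeks ago, a group of engineers and researchers (Adrian Kosowski, Przemysław Uznanski, Jan Chorowski, Zuzanna Stamirowska, and Michał Bartoszkiewicz) published a fascinating paper that introduces a new idea in machine learning and neural architectures: a new type of artificial neural network.

https://arxiv.org/abs/2509.26507

The paper itself is quite dense (full of math, formulas, and graphs), but it is packed with bold ideas. Here I want to give a popular-science overview of it, including some of my own analogies and simplifications.

Think of a newly hatched dragon. It already knows how to fly and breathe fire, but it doesn't yet know how to react. It learns not from books but from experience, right in the middle of flight, remembering which actions helped and which didn't.

That is the essence of BDH, the Brain-like Dragon Hatchling: a new neural architecture that combines classic pretraining (as in standard networks) with instant, self-directed learning during inference.

A neural network is a system of neurons connected by "weights". During training, gradient descent adjusts these weights to gradually reduce the error, much like a student preparing for an exam. But once the exam is over, the student no longer learns: all the learning happened earlier, before the test.

That's how today's models like GPT work: they learn inside the egg, and then they stop.

What Makes the Dragon Hatchling Different?

BDH is designed a little more cleverly. It has two kinds of memory:

- Permanent memory: as in any ordinary neural network, this is what it learned before hatching.
- Temporary memory: something like instincts, or short-term connections between thoughts.

When BDH processes information, it creates new connections on the fly. If two neurons activate together, the connection between them strengthens. This is known as the Hebbian learning rule:

"Neurons that fire together, wire together."

These connections are stored in a separate matrix σ, which acts as a temporary map of what has happened recently. If a similar situation comes up later, BDH says:

"Ah, I've seen this before, and this is what worked."

What Changes with BDH?

BDH transforms the learning process. It can adapt to new information while it works, on the go, without running backpropagation, without retraining, and without heavy GPU compute.

In other words, BDH is a network that learns to live, not just to repeat.

Learning to Stand, Fly, and Breathe Fire

Every living creature goes through stages of learning. A dragon first learns to stand, then to flap its wings, and finally to breathe fire. The BDH model follows a similar path, and each stage of its "life" brings a different kind of learning.

Stage 1: Standing (Classic Pretraining)

This is where BDH learns like any traditional neural network. It's trained on data, adjusts its weights via gradient descent, and minimizes a loss: the familiar supervised learning phase. Think of it as the dragon strengthening its legs before its first flight.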
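To make Stage 1 concrete, here is a minimal sketch of an ordinary gradient-descent loop in plain Rust (no libtorch). It illustrates offline training in general, not BDH itself: a single weight w is fitted to toy data generated by y = 2x under a mean-squared-error loss. The function names gd_step, mse, and train, the data, and the learning rate are hypothetical choices for this example.

```rust
/// One gradient-descent step for the model y_hat = w * x under mean-squared error.
/// Loss L(w) = mean((w*x - y)^2), so dL/dw = mean(2 * (w*x - y) * x).
fn gd_step(w: f64, xs: &[f64], ys: &[f64], lr: f64) -> f64 {
    let n = xs.len() as f64;
    let grad = xs
        .iter()
        .zip(ys)
        .map(|(&x, &y)| 2.0 * (w * x - y) * x)
        .sum::<f64>()
        / n;
    // Move against the gradient to reduce the loss.
    w - lr * grad
}

/// Mean-squared error of the prediction y_hat = w * x.
fn mse(w: f64, xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    xs.iter().zip(ys).map(|(&x, &y)| (w * x - y).powi(2)).sum::<f64>() / n
}

/// Runs `steps` iterations of gradient descent starting from w = 0.
fn train(xs: &[f64], ys: &[f64], lr: f64, steps: usize) -> f64 {
    let mut w = 0.0;
    for _ in 0..steps {
        w = gd_step(w, xs, ys, lr);
    }
    w
}
```

For example, train(&[1.0, 2.0, 3.0], &[2.0, 4.0, 6.0], 0.05, 200) converges to w ≈ 2. Note that every step needs a global error signal (the gradient of the loss over the data). A deployed model no longer receives that signal, which is the contrast with the local Hebbian rule of Stage 2.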
At this stage, the model is trained on a large dataset: text corpora, translations, and other examples. It uses standard offline backpropagation, an optimizer like AdamW, and a next-token-prediction loss function.

During this process, BDH develops its permanent weights, referred to as "G" in the paper (the fixed ruleset). These correspond to what, in a transformer, would be parameters like Wq, Wk, Wv, W1, W2, and so on.

Stage 2: Flying (Online Adaptation)

Once training ends, most networks stop changing. But BDH keeps learning in real time. It has a Hebbian memory: a fast-acting connection map that updates itself during inference. If certain neurons activate together, their connection grows stronger; if not, it weakens. This is how BDH adapts to new situations mid-flight, without retraining.

During inference (when BDH reads or generates text), it updates its temporary internal states, denoted σ(i, j), or "synaptic weights".

This process isn't gradient descent. Instead, it follows a local learning rule, one that depends only on the activity of the two neurons involved:

If neuron i and neuron j fire together → strengthen their connection σ(i, j).

This simple rule implements Hebbian learning, often summarized as "neurons that fire together, wire together."

These updates are short-lived: they exist only while a dialogue or reasoning session is active. Once σ is reset, the model returns to its original "hatched" knowledge, the way it was trained before flight.

Stage 3: Breathing Fire (Self-regulation)

BDH doesn't just strengthen every connection; it keeps them balanced. The model uses sparsity thresholds and normalization to prevent runaway feedback loops. It learns to "breathe fire" carefully: powerful, but controlled. Too much activation would lead to instability; too little would make it unresponsive. The balance between those extremes is what gives BDH its "life".
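The Stage 2 rule and the Stage 3 safeguards can be sketched together in plain Rust. This is a simplified illustration under stated assumptions, not the paper's exact update: σ is a small dense n×n matrix stored as Vec<Vec<f32>>, pre and post are the activations of neurons j and i for a single step, and the parameter names (eta, u, smax, thresh) are hypothetical, loosely mirroring the hyperparameters of the XOR demo later in this article. Row normalization is omitted for brevity.

```rust
/// Simplified Hebbian fast-memory update (illustrative, not the paper's exact rule).
/// For every pair (i, j):
///   sigma[i][j] <- clamp((1 - u) * sigma[i][j] + eta * post[i] * pre[j], -smax, smax)
/// and entries whose magnitude falls below `thresh` are set to zero (sparsity).
fn hebbian_update(
    sigma: &mut [Vec<f32>],
    pre: &[f32],
    post: &[f32],
    eta: f32,
    u: f32,
    smax: f32,
    thresh: f32,
) {
    for (i, row) in sigma.iter_mut().enumerate() {
        for (j, s) in row.iter_mut().enumerate() {
            // Stage 2: decay the old trace, then reinforce if neurons i and j co-fire.
            let updated = (1.0 - u) * *s + eta * post[i] * pre[j];
            // Stage 3: bound the connection strength to avoid runaway growth...
            let bounded = updated.clamp(-smax, smax);
            // ...and drop negligible links to keep the memory sparse.
            *s = if bounded.abs() < thresh { 0.0 } else { bounded };
        }
    }
}
```

With two neurons that repeatedly fire together (pre = post = [1, 0], eta = 0.5, u = 0.2), σ(0, 0) goes 0.5 → 0.9 → 1.0, where the clamp caps it at smax = 1. Once the activity stops, decay and the sparsity threshold erase the link entirely after a few dozen steps, which is the "short-lived" behavior described above.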
The paper briefly mentions an intriguing idea: if the Hebbian updates (σ) are preserved and averaged over time, BDH could develop something resembling long-term memory, a mechanism akin to slowly updating its core weights. However, the authors haven't yet formalized the exact algorithm for this process.

They suggest that:

- Fast memory (σ) operates on short timescales: minutes, or a few hundred tokens.
- Slow memory (G) evolves over much longer periods: days, or across model updates.

This opens the door to lifelong learning: systems that can continuously acquire new knowledge without erasing what they already know. Unlike classic transformers, which suffer from catastrophic forgetting, BDH hints at a future where models can remember their past while growing into the future.

Why I Believe BDH Is Not Just an Evolution, but a Different Kind of Model

The paper on the Brain-like Dragon Hatchling (BDH) isn't merely theoretical; it points toward a new direction in AI architecture. It offers real, measurable advantages.

Transparent and Interpretable AI

One of the biggest pain points of modern LLMs is opacity: we rarely know why a model made a particular decision. BDH changes this: its "synapses" correspond directly to conceptual relationships. You can see which connections strengthen as the model "thinks" about a particular idea. And its activations are sparse and positive (just like in the brain), which makes it easier to debug and audit reasoning processes.

➡️ This opens the door to explainable AI in critical fields like medicine, finance, and law, where why a model reached a conclusion is as important as the conclusion itself.

On-the-Fly Learning (Inference-Time Learning)

BDH applies Hebbian learning even during inference, which means the connections between neurons can evolve without retraining. It adapts to the user or context in real time, developing a form of short-term memory that "remembers" ideas across tokens and paragraphs.

➡️ This brings LLMs closer to lifelong learning: models that keep improving mid-conversation, the way humans do, without any extra fine-tuning.
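To get a feel for the "short timescale" of σ, assume pure exponential forgetting: each step multiplies an unreinforced trace by (1 − u), so after k steps it has strength (1 − u)^k. The sketch below (plain Rust; the function name steps_to_fade is hypothetical, and clamping and sparsification are ignored) counts how many steps a trace needs to drop below a given fraction of its starting strength. These numbers are illustrative and are not taken from the paper.

```rust
/// Number of update steps after which a memory trace, decaying as (1 - u)^k,
/// first falls below `fraction` of its starting strength.
/// Assumes pure exponential forgetting with no new reinforcement.
fn steps_to_fade(u: f64, fraction: f64) -> u32 {
    assert!(u > 0.0 && u < 1.0, "forgetting rate must be in (0, 1)");
    assert!(fraction > 0.0 && fraction < 1.0, "fraction must be in (0, 1)");
    let mut strength = 1.0_f64;
    let mut k = 0;
    // Apply one step of forgetting at a time until the trace is weak enough.
    while strength >= fraction {
        strength *= 1.0 - u;
        k += 1;
    }
    k
}
```

With u = 0.20 (the forgetting rate used in the XOR demo later in this article), steps_to_fade(0.20, 0.5) returns 4: a trace loses half its strength within four steps unless it is reinforced. With u = 0.01 it takes 69 steps. The forgetting rate is therefore the knob that sets how long the fast memory "remembers".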
Stable, Scalable Reasoning over Time

Transformers struggle with long-range reasoning: once you go beyond their trained context window, coherence collapses. BDH, by contrast, is designed as a scale-free system: its behavior stays stable as reasoning depth and neuron count grow.

➡️ That means we could build agentic systems that run for days or even weeks (planning, researching, or simulating) without losing logical consistency.

Merging Models Without Catastrophic Forgetting

BDH introduces a unique property: model merging. Two models can be "merged" simply by connecting their graphs. Unlike with transformers, according to the paper, this doesn't degrade performance or require retraining.

➡️ You can combine models from different domains (for example, medical and legal). This paves the way toward modular AI, where reusable "neural plugins" can be connected like software components.

Performance and Efficiency

BDH-GPU works as a state-space system, which means it can be trained efficiently with PyTorch on GPUs. Its parameter count and compute cost grow linearly, unlike standard transformer self-attention, whose cost grows quadratically with context length.

➡️ This makes it possible to build powerful models in the 10M–1B parameter range, putting BDH within reach of both independent researchers and startups.

Neuromorphic Computing

Because BDH is naturally defined in terms of neurons and synapses, it is a perfect fit for neuromorphic hardware: chips like Loihi or TrueNorth that emulate biological networks directly in silicon.

➡️ This opens the possibility of running large-scale reasoning models on energy-efficient edge devices, robotic platforms, or bio-inspired systems.

A Step Toward "Axiomatic AI"

The authors introduce the idea of Axiomatic AI: systems whose behavior can not only be observed but also formally predicted over time. It's like discovering the "thermodynamics of intelligence": predictable scaling laws and stable reasoning dynamics.

➡️ This direction points toward certifiable and safe AI architectures, suitable for autonomous, high-stakes environments from finance and healthcare to transportation.

Building a Simple Neural Network

To really understand how BDH works, I decided to build a tiny proof of concept: a minimal "tiny-BDH" in Rust, trained on the classic XOR problem. It uses autograd via tch-rs (a Rust wrapper around libtorch, the C++ core of PyTorch). This small project was inspired by the famous "A Neural Network in 11 Lines of Python", but my goal wasn't brevity, it was clarity: I wanted to deeply understand how BDH's mechanisms actually work in practice.

The full source code is available in my GitHub repo. Below, I'll walk through the implementation step by step. It may look verbose, but that's intentional: the goal here is maximum transparency and accessibility for anyone curious about BDH internals.

Cargo.toml

Since this example is written in Rust, we start with a Cargo.toml file: the manifest that defines the project and its dependencies.

The key dependency here is tch, a safe Rust wrapper around the libtorch C++ library, which powers PyTorch. It gives us access to tensors, autograd, and other core features of deep learning directly from Rust.

Since BDH is built from familiar concepts (neurons and synapses, expressed as tensors and gradients), it makes sense to reuse these existing abstractions rather than reimplement them from scratch. Our goal isn't to rebuild PyTorch but to explore the learning logic behind BDH in its simplest possible form.

Here is the relevant snippet from Cargo.toml:

```toml
[package]
name = "tiny_bdh_xor"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0.100"
tch = { version = "0.22", features = ["download-libtorch"] }
```

💡 The download-libtorch feature tells Cargo to automatically fetch and link the correct libtorch binaries for your operating system and architecture. Without it, you would need to install PyTorch manually and set the LIBTORCH environment variable. With it, everything "just works": the library is downloaded and linked during the build. (Note: the exact tch version may vary depending on your setup.)

The Core of Our Tiny BDH: main.rs

In a Rust project, source files live in the src directory. Since this is a minimal example, we'll keep everything in a single file, main.rs. Let's import the necessary dependencies and set up the entry point:

```rust
use anyhow::Result;
use tch::{nn, Device, Kind, Reduction, Tensor};
use tch::nn::{Init, OptimizerConfig};

fn main() -> Result<()> {
    // Use the first CUDA GPU if one is available, otherwise fall back to the CPU
    let dev = if tch::Cuda::is_available() { Device::Cuda(0) } else { Device::Cpu };

    // ... the rest of the code in this walkthrough goes here ...

    Ok(())
}
```

Choosing the Device (CPU or GPU)

The first line of main decides where to run the computations, on the GPU or the CPU:

- tch::Cuda::is_available() checks whether CUDA is installed and an NVIDIA GPU is detected.
- If CUDA is available, the code selects the first GPU: Device::Cuda(0).
- If CUDA isn't available (for example, on a Mac or a CPU-only server), it defaults to Device::Cpu.

The dev variable is then passed to other components, such as VarStore::new(dev), so that all tensors are created and computed on the same device.

Creating the Training Data

Next, we define the input and output training set for our tiny XOR neural network:

```rust
let x = Tensor::from_slice(&[
    0f32, 0., 1.,
    0.,   1., 1.,
    1.,   0., 1.,
    1.,   1., 1.,
]).reshape([4, 3]).to_device(dev);

let y = Tensor::from_slice(&[0f32, 1., 1., 0.]).reshape([4, 1]).to_device(dev);
```

We start with a flat array of 12 numbers (4 × 3) that describes the four XOR samples.
Each triple of numbers is one example:

[0, 0, 1]
[0, 1, 1]
[1, 0, 1]
[1, 1, 1]

The first two values are the binary inputs (X₁ and X₂), and the third is a constant bias input (always 1). It acts as a bias term, letting the model shift its decision boundaries without a separate bias parameter.

Then .reshape([4,3]) turns the flat array into a 4×3 matrix: four samples, each with three input features. Finally, .to_device(dev) moves the tensor to the selected device (GPU or CPU), so all computations happen in one place.

The second tensor, y, contains the expected outputs for each input:

[0], [1], [1], [0]

These match the XOR truth table:

| X₁ | X₂ | Y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 0 |

Network Hyperparameters

```rust
let n: i64 = 64;
let d: i64 = 16;
let u: f64 = 0.20;
let hebb_lr: f64 = 0.01;
let smax: f64 = 1.0;
let sparsity_thresh: f64 = 5e-3;
let lr: f64 = 5e-3;
let steps = 3000;
```

- n = 64: the size of the neural field (the number of neurons in the layer).
- d = 16: the low-rank dimension for the matrices E and D (Dx, Dy), which defines how strongly the data is compressed and then expanded.
- u = 0.20: the forgetting rate of the fast memory σ; higher values make it "forget" faster.
- hebb_lr = 0.01: the learning rate for Hebbian updates; it controls how strongly new activations modify σ.

Hebbian Memory: σ (sigma)

In BDH, memory is represented by a special connection matrix: a temporary synaptic memory. It does not store the model's learned weights (those are handled by gradient descent). Instead, it remembers which neurons were active together, forming short-term associations: a kind of "working memory" that is active during inference.

Continuing the list:

- smax = 1.0: limits the maximum connection strength in σ, preventing runaway values.
- sparsity_thresh = 5e-3: zeroes out very small σ elements, keeping the memory sparse and stable.
- lr = 5e-3: the learning rate of the Adam optimizer that updates the regular model parameters (E, Dx, Dy, R_in, W_read).
- steps = 3000: the number of training iterations (how many times the model sees the data).

Initializing Parameters and the "Neural Field"

After defining the hyperparameters, we create a parameter store: a container that holds all of the network's trainable tensors.
Next, we add the model's trainable parameters, its "weights", which are updated during training:

```rust
let vs = nn::VarStore::new(dev);
let root = &vs.root();

let e      = root.var("E",      &[n, d], Init::Randn { mean: 0.0, stdev: 0.05 });
let dx     = root.var("Dx",     &[n, d], Init::Randn { mean: 0.0, stdev: 0.05 });
let dy     = root.var("Dy",     &[n, d], Init::Randn { mean: 0.0, stdev: 0.05 });
let r_in   = root.var("R_in",   &[3, n], Init::Randn { mean: 0.0, stdev: 0.20 });
let w_read = root.var("W_read", &[n, 1], Init::Randn { mean: 0.0, stdev: 0.20 });
```

Each variable defines part of the BDH model:

- r_in: the input projection into the neural field.
- E, Dx, Dy: internal transformations, analogous to hidden-layer weights. But remember: BDH doesn't have layers in the usual sense.
- w_read: the output projection used to read out the network's final activations.

Optimizer and Fast Memory

Next, we initialize the Adam optimizer, a popular variant of gradient descent that automatically adapts the learning rate for each parameter. We also create a tensor σ: a square [n × n] matrix filled with zeros. This represents BDH's fast Hebbian memory, which stores temporary connections between neurons and is updated at every training step.

```rust
let mut opt = nn::Adam::default().build(&vs, lr)?;
let mut sigma = Tensor::zeros(&[n, n], (Kind::Float, dev));

for step in 0..steps {
    // forward pass, loss, backprop, and the Hebbian update go here
}
```

Inside this training loop, we'll add the code that teaches our "Dragon Hatchling" while it's still in the egg, that is, during offline pretraining.

Forward Pass: the Dragon's First Flight

The next code block performs the forward pass, the main computation step in which inputs are transformed into outputs (logits):

```rust
let x_neu  = x.matmul(&r_in);
let y1     = relu_lowrank_forward(&x_neu, &e, &dx);
let a      = x_neu.matmul(&sigma.transpose(-1, -2));
let y2     = y1 + a;
let z      = relu_lowrank_forward(&y2, &e, &dy);
let logits = z.matmul(&w_read);
```

Here's what happens, step by step:

- x_neu = x.matmul(&r_in): the input data enters the neural field.
- y1 = relu_lowrank_forward(...): the data is compressed, expanded back, and passed through a ReLU activation.
- a = x_neu.matmul(&sigma.transpose(-1, -2)): pulls in an additional signal from the Hebbian memory σ, based on temporary neuron associations.
- y2 = y1 + a: merges the "current" signal with short-term memory; this is BDH's core idea.
- z and logits: final processing and the output prediction, combining the model's short- and long-term knowledge.

The output logits haven't passed through a sigmoid yet. They represent raw predictions before activation: the dragon's unrefined thoughts before they take shape.

Low-Rank + ReLU Helper

Here's the ReLU helper used in the forward pass:

```rust
/// y = ReLU( (x E) D^T )
fn relu_lowrank_forward(x: &Tensor, e: &Tensor, d: &Tensor) -> Tensor {
    let h = x.matmul(e);                   // [B,n]·[n,d] = [B,d]
    h.matmul(&d.transpose(-1, -2)).relu()  // [B,d]·[d,n] = [B,n]
}
```

This is a low-rank linear layer with ReLU. Instead of a big dense matrix W ∈ R^{n×n}, we factor it as W ≈ E · Dᵀ, with E ∈ R^{n×d}, D ∈ R^{n×d}, and d ≪ n.

The idea is simple: you don't need all possible synapses. Project into a compact latent space of size d, then project back. For tiny demos like XOR this is mostly illustrative; for GPT-scale models the memory savings can be massive (potentially terabytes at scale).

- The first line of the body, x.matmul(e), compresses the high-dimensional "neural field" (n features) into a latent space of size d.
- The next line expands it back to n as a linear combination of decoder patterns from D.

Together, this acts like a single multiplication by W ≈ E · Dᵀ, but uses 2·n·d parameters instead of n². With n = 64 and d = 16, that is 2 · 64 · 16 = 2,048 parameters instead of 64² = 4,096.

Loss, Backprop, Step

Next, we add the standard training step: compute the loss, run backprop, and update the weights:

```rust
let loss = logits
    .binary_cross_entropy_with_logits::<Tensor>(&y, None, None, Reduction::Mean);
opt.zero_grad();
loss.backward();
opt.step();
```

These four statements are the heart of the training loop: measure the error, compute how to fix the model, and apply the update. After each iteration, the network moves a little closer to the correct solution.
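The loss above, binary_cross_entropy_with_logits, takes raw logits rather than probabilities. The plain-Rust sketch below (function names are hypothetical; this is not the tch implementation) computes the same quantity in the standard numerically stable form max(z, 0) − z·y + ln(1 + e^(−|z|)). Computing ln(sigmoid(z)) directly breaks down when sigmoid(z) rounds to 0 or 1, which this form avoids. It also explains where training starts: a logit of 0 means p = 0.5, which costs ln 2 ≈ 0.6931 per sample regardless of the target.

```rust
/// Numerically stable binary cross-entropy on a raw logit `z` with target `y` in {0, 1}:
///   loss = max(z, 0) - z * y + ln(1 + exp(-|z|))
/// which equals -[y * ln(sigmoid(z)) + (1 - y) * ln(1 - sigmoid(z))].
fn bce_with_logits(z: f64, y: f64) -> f64 {
    z.max(0.0) - z * y + (-z.abs()).exp().ln_1p()
}

/// Mean loss over a batch, analogous to Reduction::Mean.
fn mean_bce_with_logits(logits: &[f64], targets: &[f64]) -> f64 {
    assert_eq!(logits.len(), targets.len(), "one target per logit");
    let total: f64 = logits
        .iter()
        .zip(targets)
        .map(|(&z, &y)| bce_with_logits(z, y))
        .sum();
    total / logits.len() as f64
}
```

For instance, mean_bce_with_logits(&[0.0; 4], &[0.0, 1.0, 1.0, 0.0]) gives ln 2 ≈ 0.6931, so an untrained network with near-zero logits starts close to that value, matching the first line of this demo's training log.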
Hebbian Fast Memory Update (σ)

The last part, and really the core BDH twist, is the Hebbian fast-memory update. It runs outside autograd and keeps the values stable:

```rust
tch::no_grad(|| {
    let bsz = x.size()[0] as f64;

    // 1) Build co-activation map: outer = y2ᵀ @ x_neu
    let outer = y2
        .detach()                      // detach from autograd
        .transpose(-1, -2)             // [B,n]ᵀ → [n,B]
        .matmul(&x_neu.detach())       // [n,B] @ [B,n] → [n,n]
        .to_kind(Kind::Float)
        * (hebb_lr / bsz);             // scale by batch size and Hebb LR

    // 2) shallow_clone() shares storage with sigma (no data copy); it gives us a
    //    separate handle and avoids move/borrow issues. The in-place ops in step 3
    //    therefore also touch sigma's data, which step 7 overwrites anyway.
    let zeros = Tensor::zeros_like(&sigma);
    let mut s = sigma.shallow_clone();

    // 3) Exponential forgetting + add fresh co-activations
    s *= 1.0 - u;   // older σ fades out
    s += &outer;    // Hebbian boost for co-firing neurons

    // 4) Safety rails: clamp to prevent blow-ups
    //    (I originally skipped this and hit runtime errors during training)
    s = s.clamp(-smax, smax);

    // 5) Sparsify: zero out tiny values (efficiency + stability)
    let keep = s.abs().ge(sparsity_thresh);
    s = s.where_self(&keep, &zeros);

    // 6) Row-wise normalization: stabilize the energy of σ @ x
    let row_norm = s.square().sum_dim_intlist([1].as_ref(), true, Kind::Float).sqrt();
    s = &s / &row_norm.clamp_min(1.0);

    // 7) Write back into σ without changing ownership
    sigma.copy_(&s);
});
```

Think of this as BDH's working memory: it quickly adapts to the current context (Hebbian), gradually forgets (decay rate u), stays compact (sparsity), and remains numerically stable (clamping + normalization).

What We Built

We have implemented a network with the two learning modes described in the paper:

- Slow learning: classic backprop that shapes the permanent weights (E, Dx, Dy, R_in, W_read).
- Fast learning: Hebbian updates of the σ matrix during inference/training.

We intentionally leave out the third part, transferring fast memory into long-term weights, because, as the authors note, it's not fully specified yet.
Designing that mechanism is nontrivial and beyond the scope of this overview; even the research paper only sketches this direction at a high level.

How to Run It

```bash
# 1) Create the project and add the files
cargo new tiny_bdh_xor && cd tiny_bdh_xor
# (replace Cargo.toml and src/main.rs with the code above)

# 2) Build & run
cargo run --release
```

After a few thousand steps, the network converges (loss ↓, acc → 1.0) and predicts XOR correctly.

Logging to the Console

To make it easy to check the training dynamics and results, let's add some lightweight logging.

1) Progress every 300 steps

Print the loss and accuracy during training (inside the training loop, after opt.step()):

```rust
if step % 300 == 0 {
    let y_hat = logits.sigmoid();
    let acc = y_hat.gt(0.5)
        .eq_tensor(&y.gt(0.5))
        .to_kind(Kind::Float)
        .mean(Kind::Float)
        .double_value(&[]);
    println!("step {:4} loss {:.4} acc {:.2}", step, loss.double_value(&[]), acc);
}
```

2) Final predictions

After training, print the model's predictions:

```rust
let x_neu = x.matmul(&r_in);
let y1 = relu_lowrank_forward(&x_neu, &e, &dx);
let a = x_neu.matmul(&sigma.transpose(-1, -2));
let y2 = y1 + a;
let z = relu_lowrank_forward(&y2, &e, &dy);
let preds = z.matmul(&w_read).sigmoid().gt(0.5).to_kind(Kind::Int64);
println!("\nPred:\n{:?}", preds);
```

3) With vs. without fast memory (σ)

Compare the predictions with the Hebbian memory on vs. off:

```rust
// σ = on
let probs = z.matmul(&w_read).sigmoid();
println!("\nProbs (σ=on):");
probs.print();
println!("Preds (σ=on):");
preds.print();

// σ = off
let y1_nos = relu_lowrank_forward(&x_neu, &e, &dx);
let y2_nos = y1_nos; // no 'a' term from σ
let z_nos = relu_lowrank_forward(&y2_nos, &e, &dy);
let preds_nos = z_nos.matmul(&w_read).sigmoid().gt(0.5).to_kind(Kind::Int64);
println!("\nPreds (σ=off):");
preds_nos.print();
```

For the full working code, see the repository: https://github.com/ZhukMax/tiny_bdh_xor

Build, Training, and Prediction Results

The model converges quickly, and you can see that:

- Probs (σ = on) are almost perfect: [~0, 1, 1, ~0].
- Preds (σ = off) match, as expected for XOR: it's a static task that the "slow" weights can solve without fast memory.

```text
Running `target/debug/tiny_bdh_xor`
step    0 loss 0.6931 acc 0.50
step  300 loss 0.0000 acc 1.00
step  600 loss 0.0000 acc 1.00
step  900 loss 0.0000 acc 1.00
step 1200 loss 0.0000 acc 1.00
step 1500 loss 0.0000 acc 1.00
step 1800 loss 0.0000 acc 1.00
step 2100 loss 0.0000 acc 1.00
step 2400 loss 0.0000 acc 1.00
step 2700 loss 0.0000 acc 1.00

Pred:
Tensor[[4, 1], Int64]

Probs (σ=on):
7.4008e-09
1.0000e+00
1.0000e+00
6.6654e-17
[ CPUFloatType{4,1} ]
Preds (σ=on):
0
1
1
0
[ CPULongType{4,1} ]

Preds (σ=off):
0
1
1
0
[ CPULongType{4,1} ]
```

Why σ Isn't "Needed" for XOR

XOR is a simple Boolean function that the network can learn with its slow weights (E/Dx/Dy/R_in/W_read). The Hebbian σ layer shines when there is context over time (sequences, associations, "what happened earlier"), not when every sample is independent.

What to Try Next to See σ Pay Off

- Sequences (context memory): predict the final symbol of a pair that appeared earlier in the same sequence (copy / associative recall).
- Long-range dependencies: a balanced-parentheses task; check pairing accuracy at lengths of 20–100.
- On-the-fly adaptation: during inference, "inject a new rule" (a token pair) and verify that the model uses it without gradient updates.
- σ ablations: compare convergence speed and quality with σ on vs. off on harder prediction tasks. Log nnz(σ) and watch how connections strengthen or decay over time.

The AI Incubator Is Near (Conclusions)

BDH isn't just "another alternative to transformers". It's a glimpse of the next generation of neural architectures: ones that learn not on a schedule, but in the moment of action. Instead of waiting for retraining or requiring terabytes of data, BDH adjusts itself during reasoning, in real time.

If transformers are like "students" who have finished a course and earned a degree, then BDH is a dragon hatchling: freshly born, exploring the world, making mistakes, adapting, and remembering everything new it encounters.

This direction brings AI back to its original spirit: not just calculating probabilities, but thinking within context and experience.