AI keeps crashing into the same walls. Sometimes literally, at 60 mph. Apparently, not even Tesla, with its $1.4 trillion valuation and army of PhDs, knows about this math. Meanwhile, some of the best-performing content on all of YouTube is The Tesla Self-Driving Blooper Reel:

- **Phantom Braking.** The car slams the brakes because of a shadow. Because shadows are, apparently, the #1 safety menace of the 21st century.
- **The Surprise Party Turn.** The car takes the curve at full speed, then goes "OH SHIT A CURVE!" and improvises a mini-chicane mid-corner, passengers' stomachs optional.
- **The Seizure Shuffle.** Steering corrections so rapid you would swear the car is second-guessing itself. Left, right, left, right. It is not driving; it is line dancing in the middle of the highway.
- **The "Why Did It Do That?"** Maneuvers so inexplicable that even AI researchers can only shrug: "gradient descent, probably."

Curious about the hidden side of AI? Discover more on the page of José Crespo, PhD.

## The Fix That Nobody's Using

Tesla could fix this, genuinely, by using second derivatives (Hessian-vector products, or HVPs for the cool kids). So could Google, Meta, OpenAI, and pretty much every company with an "AI Strategy" PowerPoint. But they're not. See the table below; notice a pattern?

Wait, these are different symptoms, right? Not really. Different symptoms, same disease. They all use math that can only answer "How is this changing?"
…but not "How sharply is this about to change?" It's like asking a GPS for directions but never checking if there's a cliff ahead.

## The Root Cause: Your Great-Great-Grandfather's Calculus

As said, in Tesla's case what is happening is that their cars are reacting to what's happening right now, not anticipating what's about to happen. Reacting, not anticipating. It is like playing chess by looking only at the current position: no planning, no strategy, just "I see a piece, I move a piece." Chess players call this "beginner level." Tesla calls it "Full Self-Driving."

Tesla's stack, like nearly everyone else's in Silicon Valley, runs on a 19th-century approximation scheme, the mathematical equivalent of running Netflix on a telegraph machine. Meanwhile, the solution has been sitting on the shelf for some 60 years: dual/jet numbers.

Think that's the kind of "wacko, exotic math" that has no place in university CS programs? Well, these hyperreal-related algebras (duals and jets) make exact second derivatives (HVPs) tractable through the clean composition of two first-order operators (JVP ∘ VJP).
## Hold Up. Are You Telling Me That…

…the "gold-standard" h-limit calculus, the one every Ivy-League syllabus treats as the only game in town, makes computationally intractable exactly what dual/jet arithmetic makes trivial, and that this gap is behind many of the curvature-related failures in today's AI?

| h-limit calculus | Dual/jet numbers |
| --- | --- |
| Computationally intractable beyond first derivatives | Trivial |

What? Yes, really. And it gets worse.

## The Hyperreal Revolution: What Your Calculus Professor Never Showed You

The calculus you learned in college, the one behind your differential equations, optimization theory, and machine learning courses, isn't wrong. It is just incomplete. It is like learning addition but never being told that multiplication exists: you can still do the math, just the hard way.

Here's the specific problem. Traditional calculus (the h-limit approach):

f'(x) = lim[h→0] (f(x+h) - f(x)) / h

It defines derivatives as limits, which means:

✅ Mathematically rigorous
✅ Great for proving theorems
🔸 A computational nightmare for anything beyond first derivatives

Why? Because to compute a second derivative, you need a limit of a limit:

f'(x+h) = lim[h'→0] (f(x+h+h') - f(x+h)) / h'

But a computer cannot take limits; it can only plug in a small, finite h. That leaves two bad options: nest finite differences and let the noise compound, or move to higher-order stencils and pay in both cost and noise. In both cases you start over for every derivative order, and you never reach the true second derivative. You get a guess.
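The compounding is easy to see for yourself. Below is a minimal sketch (plain Python, standard library only; `fd_first` and `fd_second` are names made up for this illustration) that estimates f''(x) for f = sin by stacking one finite difference on top of another. Shrinking h is supposed to approach the limit; instead, past a point, roundoff takes over and the estimate falls apart.

```python
import math

def fd_first(f, x, h):
    # One finite-difference layer: approximates f'(x)
    return (f(x + h) - f(x)) / h

def fd_second(f, x, h):
    # A finite difference of a finite difference: approximates f''(x)
    return (fd_first(f, x + h, h) - fd_first(f, x, h)) / h

x = 1.0
exact = -math.sin(x)          # true f''(x) for f = sin

errors = {}
for h in (1e-2, 1e-4, 1e-6, 1e-8):
    approx = fd_second(math.sin, x, h)
    errors[h] = abs(approx - exact)
    print(f"h={h:.0e}  f'' approx {approx: .6f}  error {errors[h]:.2e}")
```

The moderate step sizes survive only because sin is smooth and one-dimensional; the point is that the nested approximation degrades exactly where the limit definition promises improvement.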
So, summing up, for the k-th derivative you either nest k limit layers (h, h′, …) or use higher-order stencils. Either way the noise blows up as O(h^-k), you lose the derivative structure, and JVP → VJP doesn't compose into an HVP in a finite-difference pipeline: you are rebuilding guesses instead of carrying derivatives.

So your self-driving car keeps crashing into sunset-lit walls. And with GPT-5's approximately 1.8 trillion parameters? Computational impossibility.

The sharp reader objects: "But wait, if we know the function f, can't we just compute f' and f'' analytically? Why do we need any of this limit or dual-number business?"

Great question! Here's why that doesn't work for neural networks.

## The Problem: Neural Networks Are Black Boxes

For a simple function, you can compute derivatives analytically:

```
# Simple case: analytic derivatives work fine
f(x)   = x² + 3x + 5
f'(x)  = 2x + 3      # easy to derive by hand
f''(x) = 2           # even easier
```

But a neural network with 1.8 trillion parameters looks like this:

u(x) = σ(W175·σ(W174·σ(...σ(W2·σ(W1·x))...)))

Where:

- Every W is a matrix with billions of parameters
- Every σ is a nonlinear activation function
- There are massive attention blocks (GPT-style)
- The whole composition keeps changing as training runs

You cannot write down an analytic form of f'(x) because:

1. The function changes every time the parameters do (every training step)
2. It is too big to express symbolically
3. It contains billions of nested compositions
## Why Traditional Calculus Fails Here

The h-limit formula for the second derivative:

f''(x) = lim[h→0] (f'(x+h) - f'(x)) / h

depends on evaluating f'(x+h), which itself requires a limit:

f'(x+h) = lim[h'→0] (f(x+h+h') - f(x+h)) / h'

And here's the trap:

1. You can't compute f' analytically (the function is far too complex)
2. So you approximate it numerically (the h-limit)
3. Now you need f'(x+h) for the second derivative
4. So you approximate again (with a new step size h')
5. You end up taking finite differences of finite differences

Result: errors compound catastrophically.

The skeptical reader might continue objecting: "But can't we use something like SymPy or Mathematica to compute derivatives symbolically?"

In theory, yes. In practice, we hit the same wall. For a 1.8-trillion-parameter model:

- The symbolic expression for f' is larger than the model itself
- Asking a computer to build it would take years
- Expression swell makes it computationally intractable

Even for a small 3-layer network with 1000 neurons per layer:

- The symbolic f' lands in the millions of terms
- The symbolic f'' runs into the billions of terms
- Growth is exponential in depth and width; even aggressive common-sub-expression elimination doesn't save you

Forget it. For hundreds of layers? Clear enough?

So let's fire up our Hyperreals for AI Computing and see what happens when hyperreals meet the very same operations.

## Dual/Jet Numbers: Automatic Differentiation Without Limits

Dual numbers don't use limits at all. Instead, they:

1. Encode the differentiation rules in the arithmetic itself
2. Evaluate f once, with augmented numbers that carry derivative info
3. Let the derivatives emerge from rule-following arithmetic

And jets generalize this: k-jets carry truncated Taylor coefficients up to order k (nilpotent: ε^(k+1) = 0), so higher-order derivatives fall out in a single pass.
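Here is a minimal sketch of that idea in plain Python (the `Dual` class and its operator set are a toy illustration, not any particular library's API): a number that carries (value, derivative), with the sum and product rules baked into `+` and `*`.

```python
class Dual:
    """A dual number a + b·ε with ε² = 0: `a` is the value, `b` the derivative."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1·ε)(a2 + b2·ε) = a1·a2 + (a1·b2 + b1·a2)·ε, i.e. the product rule
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def f(x):
    # f(x) = x³ + 3x + 5, written with ordinary arithmetic only
    return x * x * x + 3 * x + 5

out = f(Dual(2.0, 1.0))   # seed the ε-slot with 1.0 to differentiate w.r.t. x
print(out.a, out.b)       # value f(2) = 19.0, derivative f'(2) = 3·4 + 3 = 15.0
```

Note that `f` is written with ordinary arithmetic; the derivative appears because the arithmetic itself follows the rules.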
The rules of calculus (power rule, chain rule, etc.) are baked into jet arithmetic; they are never applied explicitly! So you get all the benefits of the analytical solution without ever deriving it. Here's the key: the rules are built into the jet arithmetic operations.

## Three Approaches Compared

**1. Symbolic calculus with explicit rule application** (impractical at modern AI scale)

Process:

1. Write down the function: f(x) = x³
2. Recognize the power rule: d/dx[xⁿ] = n·xⁿ⁻¹
3. Apply it: f'(x) = 3x²
4. Repeat for every higher order

You must build and store the derivative expression, and for neural networks that means exponential memory blowup.

**2. h-limit calculus: numerical approximation**

Process:

1. Choose a step size h (guesswork)
2. Evaluate (f(x+h) - f(x))/h
3. Get an approximation, error included

Problems:

- Not exact (always some truncation or roundoff error)
- Doesn't compose cleanly
- Gets worse at every higher order

**3. Dual/jet number algebra: evaluation with augmented arithmetic** (works at modern AI scale)

Process:

1. Extend the number system with ε, where ε² = 0
2. Evaluate f at (x + ε) using this arithmetic
3. Read the derivative off the ε-coefficient, automatically

No derivative expression is ever built; there is just one evaluation with augmented numbers. Memory stays linear.
## How It Works: Binomial Magic with Dual Numbers

Let's see, as a toy example, how the power rule emerges without applying any calculus.

Example: compute the derivative of f(x) = x³.

Step 1: Evaluate at the augmented input
f(x + ε) = (x + ε)³ (combinatorics, not calculus)

Step 2: Expand using the binomial theorem
(x + ε)³ = x³ + 3x²ε + 3xε² + ε³

Step 3: Apply the nilpotent algebra (ε² = 0)
= x³ + 3x²ε + 0 + 0 = x³ + 3x²ε

Step 4: Read off the dual number

```
x³ + 3x²ε = (x³) + ε·(3x²)
              ↑        ↑
            value   derivative
```

The derivative f'(x) = 3x² emerged through:

- Binomial expansion (algebra)
- Nilpotency (ε² = 0)
- Coefficient reading

NOT through:

❌ Power rule application
❌ The h-limit formula
❌ Numerical approximation

You don't apply the power rule; you let binomial expansion reveal it.

## Why This Scales Where Symbolic Differentiation Doesn't

Symbolic (analytical) differentiation of a neural network forces you to build expressions:

- Layer 1 derivative: thousands of terms
- Layer 2 derivative: millions of terms (combinatorial explosion)
- 100 layers: expression size grows exponentially with depth and width; even common-sub-expression elimination becomes intractable to build, store, or simplify
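The steps above can be run as code. Here is a sketch of a k = 2 jet in plain Python (the `Jet2` name and layout are illustrative): the truncated Cauchy product plays the role of the binomial expansion, and one evaluation yields f, f', and f''/2 at once.

```python
class Jet2:
    """Truncated Taylor series c0 + c1·ε + c2·ε² with ε³ = 0.
    Evaluating f at Jet2(x, 1.0) yields (f(x), f'(x), f''(x)/2)."""

    def __init__(self, c0, c1=0.0, c2=0.0):
        self.c = (c0, c1, c2)

    def __add__(self, other):
        other = other if isinstance(other, Jet2) else Jet2(other)
        return Jet2(*(s + o for s, o in zip(self.c, other.c)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Jet2) else Jet2(other)
        a, b = self.c, other.c
        # Cauchy product, truncated by ε³ = 0: all three coefficients in one pass
        return Jet2(a[0] * b[0],
                    a[0] * b[1] + a[1] * b[0],
                    a[0] * b[2] + a[1] * b[1] + a[2] * b[0])

    __rmul__ = __mul__

def f(x):
    return x * x * x               # f(x) = x³

jet = f(Jet2(2.0, 1.0))            # one evaluation at x = 2
value, d1, half_d2 = jet.c
print(value, d1, 2 * half_d2)      # 8.0 12.0 12.0, i.e. f, f', f''
```

No power rule was applied anywhere; the coefficients fell out of the multiplication table.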
Memory required: more than all the atoms in the universe 👀

Dual-number evaluation never builds expressions:

- Every tensor operation carries value + ε·derivative
- Memory: 2× the model's base footprint (k = 1)
- Or 3× the base footprint with jets (k = 2, second derivatives)

For GPT-5 (1.8T parameters):

- k = 1: ~14.4 TB → 18.0 TB (entirely feasible)
- k = 2: ~14.4 TB → 21.6 TB (fits on ~34 H100 nodes)

## BUT WAIT: YOU'RE FLYING FIRST CLASS IN AI MATH

And there's still more. The algebra of dual/jet numbers lets you use composition of functions (yup, if you want to do yourself a favor and write real AI that works, learn category theory now!).

Here's your genius move: through composition of functions, you can get second derivatives for the price of a first derivative!

Only dual/jet arithmetic allows this; it is structurally impossible with limit-based calculus. How? Composition of functions.

## In Plain English: Why Composition Fails With h-Limits

Why traditional calculus can't do JVP ∘ VJP = HVP:

1. A JVP via finite differences gives you a bare number (an approximation of f'(x)·v)
2. That number carries no derivative structure left for the VJP to differentiate
3. You must start over with a fresh finite-difference approximation
4. Errors compound: every stage inherits the previous stage's noise

Why dual numbers CAN do JVP ∘ VJP = HVP:

1. A JVP with duals yields a dual number (f(x), f'(x)·v)
2. The output automatically keeps its derivative structure in the ε-coefficient
3. The VJP can be applied directly, treating that output as its input
4. The operations compose naturally: each stage preserves the structure the next one needs

Dual numbers are algebraically closed under composition.

## Practical Implications

Here is what the new paradigm can compute that the old one can't, and why this is the key to fixing AI.

Current AI (k = 1 only):

- Can answer: "Which way should I go?"
- Can't answer: "How sharply does this path curve?"
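A faithful demo of JVP ∘ VJP needs a reverse-mode tape, which is more machinery than a snippet should carry, so here is a sketch of the same closure property using duals nested over duals (forward-over-forward): the first derivative pass returns an object that still carries structure, so a second derivative operator composes with it directly, yielding a Hessian-vector product. `Dual` and `hvp` are illustrative names, not a library API, and real systems compute the whole product in one reverse pass instead of coordinate by coordinate.

```python
class Dual:
    """a + b·ε with ε² = 0; coefficients may themselves be Duals (nesting)."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def _coerce(self, v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        o = self._coerce(other)
        return Dual(self.a + o.a, self.b + o.b)

    __radd__ = __add__

    def __mul__(self, other):
        o = self._coerce(other)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)

    __rmul__ = __mul__

def hvp(f, x, v):
    """Hessian-vector product H(x)·v for scalar-valued f, via nested duals.
    One dual level is seeded with the direction v (a directional derivative,
    a JVP); the other differentiates that still-structured output w.r.t. x_i,
    one coordinate at a time. Then y.b.b = d/dx_i (grad f · v) = (H v)_i."""
    n = len(x)
    out = []
    for i in range(n):
        args = [Dual(Dual(x[j], 1.0 if j == i else 0.0), Dual(v[j]))
                for j in range(n)]
        y = f(*args)
        out.append(y.b.b)
    return out

# f(x, y) = x²y + y³;  Hessian = [[2y, 2x], [2x, 6y]]
f = lambda x, y: x * x * y + y * y * y
print(hvp(f, [1.0, 2.0], [1.0, 0.0]))   # H·v at (1, 2) with v = (1, 0) -> [4.0, 2.0]
```

The structural point is the one from the list above: the output of the first derivative pass is itself a dual number, so the second operator consumes it without restarting, which is exactly what a finite-difference pipeline cannot do.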
Result: reactive, never anticipatory.

With composition (JVP ∘ VJP):

- Get second derivatives for 2× the cost of first derivatives
- Enable curvature analysis and trajectory prediction
- Result (one of many examples): Tesla stops phantom braking; AI stops overreacting

With explicit k = 3 jets:

- Get third derivatives for 3× the cost
- Verify topological consistency (winding numbers)
- Result: mathematically grounded AI outputs

## The Functors + Composition Advantage

Why the hyperreal algebra matters.

Without it (finite differences):

- Each derivative order requires starting from scratch
- Errors compound with each nesting
- There is no structure to exploit

With it (dual numbers):

- Higher-order derivatives = compositions of lower-order operations
- Exact (to within floating point)
- Automatic (the chain rule is built into the ε-arithmetic)

This is why:

✅ Dual numbers scale to hundreds of layers (linear memory)
✅ Composition works (JVP ∘ VJP = HVP, automatically)
✅ Higher orders come via jet numbers (k = 3, k = 4, no problem)

And why:

❌ Symbolic differentiation explodes (exponential expressions)
❌ Finite differences can't compose (no functoriality)
❌ h-limit methods break at higher orders (errors compound)

## SUMMING UP

The entire AI industry is stuck at first-order optimization because:

1. They learned calculus as h-limits (doesn't scale)
2. They implement derivatives as finite differences (don't compose)
3. They never learned about Group Theory and Hyperreal Numbers (not in CS curricula)

Meanwhile:
- Dual numbers make derivatives exact algebraic objects (not approximations)
- Jets make higher orders linear in cost (not exponential)
- Functorial composition makes second derivatives (JVP ∘ VJP) cheap

The math to fix Tesla's phantom braking, OpenAI's hallucinations, and Meta's moderation chaos has been sitting in textbooks since the 1960s. Nobody had connected the dots between the binomial theorem (~400 years old), nilpotent algebras (~150 years old), and functorial composition + hyperreals (~60 years old), and the biggest unsolved problems in AI.

Now you know what Silicon Valley doesn't, and see what they cannot.

NOTE: In this article, "traditional calculus" means the finite-difference (h-limit) implementation used in practice (pick an h, approximate, repeat), not analytic/symbolic derivatives.

Featured image: a Tesla crashing through a wall, partially lit by sunset, easily avoidable by a human driver. Image created by the author with Stable Diffusion.