Deep Learning may not deliver the AI revolution you have been led to expect. Is this season of hype simply a repeat of that of the late ’80s, just at a much bigger scale? Perhaps Winter is coming again for the Beatles-era technology, based as it is on the Neuroscience of WWII and the Statistical Mechanics of Victorian times. Fortunately, the renewed energy, enthusiasm, and investment in Deep Learning need not go to waste, if a new approach can fuse recent knowledge from Computational Neuroscience and Applied Mathematics with the power of today’s GPUs. The Feynman Machine is both an accurate description of how the brain really works, and a blueprint for Machine Intelligence. Combining recent discoveries in the Applied Maths of coupled, communicating, chaotic Dynamical Systems with those in Neuroscience, we formed Ogma a year ago to turn theory into working software and build a foundation for a new AI technology.
I won’t dwell too much on why I believe the Deep Learning boom of today is not the panacea it has been hyped up to appear. Many of DL’s own long-ignored and newly-lauded leaders are already doing so (Yoshua Bengio, Yann LeCun and Geoff Hinton most prominent among them), despite their recent high-profile appointments and astronomical financial underwriting. To varying degrees, they are now looking beyond the recent successful applications of DL, and seeking new ideas which might provide the next step forward.
The main weaknesses of DL (as I see them) are threefold. First, it relies on the simplest possible model neurons (“cartoonish” as LeCun calls them). Second, it uses ideas from 19th century Statistical Mechanics and Statistics, which are the basis of energy functions and log-likelihood methods. Third, it combines these in techniques like backprop and stochastic gradient descent, leading to a very limited regime of application (offline, mostly batched, supervised learning) which requires highly-talented practitioners (aka “Stochastic Graduate Descent”), large amounts of expensive labelled training data, and a great deal of computational power. While great for huge companies who can lure or buy the talent and deploy unlimited resources to gather data and crunch it, DL is simply neither accessible nor useful to the majority of us.
The Feynman Machine, on the other hand, uses learning from entire fields of study which simply didn’t exist before the 1960s and ’70s. For the first time, they are combined in a model which describes how the neocortex processes high-velocity, streaming sensory data and produces behaviour and cognition, and is also a novel architecture for intelligent machines. I’ll briefly describe these results and how they are used in the natural and artificial Feynman Machines.
The first thing to learn about is the branch of Applied Mathematics which examines the properties of Nonlinear Dynamical Systems (NDSs), known popularly as Chaos Theory. Most phenomena in science (both natural and human) can only be modelled accurately as NDSs, but this was not practical until computers became widely available in the early 1960s. The landmark 1963 paper by Edward Lorenz is widely considered to be the first detailed description of a previously unknown class of behaviour: deterministic chaos. While almost impossible to deal with using traditional analytical tools, NDSs have such rich structure, and are so pervasive in nature, that their study has dominated Applied Mathematics for over forty years.
In 1980, Packard et al. discovered something unexpected about signals coming out of chaotic systems: they contain all the information needed to reconstruct the entire temporal behaviour of the chaotic system and predict its future, without any knowledge whatsoever of the actual mechanism driving the system. In 1981, Floris Takens proved his famous Theorem on this startling fact. The following video from George Sugihara’s lab briefly describes how the Theorem works and how reconstruction can be used to examine and analyse NDSs (here the Lorenz system is used for illustration):
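For readers who prefer code to video, here is a minimal sketch of the reconstruction idea, assuming NumPy is available. It simulates the Lorenz system, throws away everything except the x coordinate, and rebuilds a three-dimensional “shadow” trajectory from lagged copies of that single signal. The function names, the step size, the embedding dimension of 3 and the lag of 8 samples are all my own illustrative choices, not values taken from the video or from Takens’ paper.

```python
import numpy as np


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def simulate_lorenz(n_steps, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with a fixed-step 4th-order Runge-Kutta scheme."""
    states = np.empty((n_steps, 3))
    state = np.array(start, dtype=float)
    for i in range(n_steps):
        states[i] = state
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return states


def delay_embed(series, dim, lag):
    """Stack lagged copies of a 1-D series into delay vectors.

    Row r corresponds to time t = r + (dim - 1) * lag and holds
    (s[t], s[t - lag], ..., s[t - (dim - 1) * lag]).
    """
    series = np.asarray(series, dtype=float)
    offset = (dim - 1) * lag
    n_rows = len(series) - offset
    if n_rows <= 0:
        raise ValueError("series is too short for this dim and lag")
    columns = [series[offset - j * lag: offset - j * lag + n_rows] for j in range(dim)]
    return np.column_stack(columns)


trajectory = simulate_lorenz(5000)
x_only = trajectory[:, 0]                    # pretend x is the only thing we can measure
shadow = delay_embed(x_only, dim=3, lag=8)   # illustrative dim and lag, not tuned
print(shadow.shape)                          # (4984, 3)
```

Plotting the three columns of `shadow` against one another should give a butterfly-like curve resembling the original attractor, even though the y and z coordinates were never used. In practice the dimension and lag are chosen with care rather than picked by eye as they are here.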
The causality criterion described in the video has since become known as Sugihara Causality, and is much more powerful than the Granger Causality traditionally used in Statistics and machine learning, due to its use of the temporal structure in the time series signals. George Sugihara is one of the leading figures applying these powerful methods in areas as diverse as fishery sustainability and gene networks.
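The method behind this criterion is usually called convergent cross mapping. Below is a rough sketch of its central cross-mapping step, assuming NumPy: if x drives y, then the delay-embedded history of y should carry enough information to estimate x. The toy system (two logistic maps with one-way coupling from x to y), every parameter value and all function names are hypothetical choices of mine for illustration. The sketch also omits the convergence check, in which skill is tracked as the amount of data grows, that gives the method its name.

```python
import numpy as np


def coupled_logistic(n_steps, coupling_x_to_y=0.1, r_x=3.8, r_y=3.5, discard=200):
    """Toy system: x evolves on its own, y is nudged by x (one-way coupling)."""
    x, y = 0.4, 0.2
    xs, ys = [], []
    for step in range(n_steps + discard):
        x, y = x * (r_x - r_x * x), y * (r_y - r_y * y - coupling_x_to_y * x)
        if step >= discard:          # drop the initial transient
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)


def delay_embed(series, dim, lag):
    """Row r holds (s[t], s[t - lag], ...) for t = r + (dim - 1) * lag."""
    series = np.asarray(series, dtype=float)
    offset = (dim - 1) * lag
    n_rows = len(series) - offset
    columns = [series[offset - j * lag: offset - j * lag + n_rows] for j in range(dim)]
    return np.column_stack(columns)


def cross_map_skill(observed, candidate_cause, dim=2, lag=1):
    """Estimate `candidate_cause` from the shadow manifold of `observed`.

    Returns the Pearson correlation between the estimate and the true values.
    """
    manifold = delay_embed(observed, dim, lag)
    truth = np.asarray(candidate_cause, dtype=float)[(dim - 1) * lag:]

    # Pairwise Euclidean distances between all delay vectors.
    diffs = manifold[:, None, :] - manifold[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)          # a point may not be its own neighbour

    k = dim + 1                              # simplex of dim + 1 nearest neighbours
    neighbours = np.argsort(dists, axis=1)[:, :k]
    neighbour_dists = np.take_along_axis(dists, neighbours, axis=1)

    # Exponential weights scaled by the distance to the closest neighbour.
    closest = np.maximum(neighbour_dists[:, :1], 1e-12)
    weights = np.exp(-neighbour_dists / closest)
    weights /= weights.sum(axis=1, keepdims=True)

    estimate = (weights * truth[neighbours]).sum(axis=1)
    return float(np.corrcoef(estimate, truth)[0, 1])


x, y = coupled_logistic(1000)
skill_x_from_y = cross_map_skill(observed=y, candidate_cause=x)  # tests "x drives y"
skill_y_from_x = cross_map_skill(observed=x, candidate_cause=y)  # tests "y drives x"
print(skill_x_from_y, skill_y_from_x)
```

On a one-way system like this, the first number should come out noticeably higher than the second, reflecting that y carries an imprint of x but not the other way round. A full analysis would repeat the calculation over increasing library lengths and look for the skill to converge.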
The central idea of the Feynman Machine is that regions of the brain form a network of NDSs which communicate and cooperate by sending one another time series of nerve signals, and cognition emerges from the causal interactions of the coupled Dynamical Systems. The Applied Mathematics tells us that this is possible, but we now need to look to Neuroscience to explain how exactly this might occur in real brains.
In 1986, future Palm founder Jeff Hawkins wrote a thesis proposal describing a model of neocortex which constantly predicts its own future evolution. Due to a lack of interest in Theoretical Neuroscience at the time, it was not until the early 2000s that Hawkins was able to return to this idea and devote himself full-time to leading its development. He wrote On Intelligence with Sandra Blakeslee, formed the Redwood Center for Theoretical Neuroscience (now part of UC Berkeley), and co-founded Numenta, where his theory, now known as Hierarchical Temporal Memory, continues to be researched and developed. Hawkins (along with Numenta colleagues Subutai Ahmad and Yuwei Cui) bases all of his theory on hard Neuroscience, with nothing included unless there is strong evidence of it in the brain. In 2013, Numenta open-sourced their NuPIC HTM software, and I became part of the worldwide community which has joined Hawkins in exploring HTM as a cortical model and a promising machine learning technology. My 2015 paper was the first deep mathematical description of how HTM works.
That paper was the first of a pair which outlined the discovery I’d made in 2015, that the mechanisms of coupled NDSs were the secret source of computational power in the mammalian brain. The insight was triggered when I saw this talk by Melanie Mitchell of the Santa Fe Institute and took her courses on the Complexity Explorer MOOC website. I was too old to have been taught any of this in University, so this was the first time I had the chance to learn about the vast world of complex systems and the power of methods developed very recently to explore and exploit them.
My second paper — Symphony from Synapses: Neocortex as a Universal Dynamical Systems Modeller using Hierarchical Temporal Memory — outlines how the brain might act as a new kind of computing machine, using Takens’ Theorem to process streaming data, emergently self-organise cognition, and behave intelligently. HTM was used to demonstrate the direct connection between neurons in layers of neocortex and the emergent processing of networked NDSs. Since Numenta already had working software which emulates cortical computation in small regions, I proposed that intelligent machines could be feasible simply by building the right kind of network of such modules.
We now know (h/t George Sugihara) that there is strong evidence of the Feynman Machine process in real primate neocortex. Interestingly, this analysis used Takens’ Theorem and Sugihara’s methods to find the causal network spanning more than a hundred brain regions (Tajima won this prize for the paper). Just weeks ago, the same team established a link between this kind of analysis and Tononi-Koch Information Integration Theory.
Around the same time in late 2015, Eric Laukien was developing high-performance GPU technology, loosely based on ideas we’d been discussing in the HTM community. In January 2016, Eric, our friend Richard Crowder, and I (with support from Eric’s father Marc) formed Ogma to develop a new technology by fusing the theory with Eric’s experience in GPU-powered machine learning. Ogma is named after Ogma mac Elathan, or Ogmios, a Gaelic demigod of learning and poetry, and the reputed inventor of Ireland’s first writing system, Ogham.
As predicted by the theory, a brain region is only one example of a wide variety of adaptively-learning NDSs which, when wired together appropriately, will self-organise and emergently co-operate to form a Feynman Machine. There is no need to mimic all the details of real neurons (as seen in Human Brain Project models), nor even the simpler abstraction of neurons, columns and layers in HTM. Similarly, traditional Deep Learning gave us some ideas for modelling and implementing high-performance, adaptive data transformations in parallel, but we could avoid many of its limitations too, since each module in the Feynman Machine is a semi-independent predictive learner, and learning is online, local and unsupervised.
The Feynman Machine is named in honour of our hero, Richard P. Feynman, for a number of reasons. A pioneer in computing, from Los Alamos to his involvement in early massively parallel algorithms in the 1980s, he was a colleague of John von Neumann, whose name is now synonymous with the architecture of modern digital computers, even though von Neumann’s designs were inspired by his understanding of cortical neurons. We hope that Feynman would appreciate the simplicity of these ideas, and might even have figured them out himself, had he lived to see recent progress in Neuroscience and Applied Mathematics. We recently discovered that his sister Joan, herself an eminent physicist, based one of her seminal papers on Takens’ Theorem. We might justifiably have called it the Feynman-Feynman Machine, since Richard may well have had this insight, if Joan had had the chance to explain the power of NDSs.
In any case, our paper was listed among the MIT Technology Review Best of the Physics arXiv for the week of its publication, and we are presenting posters during March at the upcoming Neuro Inspired Computing Elements workshop and the AAAI Spring Symposium on the Science of Intelligence, organised by MIT’s Center for Brains, Minds and Machines (if you’re also attending, please drop by our poster).
I’ll explain the details of the Feynman Machine in my next post, but in the meantime here is a recent demo of it in action on the “Hello World” of NDSs, the Lorenz System:
The “noisy” Lorenz attractor (based on this recent paper) is used here because it’s much more challenging to learn than the vanilla NDS. There are three important points about this demo which might not be obvious. First, the step number at top left is the number of data points the FM has seen since it was initialised randomly, so this is online learning, completely from scratch, with no pretraining. Second, the signal the FM is seeing is just the highly noisy observations, and never the smooth Takens trajectory displayed in most of the video, so the resemblance is due to the FM finding the true signal among the noisy data. Third, this represents the entire 8-layer network running at about 60 steps per second on my MacBook Pro, which includes drawing all the 3D graphics.
Stay tuned for my second post, which will describe the inner workings of the Feynman Machine.