TEDx talk on the topic of this post
The question “Who are we?” — how does our mind work, and how does it relate to the universe? — is the most fascinating riddle we humans can ask ourselves. I believe that the field of Artificial Intelligence can give us unique answers: computer models allow us to understand minds as causal systems, and to test these models by building systems that begin to make sense of the world on their own.
One of 9 layers of the image recognition network presented by Andrew Ng’s group in 2012
Right now, we are in the middle of a technological revolution known as “Deep Learning”. Working artificial neural networks have been around since the late 1950s, and progress on them has been slow and steady. But in the spring of 2012, something extraordinary happened. A team of researchers from Google and Stanford University, led by Andrew Ng, built a neural network running on 16,000 computer cores and trained it with 10 million randomly selected frames from YouTube. The network had no prior knowledge about what it was looking at, and it received no feedback on its results. It was only looking for structure, for any kind of regularity. Ten million images — that is perhaps ten times the amount of visual data that a human baby gets to see in the first six months of its life. After three days of training, the researchers could show the system an arbitrary image from the database ImageNet, which contains 22,000 different types of objects. In 15.8 percent of the cases, the network would guess the correct object category, despite having had no human supervision during training. That result was 70% better than anything that had existed before, better than any sophisticated handcrafted or learned image recognition software.
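The key claim here, that a network can discover structure with no labels and no feedback, can be illustrated with a toy sketch. This is not the actual Google/Stanford system (which was a sparse, deep, nonlinear autoencoder on 16,000 cores); all names and sizes below are invented for illustration. A tiny linear autoencoder, trained only to reconstruct its own input, uncovers low-dimensional structure hidden in the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 8 dimensions that secretly lie near a
# 2-dimensional subspace, i.e. hidden structure plus a little noise.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(200, 8))

# A linear autoencoder compressing 8 -> 2 -> 8. It is trained only to
# reconstruct its own input: no labels, no external feedback about what
# the data "means", the same unsupervised setting as in the experiment.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))
lr = 0.01

def reconstruction_loss(X, W_enc, W_dec):
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

before = reconstruction_loss(X, W_enc, W_dec)
for _ in range(5000):
    H = X @ W_enc                                 # encode
    err = H @ W_dec - X                           # reconstruction error
    W_dec -= lr * H.T @ err / len(X)              # gradient step on the decoder
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)  # gradient step on the encoder
after = reconstruction_loss(X, W_enc, W_dec)

print(f"reconstruction error before: {before:.4f}, after: {after:.4f}")
```

A linear autoencoder like this ends up spanning the same subspace as the top principal components of the data; the real system stacked many nonlinear, sparse layers, but the principle of learning from reconstruction alone is the same.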
Overlay of patterns triggering the “cat neuron”
The system also received instant internet fame, because it could recognize images of cats with 75% accuracy. (Not a very surprising result for a network trained on YouTube, of course.)
Deep Learning uses hierarchies of learned feature detectors, sometimes with hundreds of layers. These systems are improving rapidly: last year, they became better than humans at recognizing images from the ImageNet database, and they learned to beat humans at classic Atari video games. Earlier this year, a combination of Monte Carlo tree search and Deep Learning beat one of the world’s best players at Go, a game that had been thought to be out of reach of computers for quite some years to come.
And yet, despite these successes, there is a considerable distance between our learning computer systems and actual minds. You may have heard of the Turing Test: the challenge of building a system that can convince a human that it is intelligent. But do you know how we will recognize that a system is truly intelligent? It will perform a Turing Test on you!
Having a mind means that we understand what it is to have a mind, and that we actively look for minds in others. We look for understanding in others, for their awareness of who they are, and for how they look for that same awareness in us.
How long is it going to take until we succeed in building artificial minds? Marvin Minsky, who founded the field together with John McCarthy and others in 1956, gave one of the best estimates I know:
“I believed in realism, as summarized by John McCarthy’s comment to the effect that if we worked really hard, we’d have an intelligent system in from 4 to 400 years” — Marvin Minsky
Right from the start, the field split into two distinct approaches: cognitive AI, which saw computer models as a way to understand and build minds, and narrow AI, an engineering discipline for building smarter data processing. Our pattern recognition systems are still far from being minds.
Photograph by Lois Greenfield
Imagine that we are looking at a group of dancers in a ballroom, and we want to identify which one is Alyssa. Our systems do this by filtering out all the clutter that distracts us from Alyssa’s identity: the pose of the dancer, the lighting, the dress, the particular facial expression. All these things can vary while Alyssa somehow stays the same. But we need to go beyond filtering: there is no clutter in the world. Everything is an interconnected feature that gains significance in a different context. We need to go beyond the recognition of individual categories; we need to model a complete, dynamic world.
Our brains do that by creating hierarchies that start with low-level percepts: simple visual or acoustic patterns. We organize them into dynamic sensorimotor scripts, into mental simulations. In our mind’s eye, we see moving objects, we listen to sounds and voices, we observe people interacting, we imagine possible worlds. We map these simulations to concepts, which we can handle with logic and analytical thinking, and map them into natural language to synchronize our ideas with others.
Our mental representations of movements are produced by hierarchies of neural generators in the brain (motion capture visualization by Tobias Gremmler)
Minds are not classifiers, they are simulators and experiencers. The simulations and experiences are not part of our sensory data: these data are just electrical and chemical patterns that travel from nerve cells into our brain. The world that we experience is literally a creation of our mind, a dream that is anchored in the sparse and erratic impulses generated by our sensory nerves. This dream is the data structure that best predicts what impulses our nerves are going to discover next.
Early in my academic career, I worked in a group that taught robots to play soccer. These robots observed the patterns of data entering their computer brains from camera sensors, motion detectors and accelerometers, acting as their sensory nerves. From these data, they created a model of the playing field, the other robots and the ball. They used it to plan, coordinate and execute their actions, driven by behavior programs that made them push the ball into the goal.
When we traveled, we would often leave the robot bodies and the playing field at home, because they were heavy and bulky. Instead, we let the robots play in a simulation of their world, with simulated bodies, simulated physical interaction, simulated image generation. There was no way for the robots to notice any difference from the actual physical playing field, because the data that reached their computer brains had exactly the same structure, regardless of whether it originated in the physical world or in our simulator. Our robots operated in the same dream world whether they shared our physical world or were stuck in the Matrix.
Ray Solomonoff
This principle, that minds generate a simulated world, a dream, from sensory data, holds for the simple, unconscious minds of our soccer-playing robots just as for our human minds. It is a consequence of Solomonoff induction.
In the 1960s, Ray Solomonoff discovered the limit of what an information processing mind can know. He asked himself: what if a robot wakes up and realizes that it is just a robot, an information processing system connected to an environment that gives it nothing but data, information, discernible differences? What is the best model of the world that the robot can come up with? Solomonoff induction says that the best the robot can do is to find the shortest program that predicts each next observation from all the past ones. From the perspective of Artificial Intelligence, all minds are such information processing systems, generating a simulated world, a dream, from sensory data. Our perception of the world, including our ideas about matter, energy, space, people and art, is a dynamic dream that our brain uses to predict future observations from past ones.
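Solomonoff's idea can be stated compactly in the standard notation of algorithmic probability (this is the textbook formulation, not the author's): let $U$ be a universal prefix machine and $|p|$ the length of a program $p$ in bits. Every observation sequence $x$ gets the prior probability

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\,x_{t+1})}{M(x_{1:t})}
```

where the sum runs over all programs whose output starts with $x$. Since every extra bit of program length halves a program's weight, the prediction is dominated by the shortest programs that reproduce the observations so far: a formal version of finding the shortest program that predicts the next observation.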
Deep Learning has given us important insights into how neurons can take streams of signals and discover structure in them. Our minds can do things with these structures that are still very hard for computers: we can arrange them into a complete, dynamic world, and we can rearrange them into possible worlds. This allows us to imagine, to remember, to be creative. (It is also something that most animals cannot do.)
Our nervous system is the control system of our organism. It starts out as a collection of many simple feedback loops. Some of these reside in our brain stem, regulating our body temperature, heart rate and breathing patterns.
Often, feedback loops are not enough, and we need to change our interaction with the environment to keep our organism alive. Pain tells us to do less of what we are currently doing; pleasure tells us to do more of it. Pleasure and pain let us choose actions and places. They reside in the midbrain. They are also learning signals, so we can steer toward future pleasure and avoid future pain. To make that happen, they create connections between our needs and places in the world.
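The role of pleasure and pain as learning signals can be sketched in a toy loop (all names and numbers below are invented for illustration): a need deviation produces pain when it grows and pleasure when it shrinks, and that signed signal reinforces whichever action reduced the need in the past.

```python
import random

random.seed(1)

# Toy organism with one need: its "temperature error" should be zero.
# Pleasure = the need shrinking, pain = the need growing. The same
# signed signal is used to learn which action to prefer.
ACTIONS = {"move_to_sun": -1.0, "move_to_shade": +1.0}  # effect on the error

value = {a: 0.0 for a in ACTIONS}  # learned expected pleasure per action
alpha = 0.2                        # learning rate

error = 5.0  # the organism starts out too cold
for step in range(50):
    # Mostly repeat what felt good so far, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(ACTIONS))
    else:
        action = max(value, key=value.get)
    new_error = max(0.0, error + ACTIONS[action] + random.uniform(-0.2, 0.2))
    reward = error - new_error  # pleasure if the need shrank, pain if it grew
    value[action] += alpha * (reward - value[action])
    error = new_error

print(value, error)
```

After a few steps the organism reliably "steers toward the sun": the action that reduced the need has accumulated positive value, the other one negative. Real midbrain learning signals are of course far richer; this only shows the bare steering principle.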
My own work is mostly concerned with understanding the structure of our needs, which give motivational relevance to what we perceive, experience, anticipate and decide.
A simplified model of the engine of motivation, showing physiological, cognitive and social needs, interacting to produce pleasure, pain, valence, arousal, attention.
Whenever we feel a need, we may call up situations that reduce it. Whenever we anticipate a place, we may get a feeling of how it will benefit or harm us. Such associations between our needs and places can be stored in the hippocampus.
Mammals possess a neocortex that we can perhaps best understand as an extension of the hippocampus. Our neocortex is the largest part of our brain. It is where our organism creates the dream that it perceives as the world, including the story that it perceives as its own self.
Cortical columns
Our neocortex can do some things that we do not yet understand well enough to recreate them with machine learning. For instance, our thoughts, concepts and mental simulations are compositional. Like Lego bricks, we can fit them together in many possible ways, or rather, they learn how to self-organize to do that. The same neural elements can link up in different ways, to play out a walk in the park, a piece of music, a movie or a Dostoevsky novel in our mind. Many researchers think that the brain achieves this by organizing our neurons into cortical columns. Each column is a little circuit, about 2 mm high, containing between 100 and 400 neurons. I think that each column learns to approximate its own little mathematical function, and that a state machine controls how the columns link up and talk to each other. My current research explores how this can work.
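One way to picture this hypothesis (a deliberately crude sketch with invented names, not a biological model): each column computes its own small function, and a routing state decides which columns feed which, so the very same parts compose into different overall simulations.

```python
import math

# Each "column" approximates its own little function.
columns = {
    "edge":  lambda x: max(0.0, x),  # rectify
    "scale": lambda x: 2.0 * x,      # amplify
    "decay": lambda x: math.exp(-x), # dampen
}

def run(routing, x):
    """Apply the columns in the order given by the current routing state.
    The routing list plays the role of the state machine that decides
    which columns link up and talk to each other."""
    for name in routing:
        x = columns[name](x)
    return x

# Two "mental simulations" built from the same columns, linked differently.
walk_in_park = ["edge", "scale", "decay"]
piece_of_music = ["decay", "edge", "scale"]

print(run(walk_in_park, 1.5))    # edge -> scale -> decay
print(run(piece_of_music, 1.5))  # same parts, different composition
```

The point of the sketch is only compositionality: changing the links, not the parts, changes what the system computes, just as the same neural elements can play out a walk in the park or a piece of music.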
The columns are organized into cortical areas. Each area specializes in certain types of features, and when the areas link up, they generate a dynamic, moving world. A typical cortical area has a few million cortical columns. Think of a cortical area as an instrument in the orchestra that plays the music of our mind, making sense of a part of the world and coordinating a part of our behavior.
Some of the instruments are connected to our sensory input, from the retinas or from the spinal cord. Some others have access to our motor control. Most are just listening to their neighbors, passing on their music, so that streams of processing form.
There is no way in which we can experience the cortical music as a whole, but in this cortical orchestra, there is a brain area that acts as a conductor. We think that it resides in the dorsolateral prefrontal cortex, and it has links into most of the other brain areas, so it can pay attention to what they are doing. The conductor does not have a mind of its own. It is just a specialized brain area, like the others, and it can only pay superficial attention to some of what a few of the other brain areas are doing at a time. Its role is similar to what a conductor does in a real orchestra: it attempts to control the instruments when they are out of tune, or when there are conflicts between them, and it determines what is being played tonight. The conductor provides executive function and feedback on the performance of the mind. Without the conductor, our brain can still perform most of its functions, but we become sleepwalkers. A sleepwalker may be able to open doors, leave the house, answer questions or even cook a meal, but there is nobody home: the actions of a sleepwalker are incoherent and aimless.
Each process in our nervous system serves to regulate a part of the organism and its environment. The conductor regulates the function of the neocortex itself. In each moment, it will direct its attention to one or a few of the cortical instruments, while the others may continue to play unobserved, in the background. To learn and to reflect, the conductor will maintain a protocol of what it attended to. This protocol is a series of links to experiences generated by the other cortical instruments. It is the only place where our experience is integrated, where the separate parts of our mental models can be experienced together.
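The conductor-and-protocol idea described above can be caricatured in a few lines (all names are invented; this illustrates the structure of the theory, not the brain): the instruments keep producing content in parallel, the conductor attends to one or a few per moment, and only the links it records enter the protocol.

```python
from collections import deque

# Cortical "instruments" keep playing in parallel; each produces some
# content at every moment t.
instruments = {
    "vision":  lambda t: f"scene at t={t}",
    "hearing": lambda t: f"sound at t={t}",
    "motor":   lambda t: f"movement at t={t}",
}

# The protocol retains only a short window of attended links.
protocol = deque(maxlen=5)

def attend(t, focus):
    """Conductor step: sample only the attended instruments and record
    links to what was attended; everything else plays unobserved."""
    for name in focus:
        protocol.append((t, name, instruments[name](t)))

attend(0, ["vision"])
attend(1, ["vision", "hearing"])
attend(2, ["motor"])

# In this sketch, "the now" is just the freshest protocol entries; the
# conductor could also attend to the protocol itself (reflection).
now = list(protocol)[-2:]
print(now)
```

Note how integration happens only in the protocol: the unattended instruments still ran, but left no trace in the record of experience.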
The most recent, fresh entries in the protocol of the conductor are our experience of the now. Consciousness may simply be the most recent memory of what our prefrontal cortex attended to. Conscious experience is not an experience of being in the world, or in an inner space. It is a reconstruction of a dream, anchored in the music played by more than fifty areas made up of 86 billion neurons, reflected in the protocol of a single region. I call this the conductor theory of consciousness. An interesting aspect of consciousness is that the conductor can direct attention to its own protocol, and thereby experience the experience of being conscious. In this way, our mind is not only telling itself a story about itself; it also listens to that story.
My goal as an AI researcher is to build systems that are not just classifying data. I want to build systems that learn to construct a dynamic dream from these data, that find out that they are minds of their own, and that perform a Turing Test on you. I believe that building such systems is our best chance to find out who we are.