I think the best way to understand what AI is still missing is to describe a single example situation that folds together a variety of cognitive abilities that humans typically take for granted. Contemporary AI and machine learning (ML) methods can address each ability in isolation (with varying degrees of success), but integrating these abilities remains an elusive goal.
Imagine that you and your friends have just purchased a new board game: one of those complicated ones with an elaborate board, all sorts of pieces, decks of cards, and a thick rulebook. No one yet knows how to play the game, so you whip out the instruction booklet. Eventually you start playing. Some of you may make mistakes at first, but after a few rounds everyone is on the same page, and able to at least attempt to win the game.
Image source: StarCraft: The Board Game — Brood War Expansion
What goes into the process of learning how to play this game?
There has been at least some progress on all of these sub-problems, but the current explosion of AI/ML is primarily a result of advances in pattern recognition. In some specific domains, artificial pattern recognition now outperforms humans. But there are still many situations in which even pattern recognition fails: AI methods cannot yet recognize objects and sequences as robustly as humans can.
Humans have the ability to create a variety of invariant representations. For example, visual patterns can be recognized from many viewing angles, in the presence of occlusions, and under highly variable lighting conditions. Our auditory pattern recognition skills may be even more impressive: musical phrases can be recognized despite noise and despite large shifts in tempo, pitch, timbre and rhythm*.
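To make the idea of an invariant representation concrete, here is a minimal sketch (my own toy illustration, not anything from the cognitive science literature): if a melody is encoded as the sequence of intervals between successive pitches rather than as absolute pitches, the representation is unchanged by transposition. The pitch numbers are hypothetical MIDI note values.

```python
# Toy illustration of an invariant representation: a melody encoded as
# successive pitch intervals is unchanged by transposition (a uniform
# pitch shift). Pitches are hypothetical MIDI note numbers; this is a
# sketch, not a model of human audition.

def intervals(melody):
    """Encode a melody by the differences between consecutive pitches."""
    return [b - a for a, b in zip(melody, melody[1:])]

ode_to_joy = [64, 64, 65, 67, 67, 65, 64, 62]    # opening phrase, in E
transposed = [p + 5 for p in ode_to_joy]         # shifted up a fourth

# The absolute pitches all differ, but the interval representation is
# identical, so a matcher built on it recognizes the tune either way.
assert intervals(ode_to_joy) == intervals(transposed)
```

Human invariance is of course far richer than this, tolerating tempo, timbre and rhythm changes at once, but the sketch shows the basic trick: choose a representation in which the irrelevant variation cancels out.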
No doubt AI will steadily improve in this domain, but we don’t know if this improvement will be accompanied by an ability to generalize previously-learned representations in novel contexts.
No currently existing AI game-player can parse a sentence like “This game is like Settlers of Catan, but in Space”. Language understanding may be the most difficult aspect of AI. Humans can use language to acquire new information and new skills partly because we have a vast store of background knowledge about the world. Moreover, we can apply this background knowledge in exceptionally flexible and context-dependent ways, so we have a good sense of what is relevant and what is irrelevant.
Generalization and re-use of old knowledge are aspects of a wider ability: integration of multiple skills. It may be that our current approaches do not resemble biological intelligence sufficiently for large-scale integration to happen easily.
A well-known type of integration challenge goes by the name of the symbol grounding problem. This is the problem of how symbols (such as mathematical symbols or words in a language) relate to perceptual phenomena — sights, sounds, textures and so on**.
Roughly speaking, artificial methods are of two types: symbolic and sub-symbolic. Symbolic methods are used in “classic” or “good old fashioned” AI. They can be very useful for deterministic rule-based situations like chess-playing (though we typically have to code up the rules in advance). Symbolic processing works well when humans do the symbol-grounding in advance. It is not so great at dealing directly with ‘raw’ inputs in the form of light, sound, texture and pressure.
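A tiny sketch of what "humans do the symbol-grounding in advance" means in practice (a hypothetical illustration of mine, using tic-tac-toe rather than chess for brevity): the symbols "X" and "O", the board squares, and the rules are all supplied by the programmer before the system ever runs.

```python
# Toy symbolic rule system for tic-tac-toe. The symbols ("X", "O", the
# nine squares) and the rules below are hand-coded in advance -- this is
# the pre-done symbol grounding described above. The system manipulates
# symbols deterministically; it never touches raw perceptual input.

# All eight winning lines, as triples of board indices.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def legal_moves(board):
    """Rule: a move is legal iff its square is empty."""
    return [i for i, cell in enumerate(board) if cell is None]

def winner(board):
    """Rule: a player wins with three identical marks on any line."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

board = ["X", "X", "X", "O", "O", None, None, None, None]
assert winner(board) == "X"
assert legal_moves(board) == [5, 6, 7, 8]
```

Within its pre-grounded world this kind of system is fast and exact; the trouble starts when the board exists only as light hitting a camera.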
At the other extreme we have sub-symbolic methods such as neural networks (of which deep learning networks are a type). These methods work with digitized versions of raw inputs — pixels, sound files and so on. Sub-symbolic methods are great for many forms of pattern recognition and classification, but we don’t have reliable methods of going from category labels to symbols that are manipulated in a rule-based fashion.
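As a minimal sub-symbolic sketch (again my own toy example, with made-up data), here is a perceptron, the simplest possible neural network, that learns to map "raw" 3x3 pixel grids to a category label. Note what it does and does not deliver: it learns weights from examples rather than rules, and its output is just a label; nothing in it connects that label to symbols a rule system could manipulate.

```python
# A perceptron mapping raw 3x3 pixel grids to a label. It learns from
# examples (sub-symbolic); the output is a bare category, with no link
# to rule-based symbol manipulation. Toy data; hypothetical sketch.

X_SHAPE = [1, 0, 1,  0, 1, 0,  1, 0, 1]   # crude "X" pattern
O_SHAPE = [1, 1, 1,  1, 0, 1,  1, 1, 1]   # crude "O" pattern

def predict(weights, bias, pixels):
    """Weighted sum of pixels, thresholded: 1 means "X", 0 means "O"."""
    activation = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if activation > 0 else 0

def train(samples, epochs=20, lr=0.1):
    """Classic perceptron rule: nudge weights toward each mistake."""
    weights, bias = [0.0] * 9, 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            error = label - predict(weights, bias, pixels)
            weights = [w + lr * error * p for w, p in zip(weights, pixels)]
            bias += lr * error
    return weights, bias

weights, bias = train([(X_SHAPE, 1), (O_SHAPE, 0)])
assert predict(weights, bias, X_SHAPE) == 1
assert predict(weights, bias, O_SHAPE) == 0
```

Deep networks are vastly more capable than this, but the gap the paragraph describes is the same: the learned weights classify, they do not reason.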
So in summary, appreciating the sheer scale of the artificial intelligence problem requires recognizing that intelligence consists of much more than pattern recognition. What is needed is the ability to link patterns in a bidirectional way with symbolic representations, so that linguistic and rule-based thinking can occur in embodied agents that interact with the real world in real time.
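The bidirectional link in question can be sketched in miniature (everything here is hand-wired and hypothetical, which is precisely why it is not a solution): a recognizer turns a raw pixel grid into a symbol, a rule operates purely on symbols, and a renderer goes back from symbol to pattern.

```python
# A hand-wired sketch of the pattern<->symbol link. Both directions are
# written by the programmer; doing this robustly and at scale, without
# hand-wiring, is the open problem the paragraph describes.

TEMPLATES = {
    "X": (1, 0, 1,  0, 1, 0,  1, 0, 1),
    "O": (1, 1, 1,  1, 0, 1,  1, 1, 1),
}

def ground(pixels):
    """Pattern -> symbol: pick the nearest stored template."""
    return min(TEMPLATES,
               key=lambda s: sum(a != b for a, b in zip(TEMPLATES[s], pixels)))

def render(symbol):
    """Symbol -> pattern: the reverse direction."""
    return TEMPLATES[symbol]

def rule(symbol):
    """A rule-based step that operates purely on symbols."""
    return "player one's mark" if symbol == "X" else "player two's mark"

noisy_x = (1, 0, 1,  0, 1, 0,  1, 1, 1)   # an "X" with one flipped pixel
assert ground(noisy_x) == "X"              # perception tolerates noise
assert rule(ground(noisy_x)) == "player one's mark"
assert ground(render("O")) == "O"          # the loop closes both ways
```

In a human, neither direction is hand-wired: the templates, the symbols, and the rules are all learned, and they stay connected as the world changes. That integration is what no current system achieves.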
* For more on the concept of invariant representations, see the following:
Yohan John’s answer to What are some of the most important problems in computational neuroscience that might drastically affect our perception of the brain and its functioning? Do we have an idea of how to attack such problems?
This essay I wrote covers the general concept of invariance, which is also known as symmetry:
** For more on the concept of symbol-grounding, see this answer: