Current industry-led interest in artificial intelligence is almost entirely focussed on data-driven AI. The reasons are easy to understand. Cheap data storage, fast processors and advances in neural network algorithms and other data-centric techniques have made it possible to extract huge value from data. We can very efficiently build systems that predict what will happen next based on what they’ve seen so far. Their performance is at times even better than that of a human being. The focus on data-driven AI is such that people have gone as far as labelling data-driven machine learning “the part of AI that works”.
How true is that statement though? Is there really nothing that other parts of AI can offer us? Is there nothing coming out of all those decades of research in logics, planning, semantics or expert systems that can help us build a better autonomous system? As you have probably guessed, I think the answer is a resounding “no”: the statement does not hold up, and those other parts of AI have a great deal to offer. Furthermore, I think that unless we effectively combine data-driven AI with model-driven AI we cannot build successful AI systems.
Without appreciating what a more structured AI model brings to the table we risk re-inventing the wheel over and over again and missing out on very useful tooling. In order to effectively combine different AI tools though, we need ways of understanding and describing the architectures of autonomous systems — we need a software engineering paradigm specifically focussed on AI software.
Before we dive into the whys of all of this, let’s quickly make sure we are working from the same definitions of data-driven and model-driven AI.
For the purposes of this discussion, artificial intelligence refers to the tools, techniques and methodologies we use to automate processes. If you want a more nuanced understanding of what artificial intelligence is, I’ve explained here how I use agent-based engineering concepts to derive a definition that is helpful for building systems.
In the context of process automation, we have, broadly, two ways of building systems that can understand what they are supposed to do next.
The data-driven way focusses on building a system that can identify the right answer because it has “seen” a large number of example question / answer pairs and has been “trained” to get to the right answer.
There are lots of different ways of doing this, the most popular perhaps being neural network algorithms in their various forms.
The necessary ingredient for this approach is an appropriately large dataset that, crucially, is also correctly labelled. No small feat. However, if you do get to the point where you have enough correctly labelled pictures of cats, you can train software by “showing” it those images and letting it know whether it “guessed” correctly or not. After many (millions of) training cycles it will “learn” to get it increasingly right.
The strength of this approach is that it does not depend on a human accurately describing, through a set of rules, when something is a cat (e.g. it’s cute, it has a round face, it’s furry, it’s very popular in internet memes, etc.). We now know, having tried for decades, that we are terrible at capturing this sort of thing in a set of explicit rules. With data-driven AI the system learns “on its own” when something is a cat based on the training data we gave it. The more training data we have, and the more varied it is, the better our system can be.
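To make the “show, guess, correct” loop a little more concrete, here is a minimal sketch of it. It is a toy, not how real image recognition works: I am assuming two made-up numeric features stand in for a picture, and I am using a simple perceptron rather than a full neural network. The feature names, values and function names are all invented for illustration.

```python
# Toy illustration only: two hypothetical features stand in for an image.
# A perceptron is swapped in for a real neural network to keep this short.

def predict(weights, bias, features):
    """Guess 1 ("cat") if the weighted sum is positive, else 0 ("not cat")."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Show each labelled example and nudge the weights after a wrong guess."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 if the guess was right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# (furriness, ear_pointiness) -> 1 for cat, 0 for not cat (made-up values)
LABELLED_EXAMPLES = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]

weights, bias = train(LABELLED_EXAMPLES)
for features, label in LABELLED_EXAMPLES:
    print(features, "label:", label, "guess:", predict(weights, bias, features))
```

Notice that nowhere in the code do we say what a cat is. The only knowledge we supply is the labels.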
Model-driven AI (or symbolic AI), instead, attempts to capture knowledge and derive decisions through explicit representation and rules. In a model-driven world, a cat would be explicitly represented as a four-legged animal with two eyes, a nose and a mouth, that is furry (except when not) and relatively small (except when not), etc. A model-based system would look at an image, deconstruct it into lines, shapes and colours, and then compare those against the set of rules we’ve supplied about how lines, shapes and colours combine in the world to give us different animals.
You can immediately see why this is not a very good way of building a system to recognise a cat. There are so many different rules, and exceptions to those rules, that we can’t capture all of them. More fundamentally, perhaps, we as humans don’t actually know how we do it. How can we build a system that does it explicitly if we can’t even describe what we do when we decide something is a cat?
These types of examples are why model-driven AI can easily get dismissed. It is not a good fit for many different situations. The question, however, is whether there are situations that are a good fit for explicit models, and whether we can have systems that are purely model-driven or purely data-driven or whether we gain more from combining the two. So let us explore that next.
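A rough sketch of what such hand-written rules might look like shows the problem quickly. The attributes and the weight threshold below are invented for illustration, and I am assuming the image has already been reduced to a few tidy facts, which is itself the hard part.

```python
from dataclasses import dataclass

@dataclass
class Animal:
    """Hypothetical, already-extracted facts about what is in the picture."""
    legs: int
    furry: bool
    weight_kg: float

def is_cat(animal: Animal) -> bool:
    """Explicit rules: four legs, furry, relatively small (threshold is made up)."""
    return animal.legs == 4 and animal.furry and animal.weight_kg < 10

tabby = Animal(legs=4, furry=True, weight_kg=4.5)
sphynx = Animal(legs=4, furry=False, weight_kg=4.0)   # a hairless cat
terrier = Animal(legs=4, furry=True, weight_kg=7.0)   # a small dog

for name, animal in [("tabby", tabby), ("sphynx", sphynx), ("terrier", terrier)]:
    print(name, "-> cat?", is_cat(animal))
```

The rules reject the hairless cat and happily accept the small dog. Each fix means another rule, and then another exception to that rule.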
Model-driven AI represents the attempt to capture our understanding of how the world works through explicit representation and rules. If the bit of the world we are trying to capture is “how you can identify a cat in a picture”, it is not a good fit. However, there are many domains that are highly codified, explicitly defined and can be captured in a model. Margaretta Colangelo puts it elegantly here where she discusses the value of small data as a complement to big data:
“Everything was small data before we had big data. The scientific discoveries of the 19th and 20th centuries were all made using small data. Darwin used small data. Physicists made all calculations by hand, thus exclusively using small data. And yet, they discovered the most beautiful and most fundamental laws of nature. Moreover, they compressed them into simple rules in the form of elegant equations. Einstein championed this with E=mc². Although it’s estimated that perhaps 60% to 65% of the 100 biggest innovations of our time are really based on small data, current AI developments seem to focus mostly on big data, forgetting the value of observing small samples.”
If a model can be derived it can provide the most efficient path from question to answer.
Even when the model is not completely perfect it can still get us closer to an answer. Consider, for example, the problem of trying to build a conversational agent that is trying to help a user solve a problem in a specific domain — say the domain is “how to train your cat”.
Imagine that on the one hand, we have an NLP system that allows us to deconstruct what the user said. Such systems often combine data-driven AI with explicit grammatical models of a language (model-driven AI). On the other hand, we have information collected from cat trainers about how to go about training a cat in a number of different situations and for different types of cats. Let’s call the NLP system the generic layer and the knowledge we collected the domain layer. The generic layer is going to provide the “smarts” that our conversational agent uses to interpret what the user says and reduce it to a specific request for action (or intent). For example, the user might say something such as “I would love to learn how to teach my cat to do tricks”. The NLP system can map that to a user intent such as “train cat to perform tricks”. With that intent available, we now need to extract useful information from our domain layer about which tricks would be most suitable for the owner in question and their cat. We need a model of that specific owner and their cat, and then a set of rules that will allow us to come up with an appropriate training plan. How would we go about building such a model of our world?
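As a starting point, the split between the two layers could be sketched roughly as follows. Everything here is an assumption made for illustration: the keyword match is only a stand-in for a real NLP system, and the `Owner` and `Cat` fields and the training rules are invented rather than collected from actual cat trainers.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cat:
    name: str
    age_years: float
    food_motivated: bool

@dataclass
class Owner:
    minutes_per_day: int

def extract_intent(utterance: str) -> str:
    """Generic layer stand-in: a keyword match, not a real NLP system."""
    text = utterance.lower()
    if "cat" in text and "trick" in text:
        return "train cat to perform tricks"
    return "unknown"

def plan_training(intent: str, owner: Owner, cat: Cat) -> List[str]:
    """Domain layer: explicit (invented) rules over the owner and cat model."""
    if intent != "train cat to perform tricks":
        return []
    plan = ["start with 'sit'"]
    plan.append("reward with treats" if cat.food_motivated else "reward with play")
    if cat.age_years < 1:
        plan.append("keep sessions under five minutes")
    if owner.minutes_per_day < 10:
        plan.append("practise one trick at a time")
    return plan

intent = extract_intent("I would love to learn how to teach my cat to do tricks")
plan = plan_training(intent, Owner(minutes_per_day=5), Cat("Misha", 0.5, True))
print(intent)
print(plan)
```

The point of the sketch is the boundary: the generic layer only produces an intent, while everything the system knows about owners, cats and training lives in an explicit model that can be inspected and changed without retraining anything.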
While a lot of attention is paid to the generic layer, many existing systems follow relatively unstructured or “home-made” ways of dealing with domain models and reasoning over the domain layer. Instead, it is important that software engineers start thinking about what a complete AI system looks like and which other tools from AI research and related fields can help. This is exactly where model-driven AI helps. For this example, work on ontologies, inference and planning can all help us build a better system.
In a more general sense, we need to think about the entire system as made up of a variety of sub-systems. Some sub-systems will be data-centric while others will depend on explicit models. While the current focus on the data-driven layer is understandable, a wider focus on the entire system is necessary. However, part of the challenge of combining such sub-systems and avoiding re-inventing the wheel is that we have no widely used software development paradigm that deals explicitly with autonomous systems.
We should be able to reason over the entire architecture we are using and be able to refer to a library of architectures and patterns that allow us to solve common problems efficiently. Just like object-oriented programming helps with dividing a problem into smaller pieces and provides us with patterns to deal with common issues, we need to better comprehend what it means to build intelligent systems and what are the common patterns we can refer to.
The good news is that while there is no widely used paradigm, that does not mean that such a paradigm does not exist. From within artificial intelligence research, agent-based software development can help us reason about the system as a whole and about the interactions between the various subsystems.
Agent-based software development deals with exactly these issues. It offers a variety of different architectures, both for individual agents and for systems with multiple agents interacting. One of my personal objectives moving forward is to talk about these different approaches, building on existing work, and share tools that can help us easily translate agent architectures into working software.
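To give a flavour of what reasoning about the system as a whole can mean, here is a very small sketch of one common way to picture an agent: a loop that perceives, decides and acts. It is not taken from any particular agent framework. The class, the function names and the rules are hypothetical, and the perception function is a keyword stand-in for what would normally be a trained model.

```python
from typing import Callable, Dict

class Agent:
    """Hypothetical minimal agent: perceive, then decide on an action."""

    def __init__(self, perceive: Callable[[str], str], rules: Dict[str, str]):
        self.perceive = perceive  # data-driven sub-system (e.g. a trained model)
        self.rules = rules        # model-driven sub-system (explicit situation -> action)

    def step(self, observation: str) -> str:
        situation = self.perceive(observation)
        # Fall back to a safe default when no explicit rule covers the situation.
        return self.rules.get(situation, "ask_for_clarification")

def keyword_perception(observation: str) -> str:
    """Stand-in for a learned classifier; a real system would use a model here."""
    return "wants_tricks" if "trick" in observation.lower() else "unclear"

agent = Agent(keyword_perception, {"wants_tricks": "suggest_training_plan"})
print(agent.step("Can my cat learn tricks?"))
print(agent.step("Hello there"))
```

Because the two sub-systems sit behind a shared structure, either one can be swapped out (a better classifier, a richer rule set) without touching the other.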
Artificial intelligence tools are here to stay. As the technology matures we need to start thinking about how our understanding of the software engineering aspects of it needs to mature, so that we can capture and share knowledge and experience. There is a lot of existing work within artificial intelligence research that can help with this. The end result will be a better understanding of how intelligent systems can be built, using both data-driven and model-driven AI to solve problems.