Why Coding Multi-Agent Systems is Hard

Author profile picture

@MichaelTrazziMichaël Trazzi

“The main lesson of thirty-five years of AI research is that the hard problems are easy and the easy problems are hard.” Pinker (1994), The Language Instinct

I thought programming Software agents to collect Treasures on a Graph would be a piece of cake. I was utterly wrong. Coding agents so they do not act foolishly turned out to be intrinsically difficult.

A Well-Defined Multi-Agent Setting

“An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors.” Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig
Simulation of a multi-agent system collecting treasures using GraphStream Library

Here is a simple multi-agent problem. Let n agents move in a fully-connected graph in order to collect treasures. The agents are limited in their movements, perception, and communication. They can only observe and move to nodes directly connected to them and communicate with close-enough agents.

There are three kinds of agents : Explorers, Collectors, and Infinite-backpack agents. Explorers are doomed to explore the map because they are not allowed to pick treasures. The ones who can are the Collectors, but they cannot carry too many and must hand out their collected treasures to Infinite-backpack agents.

The agents have limited perception but can remember past observations. Every agent has its own representation of the world, its own graph, which is a sub-graph of the real graph. Their sub-graph is the memory of all the nodes they have visited, and the edges they have ever seen or taken. They must communicate this graph to the others, so they can all share a reconstruction from all the subgraphs.

JADE (Java Agent DEvelopement Framework) will be used to implement what is called “behaviors” (tutorials can be found here, or here). In this Multi-Agent System Framework, a behavior is a set of instructions an agent will execute. At each turn, every agent executes each of its behaviors sequentially.

Your goal: implement the agents’ behaviors to let them collect as many treasures as they can in a certain amount of time.

Seems easy, right?

(Note: this project is part of an introduction to Multi-Agent Systems (a course from ANDROIDE, a Master’s Degree in AI I am currently doing at UPMC). It was inspired by the survival horror game Hunt The Wumpus, and in the full version of the project, agents need to deal with a terrifying Wumpus wandering around).

Non-trivial Behaviors

Imagine two agents moving in opposite directions in a long hallway. There can be only one agent per node of the Graph, so they must coordinate their actions in order to not obstruct the way. A specific protocol must be implemented to take this scenario into account.

Conflict of agents in a simulation: MyExplorerAgent2 is blocking the two others


Agents have limited perception and different abilities. Thus, cooperation is sine qua non. When a conflict happens, a protocol to unblock the situation must happen. They must share their subgraphs, see who is closer to a highly-connected node and agree on who will move.

Explorer agents must also agree on who will explore which part of the unknown graph, to optimize their movements and prevent conflicts.

Information Exchanges

The process of exchanging information in a Multi-Agent setting to give each agent access to the global knowledge is called the gossip problem.


For instance, let’s suppose every agent from the set {1,2, … ,n} knows exactly one piece of information, called a secret. Then, one very simple protocol is to have agent 1 call 2,3, … , n and get to know their secrets. Then, when 1 knows all the secrets, he calls 2, … , n to tell them those secrets, and then everyone knows all the secrets. A total of n-1+n-1 = 2n-2 calls are made. Actually, the optimal solution needs 2n-4 calls, which is close to our simple algorithm.

However, in our problem, the total information is not fully known until all the nodes have been explored, which makes the algorithm slightly more complicated, because the total knowledge is dynamic (the more the agents explore the graph, the greater their total knowledge).

That’s where optimization compromises appear. What is the optimal number of messages that must be exchanged between n agents so they all know all secrets? More messages imply a better global knowledge and better coordination. Yet, with thousands of agents and millions of nodes, the cost of sending thousands of messages every millisecond is far from negligible and becomes a computational burden.

Asynchronous Communication

Communication between agents is asynchronous. Since the execution of the agents is distributed, then there is no global clock to synchronize the agents’ actions. Moreover, when exchanging information, every agent has a mailbox with messages from other agents, so the communication may be delayed. During this delay, one agent might move far away and never answer the original message.

Coalition Formation

Example of coalition formation : source

Certain goals are not achievable alone (i.e. lifting a heavy object). Therefore, agents might agree to form group of agents, called coalitions, to achieve common goals.

With three agents with three different necessary skills (exploring, collecting and accumulating), coalitions of at least three agents must be formed. Thus, a protocol for creating and updating coalitions must be implemented. A possibility is to use the Shapley value (surplus created by a coalition of agents) to determine which coalitions are the most valuable.

Even with a simple problem setting, several obstacles arise very quickly and the algorithmic complexity seems insurmountable. This is a recurrent phenomenon when trying to build AI algorithms which would behave in a human-like manner.

Building AI exhibiting simple behaviors is Hard

“It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.” Moravec (1988) Mind Children

If we replaced the agents with humans, I am confident that they would very quickly understand how to win at this game, communicating what they saw in the graph and forming coalitions to collect the most treasures. However, implementing rigid behavior rules for Intelligent agents proves to be surprisingly difficult.

This is Moravec’s paradox :

What is easy for humans is incredibly difficult for machines

When it comes to playing chess, AI achieves super-human performance. But for basic human behaviors, such as walking or coordinating actions to explore a map, Artificial Intelligence algorithms are surprisingly harder.

Garry Kasparov, the Chess Grandmaster, makes the following statement in Deep Thinking: it is not difficult for any sufficiently advanced algorithm to beat 20 top chess players in a simultaneous game. But no AI (in a robot) could walk around in a crowded bar and move the chess pieces by itself.

Source: DARPA robots falling down

Machine Learning works under very specific circumstances

But why don’t we use the latest Machine Learning (ML) algorithm to solve our problem?… you say. Well, ML-only algorithms can only be used for certain tasks.

Yes, Reinforcement Learning (RL) algorithms are all the rage and can solve astoundingly hard problems such as achieving superhuman level at Atari Games or at the game of Go. But those games are full-visibility games with small data inputs, which is not the case for our treasure-hunting problem, where the map is not fully visible at the beginning.

Source: Deep Reinforcement Learning Doesn’t Work Yet (Feb. 2018)

But isn’t OpenAI working on a multi-agent system to beat humans at Dota 2 in a 5 vs 5 setting using Machine Learning algorithms?... you say.

Yes, OpenAI has shown impressive results when beating the world champions at Dota 2 in a 1 vs 1 setting. But it was mostly due to their substantial computational power and was not a breakthrough in Artificial Intelligence.

Their goal is to win in a 5 vs 5 setting, using a dataset of 5.8 Million of games. So, they seem to be working on a Multi-Agent problem with a full Machine Learning approach (learning from human games), and appear to be missing a top-down approach to Multi-Agent Systems.

The agents do not infer, do not generalize. Pure ML can work for Individual agents or fully observable systems, but for Multi-Agent Systems in a not-fully-known world, a more generalizable approach must be taken.

We have no clue how to implement scalable behaviors

With only two agents walking in opposite directions in a hallway, we encountered an issue. Implementing a protocol to deal with this specific problem was possible.

But what about 100 agents on a 400 nodes map?

There is a gap between hardcoding features for a small number of agents and having a scalable and generalizable implementation of a Multi-Agent system.

What needs to be done

Specific Multi-Agent Protocols must be developed, through research, to solve those kinds of problems. Learning without prior knowledge won’t teach the agents how to communicate because the search space is too big. A pure data-driven approach won’t lead anywhere.


Implementing an algorithm capable of solving the treasure-hunting problem proved to be much more difficult than it seemed. Conceiving Multi-Agent Systems capable of solving simple problems is far from trivial. Machine Learning algorithms achieved great results in the last decade, but alone they cannot solve all Artificial Intelligence problems.

If this article was helpful to you, hold the 👏 button (up to 50 times) and become part of the 👏 gang. You can follow me on Medium and Twitter, or you could even subscribe to my personal newsletter if you’re crazy enough.

EDIT: To get your hands dirty and start coding, here is the library developed by Cédric Herpson to quickly try out this project.



The Noonification banner

Subscribe to get your daily round-up of top tech stories!