Francesco Corea


Machine Ethics and Artificial Moral Agents

How to design machines with ethically significant behaviors

Image Credit: andruxevich/Shutterstock

There has been a lot of talk over the past months about AI being our best or worst invention ever. The chance of robots taking over, and the catastrophic sci-fi scenario that would follow, makes the ethical and purposeful design of machines and algorithms not simply important but necessary.

But the problems do not end here. Incorporating ethical principles into our technology development process should not just be a way to prevent the extinction of the human race but also a way to understand how to use the power coming from that technology responsibly.

This article is not meant to be a guide to ethics for AI, nor does it set guidelines for building ethical technologies. It is simply a stream of consciousness on questions and problems I have been thinking about and asking myself, and hopefully it will stimulate some discussion.

Now, let’s go down the rabbit-hole…

Image Credit: phloxii/Shutterstock

I. Data and biases

The first problem everyone raises when speaking about ethics in AI is, of course, about data. Most of the data we produce (if we exclude those coming from the observation of natural phenomena) are artificial creations of our minds and actions (e.g., stock prices, smartphone activity, etc.). As such, data inherit the same biases we have as humans.

First of all, what is a cognitive bias? The (maybe controversial) way I look at it is that a cognitive bias is a shortcut of our brain that translates into behaviors which require less energy and thought to be implemented. So, a bias is a good thing to me, at least in principle. The reason why it becomes a bad thing is that the external environment and our internal capacity to think do not proceed pari passu. Our brain gets trapped in heuristics and shortcuts which could have resulted in competitive advantages 100 years ago, but it is not plastic enough to adapt quickly to changes in the external environment (I am not talking about a single brain but rather about the species level).

In other words, the systematic deviation from a standard of rationality or good judgment (this is how bias is defined in psychology) is nothing more for me than a simple evolutionary lag of our brain.

Why this excursus? Well, because I think that most of the biases that data embed come from our own cognitive biases (at least for data resulting from human and not natural activities). There is, of course, another block of biases which stems from purely statistical reasons (the expected value of an estimator differs from the true value of the underlying parameter it estimates). Kris Hammond of Narrative Science merged those two views and identified at least five different biases in AI. In his words:

  • Data-driven bias (bias that depends on the input data used);
  • Bias through interaction;
  • Similarity bias (it is simply the product of systems doing what they were designed to do);
  • Conflicting goals bias (systems designed for very specific business purposes end up having biases that are real but completely unforeseen);
  • Emergent bias (decisions made by systems aimed at personalization will end up creating bias “bubbles” around us).
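
The purely statistical notion of bias mentioned before the list can be made concrete with a small sketch. It is only an illustration under assumptions of my own choosing: the population is the six faces of a fair die, the samples have size two, and the estimator is the sample variance. The helper name `variance` and its `ddof` parameter are hypothetical. The bias is the expected value of the estimator minus the true parameter, computed exactly by enumerating every possible sample.

```python
from fractions import Fraction
from itertools import product


def variance(values, ddof=0):
    """Variance of `values`, dividing by (n - ddof); exact rational arithmetic."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / (n - ddof)


# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
true_variance = variance(population)  # the parameter being estimated

# Every equally likely sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Expected value of each estimator = average over all possible samples.
expected_naive = sum(variance(s, ddof=0) for s in samples) / len(samples)
expected_corrected = sum(variance(s, ddof=1) for s in samples) / len(samples)

# Statistical bias = expected value of the estimator minus the true parameter.
bias_naive = expected_naive - true_variance
bias_corrected = expected_corrected - true_variance

print("true variance:         ", true_variance)
print("bias, divide by n:     ", bias_naive)
print("bias, divide by n - 1: ", bias_corrected)
```

Dividing by n systematically underestimates the variance even though the data are perfectly clean, while dividing by n - 1 removes the bias. This kind of bias has nothing to do with human cognition; it is a property of the estimator alone.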

But let’s go back to the problem. How would you solve the biased data issue then?

Simple solution: you can try to remove ex-ante any data that could bias your engine. It is a great solution: it will require some effort at the beginning, but it might be feasible.

However, let’s look at the problem from a different angle. I was educated as an economist, so allow me to start my argument with this statement: let’s assume we have the perfect dataset. It is not only omni-comprehensive but also clean, consistent and deep both longitudinally and temporally speaking.

Even in this case, we have no guarantee AI won’t autonomously learn the same biases we did. In other words, removing biases by hand or by construction does not guarantee that those biases will not come out again spontaneously.

This possibility also raises another (philosophical) question: we are building this argument from the assumption that biases are (mostly) bad. So let’s say the machines come up with a result we see as biased, and therefore we reset them and start the analysis again with new data. But the machines come up with a similarly ‘biased result’. Would we then be open to accepting that as true and revising what we consider to be biased?

This is basically a cultural and philosophical clash between two different species.

In other words, I believe that two of the reasons why embedding ethics into machine design is extremely hard are that i) we don’t unanimously agree on what ethics is, and ii) we should be open to admitting that our values or ethics might not be completely right and that what we consider to be biased is not the exception but rather the norm.

Developing a (general) AI is making us think about those problems and it will change (if it hasn’t already started) our values system. And perhaps, who knows, we will end up learning something from machines’ ethics as well.

Image Credit: Notre Dame of Maryland University Online

II. Accountability and trust

Well, now you might think the previous one is a purely philosophical issue and that you probably shouldn’t care about it. But the other side of the matter is about how much you trust your algorithms. Let me give you a more practical perspective on this problem.

Let’s assume you are a medical doctor and you use one of the many algorithms out there to help you diagnose a specific disease or to assist you in treating a patient. 99.99% of the time the computer gets it right: it never gets tired, it has analyzed billions of records, it sees patterns that a human eye can’t perceive. We all know this story, right? But what if in the remaining 0.01% of cases your instinct tells you the opposite of the machine’s result and you turn out to be right? What if you decide to follow the advice the machine spat out instead of your own judgment and the patient dies? Who is liable in this case?

But even worse: let’s say in that case you follow your gut feeling (we know it is not a gut feeling though, but simply your ability to recognize at a glance something you know to be the right disease or treatment) and you save a patient. The following time (and patient), you have another conflict with the machine’s results but, emboldened by your recent experience (because of a hot-hand fallacy or an overconfidence bias), you think you are right again and decide to disregard what the artificial engine tells you. Then the patient dies. Who is liable now?

The question is quite delicate indeed and the scenarios in my head are:

a) a scenario where the doctor is only human with no machine assistance. The payoff here is that liability stays with him and things are quite clear: he gets it right 70% of the time, and sometimes he gets right something extremely hard (the lucky guy out of 10,000 patients);

b) a scenario where a machine decides and gets it right 99.99% of the time. The negative side of it is that one unfortunate patient out of 10,000 is going to die because of a machine error and the liability is not assigned to either the machine or the human;

c) a scenario where the doctor is assisted but has the final call to decide whether to follow the advice. The payoff here is completely randomized and not clear to me at all.
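
The three scenarios can be compared in expected-value terms with a rough sketch. The 70% and 99.99% figures come from the text; everything else is an assumption made purely for illustration. In scenario c) I assume that doctor and machine err independently, that they disagree only when exactly one of them is right, and that the doctor overrides the machine in a fixed fraction of disagreements (`override_rate`, a hypothetical parameter) regardless of who is actually right. This deliberately ignores the case where the doctor’s instinct is better than chance on exactly the hard cases.

```python
from fractions import Fraction

PATIENTS = 10_000
P_DOCTOR = Fraction(70, 100)         # scenario a), figure from the text
P_MACHINE = Fraction(9999, 10_000)   # scenario b), figure from the text


def assisted_accuracy(p_doctor, p_machine, override_rate):
    """Accuracy of scenario c) under the independence assumptions above."""
    both_right = p_doctor * p_machine
    only_machine_right = (1 - p_doctor) * p_machine
    only_doctor_right = p_doctor * (1 - p_machine)
    # On a disagreement the doctor follows their own call with
    # probability `override_rate`, otherwise the machine's advice.
    return (both_right
            + only_machine_right * (1 - override_rate)
            + only_doctor_right * override_rate)


def expected_errors(accuracy, patients=PATIENTS):
    """Expected number of wrong decisions out of `patients` cases."""
    return (1 - accuracy) * patients


errors_a = expected_errors(P_DOCTOR)
errors_b = expected_errors(P_MACHINE)
errors_c = expected_errors(
    assisted_accuracy(P_DOCTOR, P_MACHINE, override_rate=Fraction(1, 10))
)

print("a) doctor alone:  ", float(errors_a))
print("b) machine alone: ", float(errors_b))
print("c) assisted, 10% override:", float(errors_c))
```

Under these assumptions the assisted accuracy is simply a weighted average of the two individual accuracies, so any amount of overriding moves the result away from the machine and towards the doctor. The real scenario c) is harder precisely because the assumptions may not hold: the doctor’s overrides may be concentrated on the cases where the machine is wrong.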

As a former economist, I have been trained to be heartless and reason in terms of expected values and big numbers (basically a Utilitarian), therefore scenario b) looks like the only possible one to me because it saves the greatest number of people. But we all know it is not that simple (and of course it doesn’t feel right for the unlucky guy of our example): think about the case, for instance, of autonomous vehicles that lose control and need to decide whether to kill the driver or five random pedestrians (the famous Trolley Problem). Based on those principles I’d save the pedestrians, right? But what if all five are criminals and the driver is a pregnant woman? Does your judgement change in that case? And again, what if the vehicle could instantly use cameras and visual sensors to recognize pedestrians’ faces, connect to a central database and match them with health records, finding out that they all have some type of terminal disease? You see, the line is blurring…

The final doubt that remains is then not simply about liability (and the choice between pure outcomes and ways to achieve them) but rather about trusting the algorithm (and I know that for someone who studied 12 years to become a doctor it might not be that easy to give that up). In fact, algorithm aversion is becoming a real problem for algorithm-assisted tasks and it seems that people want to have a degree of control over algorithms, even if an incredibly small one (Dietvorst et al., 2015; 2016).

But above all: are we allowed to deviate from the advice we get from accurate algorithms? And if so, in what circumstances and to what extent?

If an AI were to decide on the matter, it would probably also go for scenario b), but we as humans would like to find a compromise between those scenarios because we ‘ethically’ don’t feel any of them to be right. We can then rephrase this issue under the ‘alignment problem’ lens, which means that the goals and behaviors of an AI need to be aligned with human values: an AI needs to think as a human in certain cases (but of course the question here is how do you discriminate? And what’s the advantage of having an AI then? Let’s therefore simply stick to the traditional human activities).

In this situation, the work done by the Future of Life Institute with the Asilomar Principles becomes extremely relevant.

The alignment problem, in fact, also known as the ‘King Midas problem’, arises from the idea that no matter how we tune our algorithms to achieve a specific objective, we are not able to specify and frame those objectives well enough to prevent the machines from pursuing undesirable ways to reach them. Of course, a theoretically viable solution would be to let the machine maximize for our true objective without setting it ex-ante, therefore leaving the algorithm itself free to observe us and understand what we really want (as a species and not as individuals, which might also entail the possibility of switching itself off if needed).

Sounds too good to be true? Well, maybe it is. I indeed totally agree with Nicholas Davis and Thomas Philbeck from the WEF, who wrote in the Global Risks Report 2017:

“There are complications: humans are irrational, inconsistent, weak-willed, computationally limited and heterogeneous, all of which conspire to make learning about human values from human behaviour a difficult (and perhaps not totally desirable) enterprise”.

What the previous section implicitly suggested is that not all AI applications are the same and that error rates apply differently to different industries. Under this assumption, it might be hard to draw a line and design an accountability framework that does not penalize applications with weak impact (e.g., a recommendation engine) and at the same time does not underestimate the impact of other applications (e.g., healthcare or AVs).

We might end up then designing multiple accountability frameworks to justify algorithmic decision-making and mitigate negative biases.

Certainly, the most straightforward way to understand who owns the liability for a certain AI tool is to think about the following threefold classification:

  • We should hold the AI system itself responsible for any misbehavior (does it make any sense?);
  • We should hold the designers of the AI responsible for the malfunctioning and bad outcome (but it might be hard because AI teams can count hundreds of people and this preventative measure could discourage many from entering the field);
  • We should hold accountable the organization running the system (to me it sounds the most reasonable of the three options, but I am not sure about the implications of it. And then what company should be liable in the AI value chain? The final provider? The company that built the system in the first place? The consulting business which recommended it?).

There is no easy answer and much more is required to tackle this issue, but I believe a good starting point has been provided by Sorelle Friedler and Nicholas Diakopoulos. They suggest considering accountability through the lens of five core principles:

  • Responsibility: a person should be identified to deal with unexpected outcomes, not in terms of legal responsibility but rather as a single point of contact;
  • Explainability: a decision process should be explainable not technically but rather in an accessible form to anyone;
  • Accuracy: garbage in, garbage out is likely to be the most common reason for the lack of accuracy in a model. The data and error sources need then to be identified, logged, and benchmarked;
  • Auditability: third parties should be able to probe and review the behavior of an algorithm;
  • Fairness: algorithms should be evaluated for discriminatory effects.
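
To give an idea of what evaluating an algorithm for discriminatory effects could look like in practice, here is a minimal sketch of one possible check: the gap between the rates of favourable decisions across groups (often called demographic parity). This is only one of many fairness metrics and it is my choice for illustration, not something prescribed by the principles above. The function names, the group labels "A" and "B", and the decisions themselves are all hypothetical.

```python
from collections import defaultdict


def positive_rates(decisions):
    """Share of favourable outcomes per group.

    `decisions` is an iterable of (group, outcome) pairs,
    where outcome is True for a favourable decision.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        positives[group] += int(outcome)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions):
    """Largest difference in favourable-outcome rates between any two groups."""
    rates = positive_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical audit log: 10 decisions for each of two groups.
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = positive_rates(decisions)
gap = demographic_parity_gap(decisions)

print("favourable rate per group:", rates)
print("demographic parity gap:", round(gap, 3))
```

A large gap does not prove discrimination by itself (the groups might differ in relevant ways), but it is the kind of number a third party could compute when auditing a system, which ties the fairness principle to the auditability one.
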
Image Credit: mcmurryjulie/Pixabay

III. AI usage and the control problem

Everything we discussed so far was based on two implicit assumptions that we did not consider up to now: first, everyone is going to benefit from AI and everyone will be able and in the position to use it.

This might not be completely true though. Many of us will indirectly benefit from AI applications (e.g., in medicine, manufacturing, etc.) but we might live in the future in a world where only a handful of big companies drive the AI supply and offer fully functional AI services, which might not be affordable for everyone and above all not super partes.

AI democratization vs a centralized AI is a policy concern that we need to sort out today: on the one hand, the former increases both the benefits and the rate of development but comes with all the risks associated with system collapse as well as malicious usage; on the other hand, the latter might be safer but biased as well.

Should AI be centralized or for everyone?

The second hypothesis, instead, is that we will be forced to use AI with no choice whatsoever. This is not a light problem and we would need a higher degree of education on what AI is and can do for us in order not to be misled by other humans. If you remember the healthcare example we described earlier, this could also be a way to partially solve some problems in the accountability sphere. If the algorithm and the doctor have contradictory opinions, you should be able to choose whom to trust (and accept the consequences of that choice).

The two hypotheses described above lead us to another problem in the AI domain, which is the Control Problem: if it is centralized, who will control an AI? And if not, how should it be regulated?

I wouldn’t be comfortable at all empowering any government or existing public entity with such a power. I might be slightly more favorable to a big tech company, but even this solution comes with more problems than advantages. We might then need a new impartial organization to decide how and when to use an AI, but history teaches us we are not that good at forming mega impartial institutional players, especially when the stakes are so high.

Regarding AI decentralization instead, the regulation should be strict enough to deal with cases such as AI-to-AI conflicts (what happens when two AIs made by two different players conflict and give different outcomes?) or the ethical use of a certain tool (a few companies are starting their own AI ethics boards) but not so strict as to prevent research and development or full access for everyone.

I will conclude this section with a final question: I strongly believe there should be a sort of ‘red button’ to switch off our algorithms if we realize we cannot control them anymore. However, the question is: who would you grant this power to?

Image Credit: TheDigitalWay

IV. AI safety and catastrophic risks

As soon as AI becomes a commodity, it will be used maliciously as well. This is a virtual certainty. And the value alignment problem showed us that we might get in trouble for a variety of different reasons: it might be because of misuse (misuse risks), because of some accident (accident risks), or it could be due to other risks.

But above all, no matter the risk we face, it seems that AI is dominated by some sort of exponential, chaotic underlying structure, and getting even minor things wrong could lead to catastrophic consequences. This is why it is paramount to understand every minor nuance and solve them all without underestimating any potential risk.

Amodei et al. (2016) actually dug more into that and drafted a set of five different core problems in AI safety:

  1. Avoiding negative side effects;
  2. Avoiding reward hacking;
  3. Scalable oversight (respecting aspects of the objective that are too expensive to be frequently evaluated during training);
  4. Safe exploration (learning new strategies in a non-risky way);
  5. Robustness to distributional shift (can the machine adapt itself to different environments?).

This is a good categorization of AI risks but I’d like to add the interaction risk as fundamental as well, i.e., the way in which we interact with the machines. This relationship could be beneficial (see the Paradigm 37–78) but comes with several risks as well, for instance the so-called dependence threat, which is a highly visceral dependence of humans on smart machines.

A final food for thought: we are all advocating for full transparency of the methods, data and algorithms used in the decision-making process. I would also invite you though to consider that full transparency comes with a great risk of higher manipulation. I am not simply referring to cyber attacks or ill-intentioned activities, but more generally to the idea that once the rules of the game are clear and the processes reproducible, it is easier for anyone to hack the game itself.

Maybe companies will have specific departments in charge of influencing their own or their competitors’ algorithms, or there will exist companies with the sole purpose of altering data and final results. Just think about that…

Image Credit: Sergey Nivens/Shutterstock

Bonus Paragraph: 20 research groups on AI ethics and safety

There are plenty of research groups and initiatives, both in academia and in industry, that have started thinking about the relevance of ethics and safety in AI. The best known ones are the following 20, in case you would like to have a look at what they are doing:

Finally, Google has just announced the People+AI research (PAIR) initiative, which aims to advance the research and design of people-centric AI systems.

Image Credit: Zapp2Photo/Shutterstock


Absurd as it might seem, I believe ethics is a technical problem. Writing this post, I realized how little I know and even understand about those topics. It is incredibly hard to have a clear view and approach on ethics in general, let alone on the intersection of AI and technology. I didn’t even touch upon other questions that should keep AI experts up at night (e.g., unemployment, security, inequality, universal basic income, robot rights, social implications, etc.) but I will do so in future posts (any feedback would be appreciated in the meantime).

I hope your brain is melting down as mine is at this moment, and I hope some of the above arguments stimulated some thinking or ideas regarding new solutions to old problems.

I am not concerned about robots taking over or Skynet terminating us all, but rather about humans improperly using technologies and tools they don’t understand. I think that the sooner we clear up our minds around those subjects, the better it will be.


Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D. (2016). “Concrete Problems in AI Safety”. arXiv:1606.06565v2.

Dietvorst, B. J., Simmons, J. P., Massey, C. (2015). “Algorithm aversion: People erroneously avoid algorithms after seeing them err”. Journal of Experimental Psychology: General 144(1): 114–126.

Dietvorst, B. J., Simmons, J. P., Massey, C. (2016). “Overcoming Algorithm Aversion: People Will Use Imperfect Algorithms If They Can (Even Slightly) Modify Them”. Available at SSRN.