Imagine you needed surgery. What do you do to keep your doctor from cutting out and selling one of your kidneys (or from doing something equally unethical)? There seem to be three kinds of assurance you could get:

1. A legalistic assurance: laws, regulations, licensing, and the threat of malpractice suits that punish unethical behavior.
2. A public assessment: someone you trust vouching for the practitioner.
3. The practitioner's own word: an assurance from the doctor that he or she will behave ethically.
As I have observed and participated in discussions about creating a data science code of ethics, I have found general acknowledgement that, for data science at least, option #1 does not currently exist, probably will not exist for a very long time, and perhaps never will. What astounds me is that, in the absence of that option, efforts to tackle the issue of data science ethics seem overwhelmingly focused on variations of option #3. This post sets out, briefly, why option #3 is not only impractical but unethical, and, slightly less briefly, why I believe option #2 is what we all should have been working on in the first place.
The “manifesto for data practices” (datapractices.org) was produced by a Data for Good Exchange, sponsored by Data for Democracy and Bloomberg, promoted by former U.S. Chief Data Scientist DJ Patil, and is currently maintained by data.world. I’ve not been shy about stating that I have major ethical concerns about the document.
To summarize my criticism: the manifesto asks us to identify ethical practitioners by taking those practitioners’ word for it. All you have to do is sign the document. Costless virtue signaling creates systemic risk. Unethical practitioners certainly have no qualms about claiming they behave ethically, and ethical practitioners can be ethical without making any such claims in the first place, so any reward for cheap talk - no matter how small - lets individual practitioners build their reputations and otherwise benefit from something other than actually doing their jobs well. A costless ethical code makes it harder, not easier, to identify ethical practitioners.
The manifesto commits the exact type of ethical breach that the document itself was supposed to address: its creators turned out, and its signatories have supported, a minimum viable product without fully considering the downstream harm the product could cause. The document’s creators incurred no risk by creating and promoting it, so it should not be surprising that the product fails to live up to its own ethical standards. This illustrates Nassim Taleb’s principle of skin in the game:
“There is a difference between beliefs that are decorative and…those that map to action. There is no difference between them in words, except that the true difference reveals itself in risk taking, having something at stake, something one could lose in case one is wrong. … How much you truly ‘believe’ in something can only be manifested through what you are willing to risk for it.”
Stealing and selling someone’s organs is obviously unethical — egregiously so. But most of the problems used to justify the need for a data science ethical code are not the steal-your-kidney sort of problem. The creators of the COMPAS algorithm, as far as we know, did not intentionally design the thing to keep poor minorities in prison. Amazon didn’t intentionally design its algorithm to offer same-day delivery to only rich, white zip codes. The designers of a child-abuse detection algorithm didn’t intentionally conflate being poor with being neglectful. In other words, while all of the above examples are concerning because of their ethical consequences, the consequences themselves arose because of a failure of competency, not a failure of ethics.
These are all more of an accidentally-damage-your-kidney-without-realizing-it sort of problem. None of the principles enumerated in the manifesto for data practices would have kept those flawed algorithms from being deployed. If the creators of those tools had realized that they had built systematic bias into their products, they would have changed those products before deploying them. Knowing sound ethical principles, and even believing in those principles with all your heart, does not make you automatically recognize a poor design choice. Competency problems do not have an ethical fix. An ethical code is simply aiming at the wrong target.
I laid out three options for addressing ethical risk at the start of this post. A variation of the legalistic option #1 addresses competency risk instead: it’s called insurance. That’s an expensive and involved way to deal with the problem. It should go without saying that the cheap-talk option #3 is, if anything, an even more ridiculous way to assess competence than it is to assess ethicality.
That leaves option #2 — getting a public assessment of a practitioner. That option is an effective way to mitigate both competency risk and ethical risk, if and only if we get people to vouch for specific types of things — things that incur more costs to the individual practitioner than that practitioner receives in return.
Remember that the problem with a cheap-talk code of ethics is that it makes ethical and unethical practitioners look exactly the same: they can both sign the document, both advertise the fact that they signed it, both talk intelligently about the principles. That effect can be counteracted by a code that demands regular costly action. Proof of your adherence to a code of ethics shouldn’t be your signature. It should be your resume.
One of the major features of the original Hippocratic Oath was the requirement that a doctor teach anyone who honestly desired to learn the craft. If someone claimed to abide by the code, you didn’t need to ask them “are you an ethical doctor?”. You could ask them “when was the last time you taught someone the trade?” And the reason this was an effective filter was because the Hippocratic Oath wasn’t a code for teachers. It was a code for doctors. That meant you had to make time and space to teach others even while carrying out your regular professional responsibilities.
Nearly every part of the original Hippocratic Oath was this kind of enumeration of direct costs to the practitioner. People who adopted the oath promised to not accrue side benefits during the course of performing their professional duties. Doctors who took advantage of non-medical information shared by a patient or who developed side-business contacts among a patient’s household were not adhering to the oath. To be clear: there’s not anything inherently wrong with any of those things, just as there is nothing inherently wrong with not teaching. The costs mattered because only the truly competent doctors — the ones who knew and cared about their profession enough to be very good at what they did — would be willing to consistently sacrifice their time or turn down a proffered side-benefit.
You have to be particularly competent to be able to voluntarily incur personal costs. And those personal costs deter those who would pretend to be competent solely in order to obtain side benefits. In other words, an ethical code that is also a method of assessing competence is an effective filter for intentionally unethical behavior as well as unintentionally unethical behavior. In fact, given a lack of legal, regulatory, and insurance infrastructure, it’s the only effective filter. And the only costs it incurs are paid directly by individual practitioners.
Ethics is not a solvable problem, but it is a manageable risk. No set of principles, not even a robust legal and regulatory infrastructure, will ensure ethical outcomes. Our goal should be to ensure that algorithm design decisions are made by competent, ethical individuals — preferably, by groups of such individuals. If we improve competency, we improve ethics. Most ethical mistakes come from the inability to foresee consequences, not the inability to tell right from wrong.
An effective ethical code doesn’t need to — in fact, probably shouldn’t — focus on ethical issues. What matters most are the consequences, not the tools we use to bring those consequences about. As long as an ethical code stipulates ways individual practitioners can prove their competence by voluntarily taking on “unnecessary” costs and risks, it will weed out the less competent and the less ethical. That’s the list we should be building. That’s the product that will result in a more ethical profession.
I don’t know what that list should look like. For what it’s worth, I personally think two stipulations found in the original Hippocratic Oath could be easily adapted to another profession:

1. Teach the craft to anyone who honestly desires to learn it.
2. Advertise your own ignorance: openly admit when a problem lies outside your competence, and defer to those who know better.
Those are costly stipulations. In the case of teaching, it’s an opportunity cost, in that you’re spending time helping others that you could have spent on your core responsibilities. In the case of advertising ignorance, it’s a more direct cost: I know how scary and sometimes even risky it can be to admit ignorance. That risk is what makes it a good ethical guideline.
I think a professional ethical code could be useful, perhaps even important. I think it’s possible to build a robust, enforceable ethical code. But the products offered so far are not the right way to go — in fact, they’re moving us away from where we need to be. If we’re honest about wanting to address ethical issues within the profession, we need to put skin in the game. Anything less than that will exacerbate the problem.