The Ethics of Black Box AI

by Tim Sommers

My wife Stacey is irritated with the way Netflix’s machine learning algorithm makes recommendations. “I hate it,” she says. “Everything it recommends, I want to watch.”

On the other hand, I am quite happy with Spotify’s AI. Not only does it do pretty well at introducing me to bands I like, but also the longer I stay with it the more obscure the bands that it recommends become. So, for example, recently it took me from Charly Bliss (76k followers), to Lisa Prank (985 followers), to Shari Elf (33 followers). I believe that I have a greater appreciation for bands that are more obscure because I am so cool. Others speculate that I follow more obscure bands because I think it makes me cool while, in fact, it shows I am actually uncool. Whatever it is, Spotify tracks it. The important bit is that it doesn’t just take me to more and more obscure bands. That would be too easy. It takes me to more and more obscure bands that I like. Hence, it successfully tracks my coolness/uncoolness.

The proliferation of AI “recommenders” seems relatively innocuous to me – although not to everyone. Some people worry about losing the line between when they just like what the AI recommends to them, and when they adapt to like what the AI says they should like. But that just means the AI is part of their circle of friends now, right? It’s the proliferation of AIs into more fraught kinds of decision-making that I worry about.

AIs are used to decide who gets a job interview, who gets granted parole, who gets most heavily policed, and who gets a new home loan. Yet there’s evidence that these AIs are systematically biased. For example, there is evidence that a widely used system designed to predict whether offenders are likely to reoffend or commit future acts of violence – and, hence, to set bail, determine sentences, and set parole – exhibits racial bias. So, too, do several AIs designed to predict crime ahead of time, to guide policing (a pretty Philip K. Dickian idea already). Amazon discovered, for themselves, that their hiring algorithm was sexist. Sexist, racist, anti-LGBTQA+, anti-Semitic, and anti-Muslim language is endemic among large language models.

One problem in dealing with this is the opacity of these AIs. One side of that problem is that these are almost all proprietary. That problem seems more tractable to me – maybe just regulation, maybe just limits on keeping algorithms proprietary (especially if they are going to be used, for example, by the police).

The more deeply problematic kind of opacity is the increasing reliance on black-box AIs – especially so-called “deep-learning models.” As you probably already get, these are opaque in that even the people who make them have no idea what is going on inside them when they work – or, for that matter, don’t. Programmers build a multi-layered architecture that they then expose to colossal amounts of data, which are processed again and again looking for patterns. This has led to quantum leaps forward in computer vision, speech recognition, translation, drug design, climate modeling, and (most recently) understanding how proteins are folded – nearly all two hundred million of them. So, who cares if these systems are black boxes?

Well, meet Newcomb’s black box.

Suppose you’re going to be on a game show where you are given a choice involving two boxes. One box is transparent and has $1,000 in it. The other is a black box that you can’t see into, but you are told that it either has a million dollars in it or nothing.

The show features a mysterious perfect predictor. If the perfect predictor predicts that you will take both boxes, then the black box contains nothing. If the predictor predicts you will only take the black box, then it has a million dollars in it. The thing is the predictor is perfect. It’s never been wrong before. What should you do?

Obviously, you should just take the black box that will, therefore, have a million dollars in it. But, no, wait, when it comes time to make your choice the money is already in the box – or it isn’t. There’s nothing that can change whether it is, or it isn’t, at that point. If you take both you get at least $1,000 and maybe $1,001,000. Take both.

Game theorists say that what is going on here is that there are two opposed rational strategies that it is difficult to choose between. You could maximize your expected utility. Given the perfect predictor and the possibility of a million dollars, forget the thousand dollars and take only the black box. The standard way of putting the second possibility, the dominance strategy, is that you should choose the strategy that is always better. Choosing two boxes always yields at least a thousand dollars, where choosing one can lead to nothing. [For fans of Rawls, maximin (maximize the minimum) is the dominance strategy under conditions of complete uncertainty (hence, the difference principle – we should go for the society where the least well-off members are as well-off as possible).]
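For readers who like to see the arithmetic, here is a minimal sketch of the two lines of reasoning. It is not part of the original puzzle: the `accuracy` parameter (how often the predictor is right) and all of the function names are my own illustrative assumptions, and the expected-payoff calculation assumes the predictor is equally accurate whichever choice you make.

```python
from fractions import Fraction

SMALL = 1_000      # contents of the transparent box
BIG = 1_000_000    # contents of the black box, if it was filled


def expected_payoffs(accuracy):
    """Expected-utility view: return (one_box, two_box) expected payoffs.

    `accuracy` is the assumed probability that the predictor is right
    (pass an int or a Fraction, e.g. Fraction(9, 10)).
    """
    p = Fraction(accuracy)
    one_box = p * BIG                  # black box is full only if the prediction was right
    two_box = SMALL + (1 - p) * BIG    # black box is full only if the prediction was wrong
    return one_box, two_box


def dominance_table():
    """Dominance view: hold the contents of the black box fixed.

    Returns {contents: (one_box_payoff, two_box_payoff)}.
    """
    return {"empty": (0, SMALL), "full": (BIG, BIG + SMALL)}


def break_even_accuracy():
    """Accuracy at which the two expected payoffs are equal.

    Solves p * BIG = SMALL + (1 - p) * BIG for p.
    """
    return Fraction(BIG + SMALL, 2 * BIG)


if __name__ == "__main__":
    print(expected_payoffs(1))       # the "perfect" predictor
    print(dominance_table())
    print(float(break_even_accuracy()))
```

On these assumptions, the dominance table shows two-boxing coming out exactly $1,000 ahead whatever is already in the black box, while the expected-payoff calculation favors one-boxing for any predictor that is right more than 50.05% of the time. The two strategies disagree, which is the whole puzzle.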

The most interesting question, however, is how does the perfect predictor work? You are not supposed to color outside the lines on thought experiments. The whole point is, ‘What would you do?’ given these fixed parameters. The trolley will kill one or five. You are not allowed to declare any of them pregnant – or famous. (Last semester I asked my class what they would do if the one person (as opposed to the five) that would be killed by the trolley in the trolley problem was Phoebe Bridgers – and learned a novel moral principle. Save Phoebe Bridgers at any cost.)

But this perfect predictor bit is hard to let go of. Indeed, the most important black box in Newcomb’s puzzle is not the one with the million dollars in it. It’s the one with the perfect predictor in it.

The most plausible way for the perfect predictor to work is just to cheat. If it’s just stage magic, then the magician can always make it come out so that the perfect predictor looks right. If that is the case, if you play along and take one box, you’ll get the million dollars. Or maybe the perfect predictor uses some sort of time travel. Then, I guess, to the extent that their time machine is reliable, just take the black box. If the perfect predictor is God, then take just the black box.

But the second most plausible answer (after cheating) is that the perfect predictor does something like run an unbelievably detailed simulation of you to predict your behavior. It still behooves you, in that case, to learn to be a one-boxer, and then just take the one box.

What I am saying is that if the perfect predictor is really a perfect predictor, then there’s no “paradox.” You should take the one box. I think that it only seems paradoxical because, on some level, we can’t really believe that the perfect predictor is, in fact, a perfect predictor. Why not? Because it isn’t. Or, rather, because there is no such thing as a perfect predictor. Any predictor can go wrong.

The better chess AIs can beat every human player every time. Who cares if we don’t know why they play the way they do, or that their play is so detectably different from the way humans play that it’s hard to cheat the best players with an AI assist without getting caught? But if you are setting the optimal length of someone’s imprisonment, flagging people as terrorists, or hiring or firing them, there is an obligation to give reasons. Certain decisions can’t be made by a black box, no matter how good its track record is.

I would argue (as Kant, Baier, and Rawls have argued) that justice requires publicity in a very specific sense. The moral rules and laws that we are subject to must also be available to us ahead of time for us to regulate our behavior by, and afterward, to defend it when challenged. Morality depends not just on there being reasons for what we do, but on our being able to give reasons for what we do. More controversially, but (I think) correctly, at least the most fundamental rules must be clear and simple enough to be practically, and not just theoretically, available to us.

If that is right, then black box AIs should be excluded from use in a whole swath of domains. Maybe not music recommendations, chess, or protein folding, but in any domain in which we are owed moral or legal consideration, decisions cannot be made by black box AIs or other purported perfect predictors.

I don’t want to be too sanguine about how helpful the line I am drawing really is all on its own. Consider medical AIs. There are black box AIs that seem to be better at diagnosing diabetes than humans. Would you rely on one? What if the AI gives you a no-reason diagnosis that a doctor disputes by offering you reasons?

Here’s another important line. In some cases, diagnosing diabetes, for example, we expect that at some point the AI will be revealed to be objectively right or wrong. But in cases like determining prison sentences there isn’t necessarily ever a point where it becomes completely clear how well the AI is doing. Using a black-box AI where there is no fixed, external, objective standard of success to meet seems dangerous. I’ll never know for sure if my Spotify AI is making me cooler – or less so. And that’s okay. About policing or paroling AIs, however, we cannot afford to be so sanguine.

A surprising number of well-known people worry that AI might one day take over the world. I worry that it already has, and we just haven’t noticed yet.


On “publicity” see Kant, “Perpetual Peace,” Appendix II; Kurt Baier, The Moral Point of View, Chapter 8; and Rawls, Justice as Fairness: A Restatement, Section 35.

On Phoebe Bridgers, if you have already heard “Motion Sickness” and “Kyoto,” check out “Dylan Thomas” (playing with Conor Oberst as Better Oblivion Community Center); from Shari Elf’s one and only release, I’m Forcing Goodness Upon You, check out “Ron’s Appliance” and “Jesus at the Hardware Store”; for Lisa Prank, see “Luv is Dumb” and “Baby, Let Me Write Yr Lines” off Adult Teen; and last, but definitely not least, from Charly Bliss, check out “Glitter” and “Percolator,” off Guppy, and “Slingshot” on the Supermoon EP.

You may have noticed I don’t provide links for most of the AIs I mention. This is for two reasons. One is that there are multiple versions of most of these and I don’t want to single out one. Also, I don’t want to accuse anyone’s specific AI of anything without direct knowledge. Examples of AIs with any of the functions I have mentioned, and the various complaints about them, are easy enough to find/google.

Finally, here’s an objection worth addressing that I can’t address here. In any legal system of any contemporary nation state, the laws are (arguably) already too complicated to satisfy publicity. Introducing AIs (black box or not), therefore, is beside the point. In fact, AI lawyers might become the most affordable kind very soon.