AI Alignment Is Impossible, not just in practice but in theory

Matt Lutz at Persuasion:

Unfortunately, I’m pretty sure that AI alignment is impossible.

How might an AI form a moral sense? There are basically two scenarios. In one scenario, moral facts are the kinds of facts that one might simply figure out by thinking about them hard. In such a case, perhaps AIs would be good moral reasoners, and indeed even better moral reasoners than humans, in virtue of their advanced intellectual capacities.

In the second scenario, moral facts aren’t the sorts of things we can figure out by pure intellectual effort, but we can nonetheless train AIs to develop a moral sense in much the same way we train children in good behavior: by rewarding them when they’re good and punishing them when they’re bad.

The first scenario is doomed, for reasons first pointed out by the philosopher David Hume in his oft-quoted (and oft-misunderstood) passage where he indicates that there is a gap (not Hume’s term) between “is” and “ought.” Hume thought that reasoning is not some sort of truth-generator, a special faculty that takes intellectual effort as an input and spits out knowledge as an output. Rather, it is a process, where we move from one thought to the next, with our later thoughts hopefully (though not necessarily) supported by our earlier thoughts.

But the process is fallible. After all, if we are to reason our way to a moral conclusion, we must be reasoning from non-moral premises. Taking that into account, what operation of the mind could possibly take us from premises that describe the world to conclusions that tell us how to act?

More here.
