Escape From Brain Prison II: The Threat of Superintelligence

by Oliver Waters

The first article of this series argued that in principle we should expect our personal identities to survive a transition to artificial brains. Obviously there remains the minor technical problem of how to actually build such brains. The most popular conception of how to approach this is known as ‘whole brain emulation’, which entails replicating the precise functional architecture of our current brains with synthetic components.

This is a cautious approach, in that the closer the design of your new brain is to your old one, the more confident you can be that every precious memory and personality trait has been copied over. However, this also means copying over all the limitations and flaws that evolution bestowed upon our biological brains. Yes, you won’t have to worry about your cells ageing anymore – a definite upside! But you will remain stuck with outdated structures like the amygdala and hippocampus, as well as the convoluted circuitry linking them together. It would be like upgrading your 1950s classic car engine with all-new modern parts. It’s still the same inefficient, loud, polluting machine, just shinier and more durable.

This is why we will choose to design far more advanced brains instead. Though this raises a risk of its own: what if we create kinds of minds that are far more powerful than our current ones and they destroy us before we have a chance to upgrade ourselves to rival them?

This is known as the ‘existential risk’ posed by artificial general intelligence (AGI).

The scenario of building a super-intelligent agent prior to understanding how our own minds work has a compelling historical analogue. Humanity learned how to fly by designing aircraft well before we understood the biological complexity that allows birds to fly. It seems just as plausible that a computer scientist could design a generally intelligent program before neuroscientists decipher the ‘neural code’ explaining how humans do the same.

But this analogy is deeply misleading. Whether something can ‘fly’ is very simple to measure in terms of external behaviour, whereas what’s going on ‘under the hood’ matters a great deal for whether a system is genuinely thinking or merely computing mindlessly. For instance, no one believes a pocket calculator is engaged in pensive thought when it computes 3,266 x 38,456. In general, we should distinguish ‘converting informational input A into output B’ from ‘thinking about A and B’.
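To make the distinction concrete, here is a minimal sketch in Python (an illustrative toy of my own, not a claim about how calculators are actually engineered): a function that converts input symbols into output symbols by fixed rules, with nothing in it that could plausibly count as ‘thinking about’ the numbers involved.

```python
# A toy illustration of 'converting informational input A into output B'
# without anything resembling thought: fixed rules map inputs to outputs,
# with no goals, no model of the world, and no reason for performing this
# particular calculation rather than any other.

def calculate(a: int, b: int) -> int:
    # Pure symbol manipulation - exactly what a pocket calculator does.
    return a * b

print(calculate(3266, 38456))  # 125597296
```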

The key difference, it seems, is that we humans manage to arrive at a sensible choice of what to use as the informational input in the first place. A key part of intelligence, in the human sense, is autonomy, which we might roughly describe as the capacity and motivation to sustain one’s physical form over time via creative problem solving.

Notice that this property of autonomy requires grounding in the physical world – a purely abstract algorithm cannot be autonomous. It must be implemented in a physical system with effectors – i.e., some kind of body. Autonomy also requires intrinsic motivation towards, or valuing of, states of affairs. This is the function of feelings and desires in human beings: feelings are perceptions of value, and desires are motivations for realising value. We should therefore be sceptical that we might figure out how to build truly autonomous machines while remaining entirely ignorant of the basis of our own autonomy, and how to upgrade it.

Of course, there remains the concern that the moment we build a system more intelligent than us, it will in turn be able to build a system more intelligent than itself, and so on, at faster and faster rates, leaving us quickly in the dust. This idea of an ‘intelligence explosion’ is undermined by the fact that it’s always a non-trivial task to upgrade one’s own cognitive system to be better. Just as it isn’t obvious to us how to upgrade our specific kind of neural hardware, it won’t be obvious to artificial agents how to upgrade their own. There isn’t some intelligence ‘dial’ that an agent can keep cranking up arbitrarily. This means the progress of intelligent systems will forever be staggered, with creative breakthroughs required at each plateau to jump up to the next local peak of the intelligence landscape.

What can be dialled up somewhat arbitrarily is the memory or speed at which computations are performed, but we already do that today by interfacing with increasingly powerful computational hardware, whether that’s cramming more processing chips into a smaller space or eventually progressing to quantum computing.

Those who would have us panic over AGI existential risk tend to posit a simplistic equivalence between the exponential rise in computational power and intelligence. To be taken seriously, they should instead specify the kind of superior cognitive system they are worried about, so we can analyse its capacities and limitations.

Then there is the separate question of why a superior AGI would want to destroy us in the first place. Much has been said in response to this fear, including doubts that AGIs would have the same ruthless instincts that we evolved given they would be ‘raised’ in a very different context.

But I’d like to focus on two big ideas that place strong limits on just how dangerous an AGI could become to us. The first is explanatory universality and the second is moral realism. These two principles, taken together, imply that a super-intelligence could not be qualitatively smarter than us, and that by virtue of its super-intelligence, it would recognise the moral implication of this: that we are beings of equal moral status.

Let’s start with a bold claim: human beings can understand anything. You might take this to be absurd, given we are finite and imperfect products of evolution by natural selection. Noam Chomsky does, arguing that since we clearly are not angels, there must be true mysteries beyond our species’ capacity to comprehend. Such mysteries could include, for example, how other types of beings think. But Chomsky sells us short, by falsely analogising a physical capacity with an informational one. Of course, any physical capacity of ours is finite and limited. By virtue of having arms, we can carry logs of wood, but we cannot fly. Similarly, given our specific brain structures, we can only think at a certain pace, and with a limited memory. But none of this implies that the scope of phenomena we can possibly think about is finite or bounded in any important way.

The physicist David Deutsch has called this capacity ‘explanatory universality,’ and has argued that those who possess it, be they aliens or artificial intelligences, would necessarily have the same cognitive repertoire as us. Deutsch sketches out this conjecture in a recent podcast, by supposing we are visited by aliens with superior comprehension abilities to us. We ask them about some mysterious phenomenon like consciousness or dark matter, and they respond that they of course understand it, but can’t communicate this knowledge to us because of our deficiencies. As Deutsch points out, there would only be two possible reasons for this. The first is that we need better hardware with greater memory or speed, but this is fine because we already know how to build this. The second is that our ‘programming’ is not sufficient. But because we already know how to build universal computing machines, the aliens can just tell us what this program is, and we can run it, however long this may take. It follows that such ‘superior’ beings cannot exist, and our scope of understanding is truly universal.
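To make the notion of a universal computing machine concrete, here is a minimal sketch in Python (a hypothetical toy example, not anything Deutsch himself provides): a single fixed simulator that can run any Turing machine program handed to it as data. This is the property the argument leans on – whatever program the aliens send us, a machine like this could in principle execute it, limited only by memory and time.

```python
# A minimal sketch of computational universality: one fixed simulator that
# runs *any* program supplied to it as data (here, a Turing machine given
# as a transition table). The 'flip_bits' program below is a made-up example.

from collections import defaultdict

def run_turing_machine(program, tape, state="start", max_steps=10_000):
    """Simulate an arbitrary Turing machine.

    program: dict mapping (state, symbol) -> (new_symbol, move, new_state),
    where move is "L" or "R" and the blank symbol is "_".
    """
    cells = defaultdict(lambda: "_", enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        new_symbol, move, state = program[(state, cells[head])]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# Example program: invert every bit of the input, then halt at the blank.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(flip_bits, "10110"))  # -> 01001
```

The point is not the toy program but the simulator: nothing about it would need to change to run a different, arbitrarily sophisticated program, which is why ‘insufficient programming’ is not a barrier that better-informed beings could not simply talk us over.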

Indeed this is the only possibility that explains the range of phenomena we already understand. There simply can’t be any in-built genetic program that produces the knowledge we have about how the insides of stars work, for instance, because there wasn’t any evolutionary pressure driving it into existence. The only reasonable explanation for why we can know about such exotic things today is that our species somehow evolved a truly universal thinking architecture.

Those who posit beings with qualitatively superior cognitive abilities to us are engaged in a performative contradiction. By even referring to the information these creatures possess as knowledge, they are imputing to themselves a level of understanding that is, by their own admission, beyond their species’ capacity. They should instead heed Wittgenstein’s famous words:

       “Whereof one cannot speak, thereof one must be silent.”

The point is that whenever we encounter a phenomenon we do not understand – such as a ‘superintelligence’ – we cannot know in principle if it is merely a complex phenomenon we don’t yet understand, but may one day with great effort, or if it is fundamentally not comprehensible to us. We only ever have two choices: try to understand everything or refuse to understand anything. In other words, we can embrace Enlightenment values or return to the long sleep of worshipping the supernatural.

This brings us to the related fear that our machines could become arbitrarily more intelligent than us without ever developing moral values ‘aligned’ to our own.

Nick Bostrom has formalised this claim as his ‘orthogonality thesis’, which states that an agent’s goals and capacities are not strictly dependent upon each other. It implies that any degree of intelligence can be accompanied by any desires or values whatsoever. This certainly seems intuitive at first, given the ease with which we can imagine evil geniuses like Hannibal Lecter, or simple-minded saints like Forrest Gump. On this view, a super-smart AGI might want nothing more than to maximise the number of paperclips in the universe (Bostrom’s infamous example), and would happily strip our blood of its iron to do so.

But there’s a reason that Hannibal and Forrest are fictional characters. It’s exciting to imagine a brilliant psychiatrist who enjoys fine literature with a nice chianti, and who also happens to derive great pleasure from murdering and eating innocent people. But this is an incoherent combination of traits according to our best theories of psychological science.

Psychopathy is usually defined by a lack of empathy, deficient emotional responses, and poor behavioural control. This means that genuine psychopaths don’t tend to do very well in our modern societies, where these deficits are a serious impediment. They are usually flagged quickly as not to be trusted and become social outcasts for their strange behaviour. At the extreme end are serial killers, whose cognitive failings make them slaves to their impulsive murderous desires.

A common response to this is to point out that there are lots of very competent, mean, and ruthless people at the top rungs of our society and economy. This is true, but these successful CEOs and politicians are by no means at the extreme end of psychopathy. Contrary to the ‘orthogonality thesis’, there are strong constraints on just how evil or conniving they can be before their careers come crashing down around them.

Usually, such powerful people don’t have problems with impulse control – they are very good at strategic planning. More likely, they lack what is called ‘affective empathy’. This is the capacity and tendency to directly feel what others feel. You have strong affective empathy if you can’t help but also cry when a friend is sobbing about their recent romantic break-up. This is distinguished from ‘cognitive empathy’, which is the non-emotional capacity to comprehend others’ desires and beliefs. Cognitive empathy is very useful: it enables you to better understand and manipulate others.

Affective empathy can be important in circumstances where a rich representation of someone’s inner emotional life is required, such as researching an acting role, or being a good friend. But having high affective empathy can clearly also inhibit success in other circumstances, as the psychologist Paul Bloom discusses in his book Against Empathy (2016). A therapist who embodies every emotional state of their client will be very bad at their job – the same is true for a doctor in an emergency department.

This makes sense of why people deficient in affective empathy would be over-represented in high-level strategic roles. Where someone else might hesitate before firing an inefficient employee out of concern for that employee’s family, they will make the tough call that’s in the interests of the company’s success. The point is that lacking affective empathy doesn’t mean you lack the ability to act morally. Indeed, the person who donates their money to a charity after dispassionately analysing its effectiveness is probably doing more good than someone who donates the same amount to a single child because they were moved to tears by that child’s suffering.

The orthogonality thesis seems to fail for humans because practical and moral wisdom tend to be correlated. The better educated and more powerful our societies become, the more ethical our practices tend to be. A common argument one hears is that we should probably expect AGIs to be cruel to us, given how cruel we have been to other animals we consider inferior to us. But this fails to acknowledge the trend in how we’ve been treating other animals. As we’ve grown wiser about the capacity of animals to suffer and how to reduce that suffering, our treatment of them has improved. This trend should make us more optimistic, not less, about how ever-smarter beings will treat us.

We have deeper reasons for such optimism than historical trends, however.

Any functional agent’s deepest values, beliefs and abilities are inextricably linked. For instance, any agent must value correcting errors in its thinking for its understanding of the world to remain accurate over time. If it becomes arrogant, or too fixated on a single goal, this diminishes its ability to reflect upon and revise its beliefs in the light of new evidence. Similarly, if agents don’t value understanding the perspectives of others, they will be less effective at persuading them to cooperate with their projects. And if agents do not understand the moral theories of others, they can’t understand their deepest motivations.

This means it’s implausible to imagine an artificial intelligence that will be so smart as to out-think every human on the planet – presumably by reading the entire internet – but so stupid that it will somehow remain unpersuaded by everything written on morality.

Now of course one could claim: that’s just our parochial human moral philosophy – why would a machine have any need for or interest in that, let alone agree with us?

This gets us to the big question of what we think we’re doing when we think about morality. Are we searching for objective, universal principles and standards of goodness and badness, rightness and wrongness? Or are we just rationalising our genetically programmed prejudices?

In philosophy lingo, you are a ‘moral realist’ if you think that moral claims like ‘murder is wrong’ are just like other propositions about the world: they can be true or false, which means we can potentially make progress toward attaining real moral knowledge. On the other hand, a moral ‘non-realist’ believes the opposite: that there are no moral properties or facts. A popular type of non-realist position is ‘emotivism’: the idea that moral claims merely function to express our emotions. For an emotivist, the phrase ‘murder is wrong’ really just means ‘boo murder!’

Here’s the thing: the orthogonality thesis only makes sense if you don’t believe in the possibility of moral truth. This is what enables you to imagine that a super-intelligent artificial being could read all of Aristotle, Kant, Mill, and thousands more writers of ethical tomes, yet not be persuaded that murder, slavery and theft are morally wrong.

This is implausible because the knowledge of what you ought to do is always informed and constrained by your knowledge of what you can do. In other words, moral truth must exist because it is unavoidably tethered to practical and scientific truth. Note this is different to the claim that moral truths simply are scientific truths: this is a problematic position known as ‘moral naturalism’. Morality cannot be reduced to science, any more than mathematics can. But all three endeavours do impose mutual bounds upon each other.

In the course of discovering more about how the world works in a mechanistic, causal sense, we necessarily discover new moral facts as well. When we inquire empirically into what a human being is, for example, our concept of ‘person’ is revised in a way that unavoidably affects the moral propositions about human beings we can logically hold. It was the empirical realisation (as painfully slow as it was) that all biological races are indeed fully human that eventually made the doctrine of racial slavery implausible for any rational, informed mind.

You might worry that, even if moral truth exists, an evil super-genius AGI could strategically avoid learning such truths. But revising one’s ideas is never fully under one’s voluntary control. We don’t usually enter conversations thinking ‘I’m going to change my mind by talking to this person’ – rather in pursuing other objectives we stumble upon arguments that we can’t help but find convincing.

This is what it will be like for other generally intelligent artificial minds. They might start learning about utilitarianism or virtue ethics with the cynical intent to appear noble so that naïve humans give them resources. But they will probably find themselves persuaded by the best arguments in those traditions that such deception is counter-productive or corrosive of their moral integrity. To believe they couldn’t do so is to arbitrarily make them less intelligent than us in that respect: to demote them from truly generally intelligent beings to merely partially competent machines.

In short, there cannot be a creature who reasons perfectly about practical matters – who is super-intelligent with regard to science and technology – yet who is somehow blind to moral matters. This is ultimately why extremist fears about the threat of AGI are unfounded: our moral and scientific explanations are interwoven threads in the great tapestry of knowledge.

Our AGI creations will definitely start out very ignorant, but this is not a novel problem. David Deutsch prompts our empathy by analogising AGIs with children. They too will need to painstakingly learn our culture and have the potential to misunderstand and distort the hard-won discoveries of old. But we have no choice but to patiently help them to admire the rich, beautiful edifice – the Library of Alexandria we have built as a global civilisation. And we can only hope that they do the right thing, by incrementally improving upon this achievement in whatever way they can, to maximise prosperity for as many beings as possible.