Thinking Through the Risks of AI

by Ali Minai

How intelligent is ChatGPT? That question has loomed large ever since OpenAI released the chatbot late in 2022. The simple answer to the question is, “No, it is not intelligent at all.” That is the answer that AI researchers, philosophers, linguists, and cognitive scientists have more or less reached a consensus on. Even ChatGPT itself will admit this if it is in the mood. However, it’s worth digging a little deeper into this issue – to look at the sense in which ChatGPT and other large language models (LLMs) are or are not intelligent, where they might lead, and what risks they might pose regardless of whether they are intelligent. In this article, I make two arguments. First, that, while LLMs like ChatGPT are not anywhere near achieving true intelligence, they represent significant progress towards it. And second, that, in spite of – or perhaps even because of – their lack of intelligence, LLMs pose very serious immediate and long-term risks. To understand these points, one must begin by considering what LLMs do, and how they do it.

Not Your Typical Autocomplete

As their name implies, LLMs focus on language. In particular, given a prompt – or context – an LLM tries to generate a sequence of sensible continuations. For example, given the context “It was the best of times; it was the”, the system might generate “worst” as the next word, and then, with the updated context “It was the best of times; it was the worst”, it might generate the next word, “of” and then “times”. However, it could, in principle, have generated some other plausible continuation, such as “It was the best of times; it was the beginning of spring in the valley” (though, in practice, it rarely does because it knows Dickens too well). This process of generating continuation words one by one and feeding them back to generate the next one is called autoregression, and today’s LLMs are autoregressive text generators (in fact, LLMs generate partial words called tokens which are then combined into words, but that need not concern us here.) To us – familiar with the nature and complexity of language – this seems to be an absurdly unnatural way to produce linguistic expression. After all, real human discourse is messy and complicated, with ambiguous references, nested clauses, varied syntax, double meanings, etc. No human would concede that they generate their utterances sequentially, one word at a time.

Rather, we feel that our complex expressions reflect the complex structure of the thoughts we are trying to express. And what of grammar and syntax – things that we learned implicitly when we learned the language we speak, perhaps building on a universal grammar innate in the human brain as the great linguist, Noam Chomsky, has proposed? And yet, ChatGPT, generating one word (or token) at a time, ends up producing text that is every bit as meaningful, complex, nuanced, syntactically deep and grammatically correct as human expression. This fact – not its hallucinations and logical challenges – is the most interesting thing about ChatGPT. It should amaze us, and we should ask what it might tell us about human language and thought.
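The word-by-word loop described above can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions. `TOY_SCORES` is a hypothetical, hand-written table that looks only at the last word, whereas a real LLM computes its scores with a neural network conditioned on thousands of preceding tokens. The sketch also always takes the top-scoring word (greedy decoding) rather than sampling. What it does show is the autoregressive feedback: each generated word is appended to the context and becomes the input for the next step.

```python
# Hypothetical toy "model": scores for possible next words, keyed only on the
# most recent word. A real LLM conditions on thousands of preceding tokens.
TOY_SCORES = {
    "the": {"worst": 0.6, "age": 0.3, "beginning": 0.1},
    "worst": {"of": 0.9, "day": 0.1},
    "of": {"times": 0.8, "spring": 0.2},
}


def next_word(context):
    """Return the highest-scoring continuation (greedy), or None if unknown."""
    scores = TOY_SCORES.get(context[-1])
    if not scores:
        return None
    return max(scores, key=scores.get)


def generate(prompt, max_new_words=10):
    """Autoregressive loop: each new word is appended and fed back as context."""
    context = prompt.split()
    for _ in range(max_new_words):
        word = next_word(context)
        if word is None:
            break  # no known continuation, so stop generating
        context.append(word)
    return " ".join(context)


print(generate("It was the best of times; it was the"))
# prints: It was the best of times; it was the worst of times
```

With this table, the Dickens prompt is continued with “worst”, then “of”, then “times”, after which the table has no entry and generation stops.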

Clearly, one thing it tells us is that, after training a very large adaptive system on an extremely large amount of real-world text data, all the complexities of syntax, grammar, inference, reference, etc., can be captured in a simple sequential autoregressive generation process. In a sense, the deep structure of language can convincingly be “flattened” into the simple process of generating word sequences. Two crucial attributes of LLMs make this possible: extended context and adaptive attention. Unlike the example given above, an LLM does not generate the next word by looking at just a few preceding words; it looks at several thousand preceding words – at least once the pump is primed sufficiently. And, very importantly, it does not treat all these thousands of words as a simple “bag of words” or even a simple sequence of words; it learns to discern which ones to attend to, and to what degree, at each point in the generative process, and uses that to generate the continuation. This is adaptive attention.
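For readers who want to see what “attending” means numerically, here is a minimal sketch of scaled dot-product attention, the form of attention used in transformer-based LLMs, computed for a single position. The vectors are made-up two-dimensional numbers chosen for illustration. In a real model the queries, keys, and values are produced by learned weight matrices and have hundreds or thousands of dimensions. The point is that the weights are recomputed for every query, so the same context words can matter a great deal at one step and hardly at all at the next.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query position.

    The query is compared with every key; the resulting weights decide how
    much of each token's value vector flows into the output.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output


# Made-up 2-D vectors for three context tokens (illustrative only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([2.0, 0.0], keys, values)
print([round(w, 3) for w in weights], [round(x, 3) for x in output])
```

Here the query lines up with the first and third keys, so those tokens receive equal, larger weights and the second token is largely ignored; a different query would redistribute the weights.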

The key thing that makes all this work is the depth of the system. As most readers will know by now, LLMs are neural networks – systems with tens of millions of relatively simple but nonlinear interconnected processors called neurons, inspired by the networks of neurons in the brain. The artificial neurons in an LLM network are arranged in layers, with the output from each layer moving sequentially to the next layer (or other higher layers), which is why these are called feed-forward networks (except for the final output being fed back into the system as input for the next word). The exact architecture of ChatGPT is not publicly known, but it certainly has several hundred – perhaps more than a thousand – layers of neurons. Some layers operate in parallel with each other while others are connected serially. The number of pairwise connections between neurons is about 175 billion in the original version based on GPT-3. The strengths of these connections determine what happens to information as it moves through the network, and, therefore, what output is produced for a given input. The network is trained by adjusting these connection strengths, or weights, until the system produces correct responses on its training data. While layers can perform various types of computations, such as merging the signals from several prior layers, multiplying them together, or transforming them in nonlinear ways, there are at least 96 places in the network where adaptive attention is applied to all or some of the data moving through the network. Thus, in determining the next word to generate, ChatGPT takes the initial context input through many, many stages of analysis – implicitly inferring its syntactic and semantic organization, detecting dependencies, assigning references, etc. It is this extensively dissected, modulated, squeezed, recombined and analyzed version of the input that is used finally to generate the output. 
The output layer produces a probability distribution for the next word over the whole vocabulary, from which the actual output word is sampled. That is why ChatGPT gives a fresh answer to the same question each time.
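The sampling step can be illustrated with a small sketch. It assumes plain softmax sampling with a `temperature` parameter, a common decoding scheme; ChatGPT’s exact decoding settings are not public, and the vocabulary and scores (`logits`) below are invented for the example. Because the next word is drawn at random from a distribution rather than always being the most likely one, two runs can diverge after a single different choice.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn output-layer scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(vocab, logits, rng, temperature=1.0):
    """Draw one next word at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and scores for the continuation of "... it was the"
vocab = ["worst", "age", "beginning"]
logits = [3.0, 1.0, 0.5]
rng = random.Random(42)
print([sample_next(vocab, logits, rng) for _ in range(8)])
```

Lowering the temperature sharpens the distribution toward the top-scoring word, making output more repetitive; raising it flattens the distribution so that less likely words appear more often.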

LLM naysayers have derisively described them as “autocomplete on steroids”. While the autoregressive generation process has the same form as traditional autocomplete, the description above shows that this label badly understates the sheer computational sophistication of the system.

ChatGPT and Intelligence

Having described the workings of ChatGPT in a simplified way, we can return to the question of its intelligence. Is it intelligent? Rather than giving a flat “no!” as the answer, it’s worthwhile to move beyond the simplistic, frenzied opinionating on ChatGPT and consider the issue with nuance.

In the computer science community, intelligence has been seen as a computational capacity. In the first several decades of AI, the focus was on building computational systems for manipulating symbols. Though that stage has now passed and most AI systems use brain-inspired neural networks, the dominant view among AI practitioners is still one centered on computation. This is natural and useful, since physical processes can be seen as computations, but for all its theoretical power, this framework is inadequate for understanding what is, at its root, a biological phenomenon.

As I – and many others – have discussed elsewhere, intelligence is an attribute of embodied organisms – animals with a central nervous system – that allows them to function productively in the real world. It is grounded fundamentally in the animal’s integrated experience of the world through its senses, all in the context of its memories, motivations, and emotions. It is learning from this experience that allows the animal to create internal representations and response patterns that reflect the causal structure of the world, and that enables it to produce situationally appropriate behaviors. The animal is aided immensely in this by its well-adapted design, the result of millions of years of evolution, which makes it learning-ready at birth. It then has the opportunity to learn more and more complex relationships and map them to more complex behaviors over a period of development that, in humans, lasts two decades or more. The end result is an organism capable of a vast array of useful, flexible – even creative – behaviors that promote survival, and thus, evolutionary fitness. This is the natural – real – version of general intelligence. It is general because every aspect of its deployment emerges from the same source – the animal’s brain and body – and is grounded in the unified, always-integrated experience of the animal.

In contrast to all this, ChatGPT (or any other LLM) has no sensory input, no behavioral capacity, and does not experience reality directly through embodied sensorimotor experience of the real world. Rather, it sees the world through data – mainly text (and computer code). It infers logical relationships, correspondences, and causality only to the extent that these are present in the statistics of text. It is, therefore, like a prisoner in Plato’s Cave, or like Tennyson’s Lady of Shalott – seeing reality only in shadows on a wall or as reflections in the distorting, flattening mirror of language. In the end, what it learns is grounded not in the real world, but in text. It generates convincing text, but as simulation, not as a result of any real experience. In the cave analogy, given a fleeting shadow, it is able to produce an entire convincing shadow play on the wall, but with shadows unconnected to any actual objects – indeed, without any notion even of the existence of a world containing such objects. In addition to a lack of sensory or motor capacity, the system also has no explicit long-term memory, no internal motivations, and no autonomy. Its working memory is just its token buffer. Thus, in real terms, ChatGPT is not very intelligent at all. However, in the world defined only by text, it is, in fact, quite intelligent, and that is most interesting.

In addition to the “autocomplete on steroids” critique, some have seen LLMs as “stochastic parrots” – systems that regurgitate what they have learned by a simple matching of patterns, with a certain degree of random variation. This too is rather misleading. The system has no memory where it can explicitly store any information for regurgitation. Everything it learns is implicit in the strengths of the 175 billion connection weights that define the fine-grained architecture of the system. These values have been learned through training on vast amounts of data, and have distilled the statistics of deep semantic and syntactic relationships in that data. No particular connection or neuron stores any particular fact for future recall – a feature LLMs share with real brains. When a context input (or query) is presented to the system, it diffuses sequentially through hundreds of layers of neurons arranged in parallel and serial relationships, exciting in each layer a pattern of activity that implicitly represents useful information extracted from the incoming data. Were we able to look at these patterns of activity, we might well identify some as recalled memories, some as tentative inferences, and others as internal hypotheses generated by the network. This is very similar to what happens in the brain as it processes incoming sensory data. There too, patterns of activity emerge in layer after layer of neurons in many regions across the brain, and the interactions between these patterns ultimately give rise to patterns of activity in other layers that generate responses such as a feeling, a recalled memory, a thought, an utterance, or an action. Consciousness – the feeling of being – may just be our perception of the dance of these patterns in our brains. In this sense, then, ChatGPT is doing something brain-like, even though it’s doing it with text data, not real experience. 
And its success in generating convincing text output should inform us that what we consider deep, complex relationships in language can, in fact, emerge in the course of successive numerical transformations across layers of neurons in a system far, far simpler than the human brain. This, in turn, should raise questions about how the brain itself might be generating linguistic output and, in particular, about whether syntax and grammar really are fundamental prior aspects of language or emergent ones.

But there are also crucial differences between the somewhat brain-like processing of ChatGPT and the real thing. The lack of sensory inputs, behavioral outputs, and long-term memory has already been mentioned. Critically, the layers of neurons in the brain don’t just feed forward. Instead, they communicate bidirectionally, setting up a much more complex, often competitive, sometimes resonant, dynamic process of activity than that produced by the passive diffusion of information in one direction in ChatGPT. Both the brain and ChatGPT are modular, but the modules of ChatGPT – called transformer blocks – are all very similar and generic, specialize to tasks only through learning and as a result of their position in the layer hierarchy, and use only a few types of artificial neurons. The overall architecture of ChatGPT – as far as is known – is also quite simple and generic. In contrast, the brain has an extremely complex architecture, with hundreds of different types of neurons organized into deeply hierarchical, diverse, multifunctional modules. It’s an architecture optimized by evolution, development, and learning to function in a real world that the animal can perceive and act in. The only things ChatGPT “senses” are word tokens, and the only “behavior” it produces is the generation of one token at a time. And, to cap it all, ChatGPT requires extremely long, carefully supervised training with a huge amount of data, whereas animals learn rapidly, autonomously, and from limited data. At best, then, LLMs represent a very limited and rudimentary form of intelligence – if any at all. But that does not make them less profound or less risky.

The Risks of LLMs

On March 28, 2023, a large number of prominent people in the technology world, including Elon Musk, Yoshua Bengio, Gary Marcus, and others, signed an open letter sponsored by the Future of Life Foundation (FLF), demanding that “all AI labs” pause, for at least 6 months, the training of AI systems more powerful than GPT-4 because of the hazards the technology poses (for the record, I did not sign the letter). A more eloquent – and more alarming – op-ed in the New York Times by Yuval Noah Harari, Tristan Harris, and Aza Raskin warned of the profound dangers that AI models pose for human society. These are just the two most prominent among the many, many warnings from all quarters. Critics of these warnings have termed them “AI alarmism”, and made some well-argued objections – especially to the letter. Unfortunately, this has tended to conflate very distinct issues that are not all equally valid.

Let us take the core point made by the FLF letter:

Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? Such decisions must not be delegated to unelected tech leaders. Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.

The first sentence of this passage should immediately give us pause. Knowing what LLMs and other AI systems like DALL-E 2, Stable Diffusion, etc., do, are they really becoming “human-competitive at general tasks”? Not at all. There are some specific human tasks, such as computer programming, copywriting, drafting documents, creating interesting images, etc., where AI systems are indeed becoming human-competitive in a limited way, but that tells us more about the nature of these tasks than the dangers of AI. For example, computer programming is a task where a highly constrained piece of “text”, i.e., code, is created under very strict syntactic and logical rules. Thus, if the prompt is correct, ChatGPT – trained on millions of existing programs – can generate a reasonably good response, though often with errors. Similarly, letters, resumes, syllabi, sightseeing itineraries, even political speeches, are very formalized things where a well-trained system can excel. On the creative side, the fact that LLMs and graphics generators are not constrained by facts can become an asset, allowing the unbridled generation of stories and images. But humans do many, many more things. And the same individual human can do most of those things, like drive a car safely in city traffic, solve crossword puzzles, buy appropriate groceries, and manage hundreds of social relations. No AI system is anywhere near doing that, or even approaching the versatile, integrated intelligence of a pigeon or rat. Thus, the alarm over the danger of developing “nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us” is rather far-fetched at this point, though it must be taken more seriously in the long term, as I have discussed in a previous article and discuss again below. First, it’s important to look at some more immediate issues.

Short-Term Challenges

Current AI technology, in fact, poses several challenges in the short and medium terms, but the present discussion will look only at three of the most important.

Economic Disruption:

First, there is the danger that, as the FLF letter says, AI might “automate away all the jobs, including the fulfilling ones”. While such automation will no doubt occur – and has already been occurring – this is nothing new. This argument has been made against every major technological innovation in modern times, such as in the Luddite movement against the industrialization of the textile industry in 19th-century England. Steam power, electricity, cars, radio, computers, and the Internet have all caused untold numbers of jobs to disappear – including fulfilling ones. But the lesson of this history is that innovations such as these also create whole new classes of jobs that can be just as fulfilling – jobs that no one could possibly have foreseen before the technologies. Who, for example, foresaw in 1850 that millions of people all over the world would earn a good living driving large trucks across continents, or in 1950 that millions would do the same making videos from the comfort of their homes? At best, automation can be planned for, not stopped. The reason that the alarm is more organized this time is that AI-based automation threatens to hit higher-paying, white-collar jobs such as those of lawyers, physicians, and professors. A more interesting possibility – likely to take some years but probably in the near term – is the automation of “thinking” jobs such as those of engineers and scientists. An AI system trained sufficiently on the scientific literature may well be able to propound new theories, design complicated structures, and make technological breakthroughs. That is already the case in many narrow domains, such as inferring the structure of proteins, though not in a sufficiently end-to-end way to replace scientists and engineers. If that day arrives – and especially if AI machines become capable not only of designing but of building new, more advanced AI machines – well, then we will be in an entirely different reality. 
That might be something worth preventing to avoid humanity being taken over by machines, but stopping LLMs now is too little, too early for that.

“The Age of Infinite Misinformation”:

The most important issue that the FLF letter raises – and that is also central to the op-ed by Harari et al. – truly merits immediate alarm. The letter asks: “Should we let machines flood our information channels with propaganda and untruth?” More eloquently, Harari et al. ask:

Language is the operating system of human culture. From language emerges myth and law, gods and money, art and science, friendships and nations and computer code. A.I.’s new mastery of language means it can now hack and manipulate the operating system of civilization. …. What would it mean for humans to live in a world where a large percentage of stories, melodies, images, laws, policies and tools are shaped by nonhuman intelligence, which knows how to exploit with superhuman efficiency the weaknesses, biases and addictions of the human mind — while knowing how to form intimate relationships with human beings?

This fear might not move us if we had not already seen what social media has done to the world – how it has divided people, eroded social trust, promoted dangerous lies, and given the most toxic forces access to every mind on the planet. To be sure, social media itself did not do this; it just provided humanity the opportunity to express the worst of itself on a global platform. Social media became a catalyst for what had been slow-moving, local, disconnected eruptions of toxicity, turning them into global waves that have swept through entire societies. The result has been the rise of autocrats, spread of dangerous anti-science attitudes such as vaccine hesitancy and climate change denial, and a worldwide flood of racism, misogyny, and bigotry that shows no signs of abating. To this witches’ brew will now be added an infinite capacity to generate convincing misinformation, high-quality propaganda, fake images and videos, etc., that will, in a short time, so pollute the knowledge base of humanity that all the things we consider reliable sources of fact will lose their credibility. [See article by Gary Marcus linked in the section heading.] And, at the same time, the ever-increasing intrusion of AI into our lives – in what we see and read, how we travel, even our social relationships – will make us more and more dependent on the very AI that is erasing the boundary between fact and fiction. Add a touch of virtual reality or augmented reality, and it’s easy to see how a dystopia might emerge long before humans can build super-powerful AI capable of seizing control of the world.

Some have dismissed these fears because today’s AI systems are not really intelligent, but that has nothing to do with the danger. We humans will supply the intelligence; AI will only play with it. A good, if still imperfect, analogy here is with viruses. They are not alive, but, in their non-living form, they have the ability to hack into the machinery of life, and that makes them deadly. In the same way, the “stupid” AI of today – LLMs, image generators, code generators, etc. – may not be truly intelligent, but they can hack into our minds and turn our real intelligence against us. The societal impacts of this will be immense and unpredictable. Whether or not it leads to “loss of control of our civilization,” it will surely change it profoundly.

Imbalance of Power:

A third major short-term issue that is not mentioned in the FLF letter or the NYT op-ed is the risk to the societal and global balance of power. As the rapid – and surely very premature – adoption of ChatGPT into all sorts of applications shows, it is an immensely powerful tool. Systems that follow it will be even more powerful. Very soon, some of these systems will become indispensable for ordinary living, much as the Internet, search engines, GPS, cell phones, etc., are today. Indeed, AI already pervades these applications, and will do so even more. That will give immense, disproportionate power to a few large corporations that control the technology. It can be argued, with some justification, that this is not a new problem. Large corporations from AT&T and IBM to Microsoft, Amazon and Google have, at times, established virtual monopolies on critical technologies. However, the power of AI is likely to be so disproportionate and so pervasive as to make any corporate dominance qualitatively different from anything seen until now. And the same extreme asymmetry will emerge at the geopolitical level, with a few countries that dominate in developing cutting-edge AI technologies acquiring even greater control over the fate of the rest of the world. One can argue that nuclear weapons already do this, but nuclear weapons do not run the lives of billions of people. AI will, and the influence of those who control it will be felt by every individual on the planet. With this asymmetry of power – whether corporate or geostrategic – will come even greater economic inequality. Things could get to the point where great technological powers will – and to some degree already do – compete to create their technological spheres of influence, much as occurred during the Cold War. One hopes that this time, the world at large will step in sooner to avoid that.

LLMs and Existential Hazard

Now let us turn to the issue of whether today’s AI – and particularly LLMs – are pertinent to the existential threat posed by AI. While the distance from ChatGPT to something like Skynet is unimaginably vast, it would be incorrect to say that there is no relationship. As discussed earlier in this article, LLMs do represent an extremely simplistic and rudimentary type of brain-like architecture and functionality. As long as they are trained by humans on human-supplied data under human supervision, they will remain the sophisticated but limited tools we see today. The danger starts to rise if they acquire more interesting features such as those discussed below.


Embodiment:

Today’s most celebrated AI systems such as ChatGPT and Stable Diffusion are relatively “safe” because they are not connected directly to the physical world. They only sense the very limited types of data we give them – text, images, video, etc. – and produce similarly limited things that humans can choose to accept or reject. Thus, all they can mess with is our minds. However, as highly sophisticated AI begins to be embedded into embodied systems – robots and intelligent control systems – that sense the world directly and act on it physically, all the quirks that we see in today’s AI systems – the hallucinations, the illogical assertions, the occasionally combative arguments – will begin to have far more serious real-world consequences. In practical terms, this is the thing most likely to draw real regulation by governments. We see this already with self-driving cars, which are, in fact, embodied systems with embedded AI. Not only are they strictly regulated, they are very unlikely to be adopted widely in the foreseeable future for the simple reason that, for all their intelligence, they cannot coexist harmoniously with human drivers except in very constrained situations.

Autonomy of Purpose:

Currently, AI systems are better seen as smart tools for humans than as active intelligent systems. They await human instructions and follow them. Their intelligence lies entirely in the complexity of this behavior. But it is quite possible that technology will move in the direction of giving AI systems more autonomy. One can define several levels of autonomy, but human control will necessarily diminish as AI systems move from the current autonomy of response towards autonomy of behavior, planning and purpose. Eventually, humans may have no control at all over them or – more importantly – over the values that drive their actions.


Almost all AI systems today learn when and what their human designers or users want. This is in contrast to animals, who do a lot of their learning in response to their internal drives such as hunger, thirst, mating, pleasure, dominance, etc. It is largely the fulfillment – or otherwise – of these drives that teaches the animal how to act in the real world. The closest today’s AI comes to this is reinforcement learning, where AI systems generate responses autonomously and are given positive or negative reinforcement. The process is extremely slow and cumbersome, but methods are improving. An embodied, autonomous system with efficient reinforcement learning could quickly learn things far beyond what its makers might have intended.
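The trial-and-reinforcement loop mentioned above can be shown in its simplest form: an epsilon-greedy agent on a multi-armed bandit. This is a textbook toy, not a claim about how any particular AI system is trained; the reward probabilities and the `epsilon` exploration rate are arbitrary assumptions. The agent chooses actions itself, receives +1 (positive reinforcement) or -1 (negative reinforcement), and gradually shifts toward the action that pays off most.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy reinforcement learning on a multi-armed bandit.

    Each action yields +1 with probability reward_probs[a], otherwise -1.
    The agent keeps a running average of the reward each action has earned
    and mostly picks the action with the best average so far.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n  # learned value of each action
    counts = [0] * n       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # incremental update of the running average for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
print(counts)  # most of the 2000 choices should go to the last action
```

No one tells the agent which action is best; the preference emerges entirely from the rewards it experiences.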

Self-Enhancement and Engineering:

The architectures and training protocols of current AI systems are entirely under human control, but it is quite conceivable that more self-motivated and autonomous systems in the future would have greater control over their own capabilities – seeking out new data to learn from, modifying their own structure, and even adding hardware enhancements. While the last possibility is likely far in the future, self-enhancement at the software level is probably just around the corner. ChatGPT and systems like it can already write fairly sophisticated software. These capabilities will improve rapidly to the point that these systems will be able to write code for AI systems as capable as themselves or even more so. As an experiment, I asked ChatGPT to write Matlab code for a transformer block, which is the main neural network module used in the construction of ChatGPT itself. It produced the correct code without difficulty. In its current form, it does this only when asked, but a more autonomous version of ChatGPT equipped with self-motivation and capable of efficient reinforcement learning could quickly start enhancing itself or building other AI systems (and much else) if given access to data and computing resources. At that point, it would have become a self-evolving intelligence with open-ended possibilities. And, unlike humans, such an AI would be able to evolve much faster, through its own intelligent design rather than through the slow, messy process of biological evolution.
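To give a sense of how compact such a module is, here is a simplified transformer block in Python. It is not the MATLAB code ChatGPT produced in the experiment described above, and it is not claimed to match ChatGPT’s actual blocks, whose details are not public. It assumes a common simplified layout: a single attention head with a causal mask (each position can only look backward), a ReLU feed-forward layer, layer normalization without learned parameters, and residual connections around both sub-layers. The weights are random and untrained, and the sizes `d_model` and `d_hidden` are arbitrary choices for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def layer_norm(x, eps=1e-5):
    """Rescale a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class TransformerBlock:
    """Simplified single-head, pre-norm transformer block with random weights."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        self.Wq = random_matrix(d_model, d_model, rng)  # query projection
        self.Wk = random_matrix(d_model, d_model, rng)  # key projection
        self.Wv = random_matrix(d_model, d_model, rng)  # value projection
        self.Wo = random_matrix(d_model, d_model, rng)  # output projection
        self.W1 = random_matrix(d_hidden, d_model, rng)  # feed-forward, expand
        self.W2 = random_matrix(d_model, d_hidden, rng)  # feed-forward, contract

    def attention(self, xs):
        qs = [matvec(self.Wq, x) for x in xs]
        ks = [matvec(self.Wk, x) for x in xs]
        vs = [matvec(self.Wv, x) for x in xs]
        d = len(qs[0])
        out = []
        for t, q in enumerate(qs):
            # Causal mask: position t attends only to positions 0..t.
            scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d)
                      for s in range(t + 1)]
            w = softmax(scores)
            mixed = [sum(w[s] * vs[s][i] for s in range(t + 1)) for i in range(d)]
            out.append(matvec(self.Wo, mixed))
        return out

    def feed_forward(self, x):
        hidden = [max(0.0, h) for h in matvec(self.W1, x)]  # ReLU nonlinearity
        return matvec(self.W2, hidden)

    def __call__(self, xs):
        # Attention sub-layer, wrapped in a residual (skip) connection.
        attn = self.attention([layer_norm(x) for x in xs])
        xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attn)]
        # Position-wise feed-forward sub-layer, also with a residual connection.
        return [[a + b for a, b in zip(x, self.feed_forward(layer_norm(x)))]
                for x in xs]


block = TransformerBlock(d_model=4, d_hidden=8)
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]]
out = block(tokens)
print(len(out), len(out[0]))  # prints: 3 4
```

Stacking many such blocks, with trained weights and multiple attention heads, gives the kind of deep network described earlier. Because of the causal mask, changing the last token leaves the outputs at earlier positions unchanged.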

It is worth emphasizing that none of these capabilities in themselves imply the emergence of superior artificial intelligence. We already have embodied robots that deliver food and vacuum floors. Self-driving cars already have significant autonomy – though not of purpose. Programs such as AlphaGo Zero already do reinforcement learning in a self-motivated way to enhance their skills autonomously. It’s only when all these things are combined in systems of sufficient complexity that something like true intelligence can emerge, at least in principle. But what is that complexity? Contrary to what many are asserting, LLMs give us an important clue. They tell us that even their very simple architecture is capable of inferring relationships in data that we might have considered too complex to be inferred in this way. In this, LLMs are showing us the power of emergence – how unexpected high-level phenomena can arise in complex adaptive systems. ChatGPT was not trained explicitly on meaning, grammar or syntax, and yet it inferred them implicitly, giving it the emergent ability to generate meaningful, grammatically correct, and syntactically consistent text. We cannot possibly predict what greater capabilities might emerge in neural networks much larger and more complex than today’s LLMs. In a recent paper, I have argued that such networks will need to hew closer to the principles, architectures, and mechanisms embodied in the brain, but ChatGPT shows us that even a very minimally brain-like system can achieve astounding capabilities. This should not be taken lightly. The existential hazard of AI may not be close, but it is very real in the long term.

Solutions – Real and Imaginary

Will stopping research on LLMs for 6 months address any of the concerns described above? Of course not. First, there is no world body that can enforce such a moratorium. And what could possibly be done in 6 months to address problems so diverse, so immense, and so interwoven with our lives? One might give the example of chemical, biological, and nuclear weapons, which are highly regulated internationally. But it took years and decades to negotiate those regulations, and they are still violated every so often even though the technologies underlying these weapons are relatively amenable to monitoring. Building large AI models has a much lower barrier to entry. All that is needed is powerful computers and smart engineers. The same computers and engineers are necessary for doing the beneficial things that run the world today, and thus cannot be proscribed like nuclear or chemical weapons. Add to that the fact that, for all its dangers, the new AI also promises immense benefits – in medicine, in education, in entertainment, in improving the quality of life for individuals everywhere. But, while we speculate about the long-term dangers, it is critical that we think immediately about how to shape today’s unintelligent AI in a way that mitigates its harms while enhancing its benefits.

The problem has to be addressed at three levels, at least: regulation, engineering, and education. Regulation would control permissions, standards, rights, responsibilities, and accountability; engineering must focus on building systems that inherently pose smaller risks due to their design; and education must make human society more resilient to the potential harms of AI. None of these is likely to work perfectly, but their combination is the only way forward.

Regulation:

There are four obvious points of control for regulation of AI systems:

Data regulation would specify what data can be used for training the system, who owns it, who can use it, and what rules govern such use. This would be somewhat similar to current copyright and data privacy regulation.

Output regulation could potentially require that the output meet certain criteria of harm minimization, but this is likely to be futile because of the adaptive and emergent nature of AI systems. Such regulation would also run into legal challenges on free speech grounds in open societies.

Licensing regulation could require all AI systems on the market to meet certain well-defined criteria before being made available for use, much as drugs and foods are regulated. Here, again, the challenge is that, unlike drugs or foods, these systems are adaptive and, in principle, could learn continuously, so the system today is not what it was yesterday. However, this could be addressed by licensing specific versions that can no longer change, though this may well limit the utility of some systems for end users. That, in turn, can be addressed by transferring rights at the point of purchase, so the company is only responsible for the system as purchased by a user and not for changes that the user might make by further training. The companies could, in principle, limit how much any user can change a particular version. Self-driving cars are likely to provide a good test case for all this.

Access regulation would specify who can have access to what kind of system, along the lines of regulating access to cigarettes, alcohol, adult material, etc. Given that AI will live in cyberspace, this is likely to be no more successful than any attempt to regulate porn, but it may work in some situations, e.g., in schools. Access could also be managed by creating things akin to parental filters that would detect and limit inappropriate exposure to AI. Of course, policymakers would need to decide what constitutes inappropriate exposure.

One kind of regulation that should not be considered is regulation of the system itself: Any attempt by governments to regulate things like the size, complexity, architectures, learning protocols, etc., would only kill the entire enterprise of AI with all its promise of great benefits for humanity.

Engineering Solutions:

Mitigating the potential hazards of AI through engineering has been the primary focus of practitioners concerned with the issue. Three main approaches have been proposed: alignment, explainability, and transparency. Unfortunately, all of them face serious logical contradictions due to the nature of AI technology, though this point seems to be lost on too many in the AI field. But let us first look at these three non-solutions and why they are inadequate.

AI alignment is the idea that the values of an AI system should be aligned with human values or, in a weaker version, that the behavior of an AI system must be aligned with its designer’s (or user’s) goals. The first definition raises two fundamental problems. The first is that it’s not clear how one can get a continuously adapting system with emergent behaviors to hew to a specific set of values. The designers of ChatGPT have, in fact, tried to do some amount of alignment by using reinforcement learning from human feedback (RLHF) to “socialize” the system after it has learned from the raw data. Even so, it has been necessary to use post facto filters to remove offensive content. When the version of GPT-4 used in Bing had these guardrails removed, it quickly descended into anti-social behavior. Another idea for aligning systems is to use the objective function – the function whose value the learning system is trying to maximize – to incorporate desired values so that they are learned as part of the overall learning. This is implicit in the second definition given above. However, even if we knew how to code complex human values into numerical functions, it is impossible to know a priori how to encode all the desired values, and only the desired values, into such a function. The second problem with values alignment is the question of “Whose values?” Human values vary across cultures and over time. Things that were acceptable in the US 50 years ago are no longer acceptable. Things that are acceptable today in France are not acceptable in Saudi Arabia or China. So, even if a magic algorithm were found to align AI with a set of values, which set of values would it align with? And if each culture or sub-culture trains its AI systems on its own values, would we not just be extending human feuds to machines? On the other hand, if AI aligns with Western values because it is being developed in the West, will its imposition on the rest of the world not be seen as a kind of neo-colonialism?
Finally, alignment must face the problem of emergence: the gap between what we want an AI system to learn and what it ends up learning will always be unpredictable, and this will be as true of its values as of the behaviors that arise from them.

Explainable AI (XAI) – and its cousin, interpretable AI – is an attempt to take on the complexity of modern neural network-based AI systems. Such systems are termed opaque or black-box systems because their inner workings are very hard to interpret. XAI’s goal is to alleviate this problem and make it possible to explain why the system produces a particular output in response to a given input. To this end, some methods have been developed, and they work well in certain types of systems. However, application-scale, highly nonlinear systems like ChatGPT and its peers are just too complex to be dealt with this way. To be sure, because these are deterministic systems, it is possible in principle to explain why each neuron in them produced the signal that it did; what is hard is making sense of millions of such micro-explanations. What’s more, the exercise is futile, because the system’s answer can no more be understood by looking at the responses of individual neurons than the image of a face can be understood by looking at individual pixels. Emergence, again, is the problem. And it will only get worse as the systems become more complex. Unfortunately, lack of explainability is a major impediment to the use of AI in applications such as medicine and air traffic control. One option for achieving some explainability in AI systems at scale is to have the system learn introspection – the ability to explain itself – as it learns how to do its main task. After all, that is how humans learn to explain their actions. However, there’s no obvious way to do this in current systems that learn through supervised learning. Another caveat is that humans are often wrong about the reasons for their own actions, and the same will be true of AI systems. And there will be a large computational overhead to boot. Another possibility is to build systems that are inherently more explainable, but this is likely to impose too many constraints on the capability and scalability of systems.

Transparent AI is often conflated with XAI, but it’s useful to see it as a separate approach. Here, the goal is to make all the details about the workings of an AI system – its architecture, its hyperparameters, training and testing protocols, training and testing data, its objective function, etc. – publicly known so that an informed user knows what they are dealing with and can thereby explain the system’s behavior and biases. While such transparency may be beneficial – though not always in the corporate interest – it is likely to provide only incremental benefit in terms of predicting risks. To be sure, analyzing the data for biases or understanding decisions through knowledge of the objective function is possible and useful, but that is far from complete transparency.

To understand why so many in AI are still focused on these impractical solutions, consider the following vignette. On April 2, 2023, AI pioneer Yann LeCun tweeted:

Some folks say “I’m scared of AGI”. Are they scared of flying? No! Not because airplanes can’t crash. But because engineers have made airliners very safe. Why would AI be any different? Why should AI engineers be more scared of AI than aircraft engineers were scared of flying?

In fact, there is a very simple answer to that: AI systems are adaptive systems with emergent behaviors and are thus inherently unpredictable; this is a feature, not a bug. Planes, in contrast, are carefully designed not to change over time and to operate exactly as designed without showing any emergent behaviors. This fundamental difference between classic engineered systems and complex adaptive systems is a critical fact that most practitioners in engineering and computer science simply do not grasp. The classic engineering method focuses on building highly optimized systems with known, predictable, controllable, and reliable behavior. They are designed carefully to precise specifications, tested in prototype, and then duplicated in a factory to guarantee the same performance as the optimized prototype: no adaptation, no self-organization, no surprising emergent behaviors. In contrast, complex adaptive systems are meant to change their behavior continuously and to respond to novel situations in unexpectedly creative ways. They are not designed to be optimal, transparent, predictable, reliable, or controllable, and hence not “safe”. Rather, they are meant to be adaptable, versatile, flexible, resilient, creative, and evolvable – always learning surprising new behaviors to become better adapted to their complex, ever-changing environments. In many applications, one does not need such systems. There, a pre-trained, reliability-tested, unintelligent machine learning system would be sufficient – but that is not AI.

So what then can be done to engineer AI systems that are less risky? Several things, in fact, including the following, though none of them is truly sufficient.

Data Filtering: Develop sophisticated methods to filter the training data to minimize the risk of the system learning dangerous behaviors. Unfortunately, this can only be done in limited cases, but it can work somewhat the way we take care that children are not exposed to certain material while they are developing their values. In embodied systems with real-time learning, this filtering may come down to choosing the right sensors and building specific filters into the system a priori. It won’t solve the problem, but it can mitigate it.

Isolation: Always retain the ability to isolate the system from data and computational resources so that it only learns and grows under human supervision, even though doing this would preclude completely autonomous AI; perhaps that’s what is needed. In some embodied systems, it may be possible to have circuit breakers that can be triggered by human voice commands. One of the goals of this isolation would be to deny an AI system the ability to enhance itself or to build other systems, or to access any systems critical to human life. Of course, this will also reduce the power of AI systems for human applications, but it would be a price worth paying.

Energy Limitation: Make sure that the energy source for the AI system remains under human control, i.e., no self-recharging systems. By doing this, one can ensure that the system remains dependent on humans and has only a finite horizon for rogue behavior.

Developmental Socialization: This process would have the system learn in stages from increasingly complex data, much as human or animal children do, and learn in continuous interaction with humans (or possibly with other “senior” AI systems). Thus, good behavioral values could be learned in a way that is always integrated into the rest of learning, instead of being added on after the system has finished its task learning, as was done in ChatGPT. An important point here is that this would require reinforcement learning rather than supervised learning, thus removing the need for crafting the right objective function.

Continuous Monitoring and Intervention: Monitor AI systems carefully for signs of risky behaviors and intervene immediately when they are observed – before they become entrenched. Again, we are limited by an inability to observe what the system is learning. There could be a great time lag between when a pathological behavior is learned and when it becomes visible.

Other similar limitations can be considered, all aimed at preventing AI systems from reaching the threshold of fully autonomous intelligence. In the end, however, all these are only mitigations, not solutions. They will work until they no longer do. No one can fully control or predict what a complex adaptive system will learn. If we wish to build sophisticated AI, we have to reconcile ourselves to this fact, just as we have accepted the presence of humans who do not share our values and who show pathological behaviors. Building a general AI is creating a new species – actually, a new form of life based in metal and silicon rather than carbon. Our ability to know where this will lead is necessarily limited because such a thing has never existed on Earth.

Education:

The final – perhaps the most critical – way to address the problems raised by AI will be to educate people to live in a world with powerful AI. Even without explicit education, humans may develop a modus vivendi as these systems get more powerful, but the experience with social media is sobering. The human mind, human society, and human institutions are all eminently prone to being hacked by a technology that can manipulate information even before it becomes embodied and can cause physical harm. Living with AI – and soon with virtual and augmented reality – will require a revolution in the way people relate to fact, information, technology, and each other. Humanity has not even begun to develop the necessary methods and strategies for this. It will be an exciting journey through interesting times.