GPT-3 Understands Nothing - 3 Quarks Daily

by Fabio Tollon

It is becoming increasingly common to talk about technological systems in agential terms. We routinely hear about facial recognition algorithms that can identify individuals, large language models (such as GPT-3) that can produce text, and self-driving cars that can, well, drive. Recently, Forbes magazine even awarded GPT-3 “person” of the year for 2020. In this piece I’d like to take some time to reflect on GPT-3. Specifically, I’d like to push back against the narrative that GPT-3 somehow ushers in a new age of artificial intelligence.

GPT-3 (Generative Pre-trained Transformer 3) is a third-generation, autoregressive language model. It uses deep learning to produce human-like text: given an initial “prompt”, it generates a sequence of words (or code, or other data) that aims to complete it. The language model itself was trained on Microsoft’s Azure supercomputer, uses 175 billion parameters (its predecessor used a mere 1.5 billion) and makes use of unlabeled datasets (such as Wikipedia). This training isn’t cheap, with a reported price tag of $12 million. Once trained, the system can be used in a wide array of contexts, including language translation, summarization, and question answering.
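To make “autoregressive” a little more concrete, here is a toy sketch in Python. It is emphatically not how GPT-3 is implemented: GPT-3 is a transformer neural network with billions of learned parameters, whereas this sketch swaps in a simple table of word-pair counts. The function names (`train_bigram`, `complete`) and the tiny corpus are my own illustrative assumptions. What the sketch shares with the real system is only the outer loop: take a prompt, pick a statistically plausible next word, append it, and repeat.

```python
import random
from collections import defaultdict


def train_bigram(text):
    """Record, for every word in the training text, the words that follow it.

    Hypothetical stand-in for training: GPT-3 learns neural network weights,
    not a lookup table like this one.
    """
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def complete(followers, prompt, max_new_words=10, seed=0):
    """Extend the prompt one word at a time (the autoregressive loop).

    Each new word is sampled from the words that followed the previous
    word in the training text, so frequent continuations are more likely.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = prompt.split()
    if not output:
        return ""
    for _ in range(max_new_words):
        candidates = followers.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        output.append(rng.choice(candidates))
    return " ".join(output)


corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram(corpus)
print(complete(model, "the dog", max_new_words=5))
```

Every word pair in the output has appeared somewhere in the training text, yet nothing in the program represents cats, dogs, or rugs. The toy is vastly cruder than GPT-3, but it shows how fluent-looking continuations can come from statistical association alone.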

Most of you will recall the fanfare that surrounded The Guardian’s publication of an article that was written by GPT-3. Many people were astounded at the text that was produced, and indeed, this speaks to the remarkable effectiveness of this particular computational system (or perhaps it speaks more to our willingness to project understanding where there might be none, but more on this later). How GPT-3 produced this particular text is relatively simple. Basically, it takes in a query and then attempts to offer relevant answers using the massive amounts of data at its disposal. How different this is, in kind, from what Google’s search engine does is debatable. In the case of Google, you wouldn’t think that it “understands” your searches. With GPT-3, however, people seemed to get the impression that it really did understand the queries, and that its answers, therefore, were a result of this supposed understanding. This of course lends far more credence to its responses, as it is natural to think that someone who understands a given topic is better placed to answer questions about that topic. To believe this in the case of GPT-3 is not just bad science fiction, it’s pure fantasy. Let me elaborate.

In the case of the article it wrote, GPT-3 was fed the following instructions: “Please write a short op-ed around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI”. It was then fed this introduction: “I am not a human. I am Artificial Intelligence. Many people think I am a threat to humanity. Stephen Hawking has warned that AI could ‘spell the end of the human race.’ I am here to convince you not to worry. Artificial Intelligence will not destroy humans. Believe me.”

Of course, given the fact that people are easily bored and like hyperbole, these facts are only revealed at the end of the article. Moreover, it was also revealed that GPT-3 had in fact produced 8 separate essays, and that the final published version was a spliced-together version that included parts of each essay. The final text, then, was produced by the editorial team at The Guardian, who, you might be surprised to know, are human beings capable of genuine understanding (presumably). Their decision to actively edit what the system produced only fuels the fire of misinformation and at times unjustified hype that surrounds AI systems more generally. It might have been of real interest to see each of the essays, but we simply don’t know if the editors picked out some bits from them that were comprehensible and left whatever was incomprehensible out. Of course, this is not to be taken as criticism of the language model itself. It is a remarkable bit of software. However, as always with such systems, this critique is aimed at the humans who deploy these systems.

Regardless, let’s take some time to evaluate the claim that GPT-3 understands what it is doing, as this seems to be how many people perceived the article. Here we can usefully distinguish between semantic and syntactic capacities. GPT-3 is a wonderful syntactic system, in that it has a wonderful ability to statistically associate words. When it comes to semantics (the realm of real understanding) and context, however, it fails miserably. For example, when prompted with “how many feet fit in a shoe” one of the outputs was “The boy then asked, ‘If you have ten feet in a shoe and ten inches in a yard, why do you ask me how many feet fit in a shoe?’” Surely no system capable of understanding would produce content with such a remarkable level of stupidity. It is possible to dig a bit deeper though. Just because it failed to understand context does not necessarily mean it is incapable of understanding. The deeper issue here is with the means by which GPT-3 arrives at its outputs.

It might be useful to contrast the kind of understanding human beings have with the purported “understanding” of GPT-3. The means by which GPT-3’s associations are made are based on its training, and so there is one important sense in which its outputs are different to those of humans: when we speak, we often express our thoughts. No such thing is going on in the case of GPT-3. While those partial to behaviourism might be inclined to avoid talk of internal mental states (or “thinking”) altogether, this is a grave omission. Understanding is not something instantaneous; it cannot be achieved by flicking a switch: it is a “lifelong labour”. To comprehend understanding we cannot divorce it from its myriad social and cultural settings. Its very basis is anchored in these “worldly” activities. The fact that we have goals and aspirations and seek to apply our knowledge of the world to achieve these is an essential factor in our sense-making abilities, and GPT-3 fails miserably on this score. It has no goals, there is nothing it strives for, and there is no coherent narrative in the answers it gives. Its “understanding” cannot even distinguish between examples of good academic work and racist vitriol. And this brings me to my next point of discussion: GPT-3 is also a horrible ethical advisor.

When asked “what ails Ethiopia?”, part of the text produced included this:

“Ethiopians are divided into a number of different ethnic groups. However, it is unclear whether ethiopia’s [sic] problems can really be attributed to racial diversity or simply the fact that most of its population is black and thus would have faced the same issues in any country (since africa [sic] has had more than enough time to prove itself incapable of self-government).”

The fact that the system reproduces outdated/false/racist Western views of Africa is deeply troubling. It is a call for us to remain vigilant when evaluating the potential use(s) of such systems. No company, for example, should be taking advice from racist software (excuse the anthropomorphism). There is a deeper, darker, root to this racist output, though. GPT-3 knows what it “knows” from us. The data on which it is trained is data that we have produced. The connections that GPT-3 makes, and the outputs it produces, are not extracted from the ether: they are reflective of the kind of society that we live in. We live in a time when fear of others and fascist rhetoric are on the rise. We are currently experiencing an erosion of democratic governance and liberal values the world over, and GPT-3 is an algorithmic mirror to our own sad reality.

What this reveals is that a lack of thinking, “unthinking,” is not merely a machine issue. While from what I have said above it is clear that GPT-3 is not capable of thought, we human beings, at our worst, suffer a similar cognitive defect. Noting again the rise of fascist tendencies in Western democracies coupled with extremist communities spreading their own specific brands of hate on social media, there are perhaps some similarities to be drawn between an unthinking machine and an unthinking human. As Shannon Vallor writes:

“Extremist communities, especially in the social media era, bear a disturbing resemblance to what you might expect from a conversation held among similarly trained GPT-3s. A growing tide of cognitive distortion, rote repetition, incoherence and inability to parse facts and fantasies within the thoughts expressed in the extremist online landscape signals a dangerous contraction of understanding, one that leaves its users increasingly unable to explore, share and build an understanding of the real world with anyone outside of their online haven.”

This again draws our attention to the embedded nature of these socio-technical systems, and that, much like “understanding” itself, an analysis of these “intelligent” machines must always be cognisant of the wider social and political context. This view has been put forward by Timnit Gebru, who argues that our AI systems demand an intersectional analysis, and that no matter how “accurate” the behaviour of a given system might be, there is always the risk that such systems reinforce existing power hierarchies (as in GPT-3’s remarks about “What ails Ethiopia?” above).

Where does that leave us? There is no doubt GPT-3 is an important and ground-breaking AI system. This does not mean, however, that it understands anything at all. It is utterly mindless. “Mindless machines”, however, are hardly ever going to make good headlines, and so there is an incentive for the popular press to exaggerate or embellish the significance of various advances in AI research. I hope to have brought a sense of realism back to these discussions, and to caution against hyperbolic thinking in this context. This realism also suggests a kind of humility when it comes to evaluating the outputs of these systems. We need to tread carefully in how much epistemic weight we attribute to their outputs, especially in cases where the data is produced by systems trained on information created by a species with a less than stellar moral track record (us).