by Derek Neal
I started reading Leif Weatherby’s new book, Language Machines, because I was familiar with his writing in magazines such as The Point and The Baffler. For The Point, he’d written a fascinating account of Aaron Rodgers’ two seasons with the New York Jets, a story that didn’t just deal with sports, but intersected with American mythology, masculinity, and contemporary politics. It’s one of the most remarkable pieces of sports writing in recent memory. For The Baffler, Weatherby had written about the influence of data and analytics on professional football, showing them to be both deceptive and illuminating, while also drawing a revealing parallel with Silicon Valley. Weatherby is not a sportswriter, however, but a Professor of German and the Director of Digital Humanities at NYU. And Language Machines is not about football, but about artificial intelligence and large language models; its subtitle is Cultural AI and the End of Remainder Humanism.
Weatherby’s idiosyncratic popular writing gives us an idea of what to expect in Language Machines—not a dry scholarly book on AI (despite being published by a university press), nor a popular book meant to capitalize on AI hype or fear, but a unique analysis of what large language models (LLMs) like ChatGPT do that references linguistics, computing, and most interestingly to me, literary theory.
Weatherby has a few fundamental points that support his understanding of LLMs. The first is that LLMs produce language. This might seem obvious, but as Weatherby makes clear, some linguists insist that LLMs are not producing language. This is the view of Noam Chomsky, whom Weatherby uses to illustrate what he calls the “syntax” view of language, which proposes that language is located in the brain, meaning there is some underlying biological, cognitive structure that allows us to use language—a “universal grammar.” From this view, LLMs don’t produce language but rather a pale imitation of it, as they lack the underlying cognitive function that would give rise to language. This scientific view of language, which seems to me to be a sort of biological determinism, leaves us ill equipped to understand the output of LLMs, as all it can do is say, “that’s not language,” dismissing the matter as someone else’s concern.
In addition to the syntax view, there is also the “statistics” view, prevalent among some computational linguists, which says that we “know a word by the company it keeps.” Proponents of this view do believe that LLMs produce language, and they take LLMs as vindication of their account of language, since these models work by using statistical relationships among words to produce the next word in a sequence, generating sentences that are often indistinguishable from human language. Neither the syntax view nor the statistics view is satisfactory for Weatherby’s goals, however; if the syntax view ignores the output of LLMs in search of a natural source for language, then the statistics view focuses too much on the output of these machines, ignoring the search for an explanatory theory of language. In other words, it is content to say, “the purpose of a system is what it does,” but it doesn’t ask how the system works. Both views leave us unable to fully understand the production of LLMs, yet they are the two dominant views when it comes to understanding language—the syntax view representing cognitive science and the statistics view representing Natural Language Processing (NLP) and data science. What’s missing, according to Weatherby, is an approach to language that is humanistic rather than scientific, but the humanities “lost language” as a disciplinary object sometime in the 1990s and have yet to reclaim it.
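To make the statistics view a little more concrete, here is a deliberately crude sketch of next-word prediction: a toy bigram model that learns which words follow which in a tiny corpus and then generates text by sampling from those counts. Nothing about it resembles the neural networks behind ChatGPT except the basic logic of predicting the next word from the company words keep; the corpus and the code are my own illustration, not anything taken from Weatherby’s book.

```python
# Toy illustration of the "statistics" view of language: learn which words follow
# which in a tiny corpus, then generate text by predicting the next word.
# This is a sketch of the idea only, not how any modern LLM is actually built.
from collections import Counter, defaultdict
import random

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# Count which words follow which (a bigram table).
following = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    following[w1][w2] += 1

def sample_next(word):
    """Sample the next word in proportion to how often it follows `word` in the corpus."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short sequence starting from "the".
word = "the"
output = [word]
for _ in range(8):
    word = sample_next(word)
    output.append(word)
print(" ".join(output))
```

The output is grammatical-looking nonsense about cats and dogs, which is roughly what the statistics view would predict from so small a corpus: the model knows only which words keep company with which, and nothing else.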
This shift is articulated well, albeit in a different, fictional way, in J. M. Coetzee’s Disgrace (1999), when the main character’s career as a professor in the humanities is summarized:
He earns his living at the Cape Technical University, formerly Cape Town University College. Once a professor of modern languages, he has been, since Classics and Modern Languages were closed down as part of the great rationalization, adjunct professor of communications. Like all rationalized personnel, he is allowed to offer one special-field course a year, irrespective of enrolment, because that is good for morale. This year he is offering a course in the Romantic poets. For the rest he teaches Communications 101, ‘Communication Skills’, and Communications 201, ‘Advanced Communication Skills’. Although he devotes hours of each day to his new discipline, he finds its first premise, as enunciated in the Communications 101 handbook, preposterous: ‘Human society has created language in order that we may communicate our thoughts, feelings and intentions to each other.’ His own opinion, which he does not air, is that the origins of speech lie in song, and the origins of song in the need to fill out with sound the overlarge and rather empty human soul.
The “great rationalization” here is analogous to the shift Weatherby outlines from the humanities to the sciences, and the idea that language “communicate[s] our thoughts, feelings and intentions” is based on the idea that language is referential—first we have something inside or outside of ourselves, which exists before language, and then we convey that information in language. This explanation assumes that reference is the primary function of language—an idea Weatherby calls the “ladder of reference”—with other functions, such as the poetic function, built on top. Weatherby disagrees with this idea, claiming that LLMs show language to be poetic first and referential second; in other words, since LLMs cannot refer to the external, physical world—only its representation in language—an LLM can never escape language, meaning it can never be primarily referential. Yet LLMs do generate meaning, as anyone who’s interacted with ChatGPT knows. This makes it poetic, not in the sense that it writes poems, but in the sense of creating meaning through language. Coetzee’s character specifies that the human soul is “empty,” and that song, which gives rise to speech, fills it up. This is something the great religious and spiritual traditions have always understood—that language doesn’t just represent our world but creates it as well. In the Book of Genesis, God creates the world through language—first he speaks, then the world comes into being; in the Tao Te Ching, “naming is the origin of all particular things;” in epic poetry, the bard invokes the muse, telling her to inspire him with song, which allows him to relate a story of which he is not the author. In this understanding, language is not simply something humans use and control, but something that uses us as well.
These explanations of language may be unsatisfactory to the linguist, or the scientist, but to understand LLMs, we must take these accounts seriously—not because AI is a supernatural being, as some of its proponents claim—but because an enriched understanding of language, one alive to its poetic potential, can help us interpret the output of LLMs. Weatherby references Emily Bender, a linguist who has become known to the public due to her labelling of LLMs as “stochastic parrots,” and notes that she, like Chomsky, claims that LLMs don’t produce language, because humans “impute meaning where there is none” when reading LLM-generated text. But this is what humans have always done. Nothing has meaning—not a word or a piece of art—until we give it meaning through our attention. Communication and expression require a speaker and a listener, a writer and a reader.
If my explanations are starting to sound as if they belong to the realm of literary theory—with such ideas as “the death of the author,” or the idea that a reader constructs the meaning of a text as much as the writer does—they should, as Weatherby thinks we need to go in this direction to understand LLMs. More precisely, he thinks we need to return to the lessons of structuralism, the literary theory pioneered by Ferdinand de Saussure in the early 20th century. Structuralism treats language as a system of signs in which the relation of the part to the whole, or the word to the language, conditions meaning. What this means is that language is a closed system; if, for example, I use a word from another language while I’m speaking English, it won’t have any meaning, or at least it won’t have the same meaning that it does in its own language, even if it eventually becomes part of English as a loanword. The word itself has no positive meaning but becomes meaningful due to its difference from the other words in a language; outside of its structure, or system, it’s just a sound or a mark on a piece of paper.
What structuralism helps us to understand, if I’m following Weatherby correctly, is that language is always cultural. This is why the syntax view, which is focused solely on cognition, and the statistics view, which would reduce language to a number (Weatherby says this isn’t possible), are insufficient. “Data models always include culture,” writes Weatherby, so we need structuralism and a “semiotics of data” to understand and interpret the output of LLMs, which are not necessarily intelligent, he says, but are “language machines.”
The precise definition of intelligence is avoided by Weatherby, who instead notes that our understanding of intelligence must always be tied up with language, or signs. In Weatherby’s words, “intelligence itself has a formal-semiotic aspect,” and language is a “choke point through which other forms of meaning must pass,” or “language is not thought or meaning as such, but I cannot convey this fact outside of language.” We are trapped in language, just as LLMs are. Weatherby’s overall approach, and a point that he returns to throughout Language Machines, is that culture and cognition cannot be neatly separated, which I understand to mean, more or less, “nature” (cognition) and “nurture” (culture) cannot be easily distinguished from each other. If we agree with this, we are better prepared to understand LLMs as producers of culture rather than producers of “truth,” while also recognizing that our notions of “truth” necessarily contain elements of culture as well. This is also why Weatherby is opposed to what he calls “remainder humanism,” which is the idea that humans contain some special characteristics that AI will never be able to emulate. Even if this is true (and part of me wants it to be true), it’s a claim that is usually built on the separation of nature and nurture, on the idea that we can extract the human from their environment and locate an essence that exists a priori, untouched by culture. This prevents us from seeing how humans become human in culture, how things like writing and speech, by which we understand ourselves, are themselves cultural techniques.
At this point, Weatherby (and I) might be starting to sound like AI apologists, suggesting that LLMs are just another tool that humans can use to our advantage without posing any danger. This withholding of judgement, the fact that Weatherby doesn’t immediately position himself as either a “booster” or a “doomer,” is a strength of Language Machines, and it allows him to explain his view of LLMs without becoming mired in ideological debates. He does, however, eventually arrive at his concerns surrounding AI, and he is not optimistic.
In addition to his assertion that LLMs produce language, another of Weatherby’s key points is that they are adept at copying genre. Weatherby gives the example of GPT-2 (2019) responding to a prompt about researchers discovering a unicorn herd; the generated text contains internal contradictions, but it feels impressive because it’s written in the style, or genre, of a science report. Examples like this appear every few months—recently, James Marriott posted three paragraphs of a ChatGPT-generated review of Martin Amis’ The Rachel Papers, claiming that “you could submit writing of this standard to a magazine and get published.” And he’s right, you could, but not because the paragraphs contain any penetrating insight into Amis; instead, it’s because the 800-to-1,200-word book review is a set form with certain rhetorical moves that must be made, making it easy to copy. The most prevalent use of ChatGPT—the student essay—falls into this category as well. When the product first went public in the fall of 2022, my initial experience with AI-generated text was a student essay about an economic recession that had never occurred. Everything was in the right place for a typical five-paragraph essay—the thesis, the topic sentences, the evidence, the analysis, the conclusion that restated the introduction in inverted form—but none of it added up to anything. The form was there, but the content was nonsense.
This is not to criticize the science report, or the book review, or the student essay for being formulaic, but to say that genre, and more broadly, form, is inextricable from content. How we say something shapes what we say, and vice versa. This is another reason structuralism is a useful lens for analyzing AI-generated text, as it encourages us to pay attention to the form of the text—not necessarily what it says, but how it says it. Since content and form exist in this dialectical relationship, an awareness of form is essential when using language, making art, or attempting to articulate truth. The best art bears this out. Hamlet includes a play within the play itself; in Don Quixote, a character reinvents himself inside the novel as a fictional character; in Close-Up, the 1990 film from Abbas Kiarostami, a true story is recreated not by actors, but by the real people whom the events are about (they “play” themselves). All three artworks question the art form itself and in this way surface their construction, or their artifice, not as some sort of postmodern gimmick, but in a way that deepens their truth-generating capacity as they make the viewer or reader realize that they, too, exist within a form or a structure that is culture and language.
Another way to put this, as Weatherby does frequently in Language Machines, is to refer to the mediating structure as the “symbolic order,” referencing Jacques Lacan’s three registers of the Imaginary, the Symbolic, and the Real. Weatherby clarifies what he means by the “symbolic” in noting that it is “what cognitive scientists and anthropologists call ‘culture,’” that “the ‘symbolic order’ is often taken to be language in contemporary use” (Weatherby specifies that it should also include mathematics), and that “we are ‘inside’ the symbolic order…and so are constantly tempted to take it to be coeval with consciousness, or even with reality.” The key here is that the symbolic is not the real, but we might lose sight of this, and if we do, we become like those in Plato’s cave, mistaking shadows for reality.
For Lacan, there is no escaping the symbolic, because the symbolic is that by which we know ourselves and the world around us. This might seem shocking, or highly theoretical, but if we return to the example of art, I don’t think it is. Hamlet, for example, only speaks to us if we let it work on us, rather than dismissing it as “just a play.” We must suspend our disbelief and accept for the duration of the play that it’s real, but at the same time we must be prepared to be shocked out of this disbelief when Hamlet stages a play within the play. The same is true for all art, and maybe even all experience, which is why Weatherby mentions that Lacan “always insisted ‘learning’ is partly a matter of induction into the symbolic order.” This is an ironic truth, that we have to let ourselves be deceived in a naïve way before we can then become aware of the deception, when the symbolic is made visible.
The danger is if we don’t realize that we are always participating in the symbolic order, or in Plato’s version, if we don’t realize we’re inside the cave. Weatherby writes that “LLMs present us with something like the pure symbolic order,” and there are already many people who have mistaken its manipulation of language—of symbols—to be a reflection of the real. What it actually is, according to Weatherby, is an “ideology machine.” Weatherby writes that “the symbolic is the joystick of ideology,” meaning that the symbolic, or our representations of the world, controls what we are able to think and say. LLMs extend their own ideological version of the symbolic order, which could be thought of as “the quantitatively captured average over a given contextual domain,” as they proliferate and are integrated into all aspects of work and life. The end result could be something like “a cloak of words so thick that truth and untruth cannot be distinguished.”
But the symbolic order is also language and culture, and Lacan tells us that learning is a result of entering into it.[1] How does one reconcile this dual nature of the symbolic, the fact that it is both enlightenment and deception? How does one avoid becoming a prisoner of ideology? By being aware that one is in ideology. If we think of genre or form as the “symbolic” when it comes to an artwork—that is, the rules shaping its content—the examples I’ve mentioned foreground the symbolic and their artists deeply consider how this shapes what is expressed in their art. Hamlet tells us it’s a play, Don Quixote tells us it’s a novel, and Close-Up tells us it’s a movie. Another example might be the incredible scene in Mulholland Drive when a singer performs on stage at Club Silencio; we are told before the performance that everything we see will be an “illusion,” and we understand that the singer is lip syncing to a recording, yet as we watch the performance along with the two main characters, we forget this—it all seems so real—until the singer collapses and the singing continues.
The opposite of art that is aware of its form is kitsch, which Weatherby sees as an example of ideology and defines as “predigested form” (referencing Clement Greenberg). This is, perhaps, what people mean when they call AI text and images “slop.” Kitsch and slop are unaware of how form conditions meaning; they reproduce given forms unreflectively and generate “the quantitatively captured average over a given contextual domain.” This sort of “average” language or “average” image generation, while useful for entertainment or certain forms of communication, struggles to say anything that arrives close to artistic truth or the real because it is unaware of its own construction, or how it exists in the symbolic order. In fact, it is usually intent on hiding its form rather than foregrounding its artificiality. It might be possible for an artist with access to the underlying technology of the LLM to create something real, true, or meaningful by using the LLM as an artistic tool, in the way a director uses a camera or a writer uses writing to interrogate form, but considering we usually have access to LLMs via a chatbot interface that has already been “tuned” by capitalist tech companies interested in profit, this seems unlikely.
There is a benefit to LLMs “present[ing] us with something like the pure symbolic order,” however, and it’s the possibility, according to Weatherby, of studying the symbolic order empirically. In other words, LLMs “offer an unprecedented view into the linguistic makeup of ideology” because in their generated text, they allow us to see which words, concepts, and terms are statistically near other words, concepts, and terms in human language. Weatherby refers to these clusters of words around a topic as “semantic packages” and notes that they can produce surprising results. One example he provides is when he asks ChatGPT to summarize his colleague’s book on the Frankfurt School and mathematics. Weatherby notes that ChatGPT provides a good summary of the book but makes an error in insisting that the book’s thesis is that “mathematics is a social construct.” According to Weatherby, the book does not claim this, but he speculates that ChatGPT says this because the “Alt Right” blames the Frankfurt School and its associated thinkers for calling mathematics a “social construct” and bringing about “cultural Marxism.” Weatherby sees the combination of these ideas—mathematics, the Frankfurt School, social construction—as part of the “‘culture war’ semantic package.”
While we can’t say for certain if Weatherby’s explanation is correct, the general idea that “hallucinations” reveal something significant about which words and ideas have connections in our language is exciting, and it allows us to discover relationships among ideas to which we may have been blind. I think I may have recently found one myself. In an article about his own use of ChatGPT, Justin Smith-Ruiu reported asking the LLM to explain his Substack metrics, as he was confused by data showing that some users had opened a recent article of his more than 200 times. This seemed impossible. In response, ChatGPT explained that this didn’t necessarily mean someone had manually opened the article over 200 times, but it did mean they’d looked at it a lot, and the number could be understood as “attention exhaust,” or a “residual trace of someone caring about [his] work.”
When I read this, the term “attention exhaust” seemed to remind me of something. Hadn’t I read about this idea somewhere before? I went to Google and searched the term, but nothing came up. I couldn’t find a single trace of this term being used in this way, which I understood not as the fatigue of our ability to attend, but as the leftover “attention” produced after a person clicked on something on the internet, like the exhaust blown out of a car. ChatGPT had written that “email client preloading, forwarding, or security scans” could all contribute to the number of “opens” Smith-Ruiu was seeing in his data; it seemed that these things were the “exhaust” that engagement with his article might produce.
After I thought about it some more, I realized that the term I was thinking of wasn’t “attention exhaust” but “data exhaust,” a term that had become known to the broader public due to Shoshana Zuboff’s The Age of Surveillance Capitalism (2019). According to Zuboff, data exhaust is the data generated by our online activity—clicks, page views, likes, tweets, and searches, for example—which are then packaged by companies like Google and sold to advertisers. Zuboff is critical of the term because she sees it as a way for companies to claim a right to our activity through surveillance—by calling what we do on the internet “exhaust,” which Zuboff clarifies is “waste without value,” it allows companies to claim our activity as their own, then profit off of it, even though it is said to be valueless. In short, “data exhaust” is a profoundly ideological term for Zuboff, and it’s one that should not be accepted.
The explanation of data exhaust sounds strikingly similar to that of attention exhaust, and one can imagine “attention” being packaged and sold in the same way that “data” is. I would speculate, as Weatherby did, that ChatGPT coined the term attention exhaust because on the “poetic heat map” (one of Weatherby’s analogies for how LLMs function) “attention” and “data” are statistically close to each other. Put another way, the semantic package for “data analytics” includes both “data exhaust” and “attention” in large quantities.
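As a way of picturing what “statistically close” might mean here, the sketch below computes cosine similarity, a standard measure of how near two word vectors sit to one another in an embedding space. The three-dimensional vectors are made-up numbers purely for illustration; an actual test of my speculation would use embeddings from a real model, which I have not done.

```python
# Sketch of the kind of measurement behind "statistically close": cosine similarity
# between word vectors. The vectors below are made-up toy numbers, included only to
# show the computation; real embeddings have hundreds of dimensions and come from a model.
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means 'nearer' on the map."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vectors for three phrases (illustrative values only).
vectors = {
    "data exhaust":      [0.9, 0.2, 0.1],
    "attention exhaust": [0.8, 0.3, 0.1],
    "unicorn herd":      [0.1, 0.1, 0.9],
}

print(cosine_similarity(vectors["data exhaust"], vectors["attention exhaust"]))  # high: nearby on the map
print(cosine_similarity(vectors["data exhaust"], vectors["unicorn herd"]))       # low: far apart
```

If the speculation is right, a measurement like this on real embeddings would show “attention exhaust” sitting much closer to “data exhaust” than to unrelated phrases, which is roughly what I mean by the two terms sharing a semantic package.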
If I’m correct that “attention exhaust” is a novel term created by ChatGPT, we must use Weatherby’s idea of a “general poetics” to think about what it means. I’ve already speculated that it is connected to Zuboff’s understanding of “data exhaust,” and continuing from there we might also consider whether we should regard this term with skepticism, as Zuboff does with data exhaust. I think we should, since attention itself is the main function under threat from LLMs. What I mean is that LLMs have the ability to “externalize attention” in the same way that writing was said to “externalize memory.”[2] Jac Mullen, for example, has noted that one of the main uses of an LLM is to have it attend to something for us; when a person feeds a document into an LLM and asks for a summary, for example, they are letting the LLM attend so they don’t have to. When a person has an AI romantic partner, to use another example that is quickly becoming a reality, they are exploiting the attentive function as well—the AI has unlimited attention for the user, unlike a human, and requires no attention in return. In this view, the AI functions as a sort of genie that can fulfill any wish one wants.
Mullen, I think, is more optimistic than I am, and has written about how AI could give us “compressible exteriority,” by which he means “a way of seeing ourselves from the outside,” which is a foil to the “deep interiority” that literacy provided. Dan Silver has compared recent responses to AI to the reaction to writing in Plato’s Phaedrus, wherein writing is seen as dangerous for the very reason I mention above, as it externalizes memory and thus impoverishes one’s ability to remember, in a way similar to how I’m suggesting LLMs could reduce our attentional capabilities. Weatherby is less interested in projecting how LLMs will impact the future and more focused on articulating a way to understand what they produce, in the same way that one might analyze, critique, and interpret texts. This is what I have tried to put into practice here, attempting to understand why the LLM produced “attention exhaust” and what it could indicate, rather than simply accepting the term as a given.
I suggested at the beginning of this essay that “nothing has meaning—not a word or a piece of art—until we give it meaning through our attention.” What does it mean, then, if we offload this capacity to AI? I don’t know. I could have fed Weatherby’s book into ChatGPT, asked it to summarize it for me, to write this essay, even, and then I could have edited a few things here and there, maybe trained the LLM on my own writing so that the resulting piece “sounded like me,” but I didn’t do this. Instead I spent hours and hours of my life reading this book, writing this essay, struggling to understand ideas that I’m not an expert in and then attempting to explain them in a way that would make sense to the general reader. I won’t be paid any money for this essay, and it’s unclear how many people will even read it. None of this bothers me, however. You should imagine me happy.
Footnotes
1. Dan Silver makes an interesting point in a recent article about the connection between learning and hallucinations: https://thesilverlining3.substack.com/p/diagramming-modernity-bellah-and
2. “Attention” is also the term for how the transformer architecture works in GPT systems and is what makes generative AI possible, as outlined in the 2017 paper “Attention Is All You Need.”
