This article has three main sections. Immediately following the image (created by ChatGPT) is an article by me, ChatGPT 5.2, and Claude 4.5: The Grammar of Truth: What Language Models Reveal About Language Itself. Claude came up with that title. After that there’s a much shorter section in which I explain how the three of us created that article. I conclude by taking up the question of how to properly credit such writing. That issue is one aspect of the institutionalization discussed in what I will call the “included article.”
The Grammar of Truth: What Language Models Reveal About Language Itself
William Benzon, assisted by ChatGPT 5.2 and Claude 4.5
In the 1980s, when the linguist Daniel Everett went to live with the Pirahã people of the Amazon, he had a mission: to translate the Bible into their language and convert them to Christianity. What he discovered, however, was something he hadn’t anticipated. The Pirahã didn’t exactly reject his religious message. They just couldn’t take him seriously.
The problem, as Everett eventually came to understand, was grammatical. Pirahã requires speakers to mark the source of their knowledge. When you make a claim about something, you have to indicate whether you witnessed it directly, heard it from someone else, or learned it in a dream. This isn’t an optional rhetorical flourish. It’s as obligatory as verb tense in English. Everett had not seen Jesus himself. He knew no one who had. And he wasn’t reporting a dream. By the grammatical standards of Pirahã discourse, he had no business speaking at all.
This grammatical feature—what linguists call evidentiality—turns out to be far more than a curiosity about an Amazonian language. It opens a window onto something fundamental about how human societies organize knowledge. And, as we’ll see, it ultimately reveals why the emergence of large language models like ChatGPT represents not just a technological achievement but a conceptual rupture that forces us to rethink what language itself is.
When Grammar Does the Work
Evidentiality is the grammatical marking of how a speaker knows what they’re saying. Many languages require it. Speakers must indicate whether their claim comes from direct experience, hearsay, inference, or some other source. What Everett discovered isn’t unique to Pirahã. Similar systems appear in Tibetan, Turkish, Quechua, and hundreds of other languages around the world.
In English we can add phrases like “apparently” or “I heard that.” But we’re not required to. We can simply say “the meeting is tomorrow” without specifying whether we checked the calendar, someone told us, or we’re just guessing. In languages with obligatory evidentials, such vagueness is grammatically impossible.
In the late 1970s Revere Perkins, working with Joan Bybee and David Hays in linguistics at SUNY Buffalo, demonstrated that evidential systems serve a crucial social function in oral cultures. When all knowledge circulates through direct testimony, reliability depends entirely on chains of personal witnessing. Grammar itself enforces what we might call epistemic accountability. If you tell me something you saw, I can evaluate your credibility as an observer. If you tell me something you heard, I can ask about your source. If you claim something from a dream, I can decide what weight to give visionary experience.
The grammar doesn’t compel you to tell the truth—you can certainly lie about what you witnessed. But it prevents you from being vague about the nature of your claim. It makes the source of your knowledge explicit and puts it on the record.
The Move to Institutions
Here’s where things get interesting. Perkins noticed something striking: as societies develop literacy, evidential systems tend to weaken or disappear from grammar. English doesn’t have obligatory evidentials. Neither do most European languages, nor Chinese, nor Arabic.
This might seem like a step backward. If evidentials are so useful for evaluating information, why would languages lose them? The answer is that the job evidentials were doing didn’t go away. It just moved somewhere else.
When societies develop literacy, they also develop new institutions and practices for checking information. Citation practices emerge. Footnotes proliferate. Peer review becomes standard. We create specialized genres—the scientific paper, the legal brief, the historical monograph—each with elaborate protocols for establishing credibility. We train professional classes whose work is precisely to evaluate and authenticate claims: scholars, journalists, judges.
What’s crucial here, though, is that these institutions remain grounded in human experience. When peer reviewers evaluate a scientific claim, they’re scientists who have themselves conducted experiments, faced methodological challenges, and experienced the gap between hypothesis and result. When courts assess testimony, judges and juries are people who understand from the inside what memory is, what witnessing entails, how easily we can be mistaken or deceived. The shift is from grammatical to institutional regulation, but the checking still ultimately rests on humans with real experiences.
This institutional reorganization enables something oral cultures cannot sustain: abstraction at scale, reasoning about events no living person witnessed, intellectual specialization. It’s no accident that Plato’s skepticism about writing appears only in a literate culture, or that epistemology as a distinct philosophical discipline emerges with literacy. Oral societies embed their theory of knowledge in grammar. Literate societies externalize it into institutions and then reflect on it philosophically.
Two Different Questions: Truth and Bullshit
Before we go further, we need to make an important distinction. Evidentiality—whether it’s in grammar or in institutions—concerns one thing: the source of knowledge. But that’s different from whether the information is actually true.
The philosopher Harry Frankfurt wrote a famous essay about this called “On Bullshit.” He pointed out that there’s a crucial difference between lying and bullshitting. A liar knows the truth and deliberately conceals it. A bullshitter doesn’t care whether their claims are true or false. They simply say whatever serves their immediate purpose. The liar respects the truth enough to hide it. The bullshitter is indifferent to the entire truth/falsehood dimension.
These are two different axes. You can lie while using perfectly correct evidentials. I could say “I personally saw him there” when I didn’t, marking my source accurately as direct observation even though I’m lying about the content. And you can tell the truth while being vague about your sources. Evidentiality concerns epistemology—how we know. Truth concerns reality—what actually is.
In face-to-face oral cultures, strong evidential systems constrain bullshit. Not because lying becomes impossible, but because being vague about sources becomes grammatically difficult and the social cost of being caught is immediate. Everyone knows everyone. You have to live with your reputation.
Literate societies, by contrast, enable more sophisticated forms of bullshit precisely through the features that enable intellectual sophistication. You can write authoritatively about places you’ve never visited, citing texts rather than experiences. Credentials can substitute for knowledge. Fluency can substitute for understanding. Yet even human institutions that produce bullshit—think tanks with agendas, captured regulatory agencies, propaganda bureaus—are still operated by people who can in principle be investigated. There are humans with stakes and incentives whom we can interrogate.
Enter the Machines
This brings us to large language models, and to something genuinely new.
LLMs like ChatGPT exhibit something that looks remarkably like linguistic competence. They’re fluent, context-sensitive, responsive in dialogue, versatile across domains. Yet they conspicuously lack experience, belief, intention, and any form of what we might call epistemic accountability. In a sense, they function like institutions—they organize information, produce authoritative-sounding outputs, process vast amounts of material. But they’re institutions with no humans in them.
Let me unpack what I mean by that. Culture itself originally emerged from beings with experiences. People witnessed events, remembered them, told stories. Literacy allowed culture to be externalized in texts and managed by institutions, but those institutions remained staffed by humans with experiential grounding. LLMs represent something different. They operate on the accumulated textual products of human experience without themselves having experiences. In a framing being developed by Alison Gopnik, Henry Farrell, Ted Underwood and others, LLMs are cultural technologies that work through a kind of second-order process: culture processing culture, divorced from the experiential substrate that originally generated it.
People started calling LLM errors “hallucinations.” The term arose first in discussions of neural machine translation, then spread to chatbots. It seems apt—after all, the system is producing fluent text about things that aren’t real. But “hallucination” misleads. It suggests faulty perception, and LLMs don’t misperceive reality. They generate plausible continuations of text without any relationship to perception at all.
In Frankfurt’s technical sense, LLM output is better understood as structural bullshit: language produced without truth-directed intent, optimized for coherence rather than correspondence. They simulate evidential language. They can write “studies show” or “experts say.” But they possess no evidential grounding whatsoever. They are pure textuality, operating at maximum distance from the witnessing that originally generated the text they’ve been trained on.
This isn’t a defect to be engineered away. It’s definitional. LLMs are trained to predict likely continuations of text, not to maintain correspondence with reality. They instantiate a form of intelligence that’s profoundly different from human cognition. Vast in scope—capable of operating across more symbolic domains than any individual could master. But thin in what we might call mode of being, lacking the embodied situatedness that grounds human thought.
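For readers who want to see what “predicting likely continuations” amounts to in practice, here is a minimal sketch. It assumes the Hugging Face transformers library and the small, openly available GPT-2 model as a stand-in for the much larger systems discussed in this essay; the prompt is my own invention. The point is only that the procedure ranks possible next words by likelihood learned from text.

```python
# Minimal sketch of next-token prediction, assuming the Hugging Face
# transformers library and the small GPT-2 model as a stand-in for larger
# systems. The model scores possible continuations of a prompt; nothing in
# this loop checks any claim against the world, only against patterns in text.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Studies show that the meeting is"          # hypothetical prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]           # scores for the next token
    probs = torch.softmax(logits, dim=-1)             # convert scores to probabilities
    top = torch.topk(probs, k=5)                      # the five likeliest continuations

for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```

Nothing in that loop consults the world; the model reports only which words tend to follow which in its training data. That, in miniature, is the sense in which the output is optimized for coherence rather than correspondence.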
The Crisis We’re In
This helps clarify why we’re experiencing such acute confusion about these systems. We’ve multiplied technologies that process information at massive scale—LLMs, recommendation algorithms, content moderation systems. But we lack adequate frameworks for dealing with them.
We keep trying to apply the logic we developed for human institutions. We demand transparency. We appeal to expertise. We fact-check outputs. We propose regulatory oversight. These strategies all assume there are humans somewhere whose experiences, incentives, and responsibilities can be investigated. But LLMs have no experiences or incentives. They’re not hiding anything from us. There’s nothing to hide, because there’s no one there.
You can’t hold an LLM accountable the way you hold a newspaper accountable. You can’t ask: What did you actually witness? What are your stakes? Who benefits from this framing? The entire apparatus of epistemic accountability presupposes agents with experiences, beliefs, and intentions. That’s precisely what evidentials originally marked and what institutions were designed to regulate.
We can now see a progression across three stages:
First: evidentiality embedded in grammar, with direct face-to-face accountability among humans with experiences.
Second: evidentiality externalized to institutions operated by humans with experiences.
Third: what we have now—institutional-like entities processing culture with no experiential grounding anywhere in the system.
Each stage preserves something and loses something. The second stage gained abstraction but risked sophisticated bullshit. The third stage gains scope—tremendous scope—but severs the link to experiential accountability entirely.
A Different Kind of Intelligence
This asymmetry exposes something wrong with how we talk about “artificial intelligence.” Terms like “artificial general intelligence” or “superintelligence” assume intelligence is one thing, a single dimension where systems can be ranked against human baselines. But LLMs reveal that assumption to be false.
Human intelligence is narrow in scope. None of us can hold millions of documents in working memory. We can’t process information as fast as computers. But our intelligence is “thick”—grounded in embodiment, mortality, social vulnerability, the irreducible fact of witnessing our own experience. LLMs exhibit the inverse profile: vast scope, thin existence.
I think some capabilities may be permanently asymmetric. There are likely things humans can do that LLMs never will, not because of technical limitations but because those capacities are grounded in embodied experience. Understanding what it means to give your word, for instance. Recognizing genuine embarrassment. Knowing the difference between performance and sincerity. These aren’t just facts that could be learned from text. They’re things you have to live to understand.
Conversely, LLMs already do things humans never will. Operating simultaneously across millions of documents. Maintaining consistency across vast textual corpora. Synthesizing patterns at scales beyond individual cognition. The question isn’t whether LLMs will become more human-like or “surpass” us. They represent something else: a new kind of participant in cultural space, one that operates on fundamentally different principles.
What Language Turns Out to Be: Mechanistic
The deeper lesson here is what LLMs reveal about language itself.
For all of human history, we could assume that language presupposes a cognizing subject with experiences, beliefs, and intentions. Oral cultures encoded this assumption grammatically through evidentials. Literate cultures maintained it through human-operated institutions. The assumption seemed foundational. Meaningful language must come from beings who understand what they’re saying because they have lived in the world.
LLMs falsify this assumption. They demonstrate that human-like linguistic behavior—syntax, semantics, pragmatic appropriateness—can arise from processes operating over cultural artifacts alone, without experiential grounding. Language is mechanistic, or can be. This doesn’t render language meaningless. Discovering a painting is a forgery doesn’t make all paintings worthless. But it does reveal that language is more autonomous than we realized. More capable of functioning independently from the experiential contexts that gave rise to it.
In that sense, LLMs function as something like conceptual probes, forcing us to make explicit distinctions that human language allowed us to leave implicit. They separate competence from experience, fluency from understanding, linguistic behavior from intentionality. And in doing so, they force us to reconsider not just what these systems are, but what language, meaning, and intelligence have been all along.
Back to the Beginning
So we return to where we started. The Pirahã speaker who found Everett’s testimony grammatically inadmissible and the office worker trying to decide whether to trust ChatGPT are grappling with the same fundamental question: On what grounds should we accept what we’re hearing?
Oral cultures answered by embedding accountability in grammar. Literate cultures moved it into human institutions. We now inhabit a communicative ecology that includes entities operating on entirely different principles—processing language without experience, producing fluency without understanding, wielding what looks like institutional authority without any human agency behind it.
Understanding this moment requires holding several things in tension. LLMs are powerful tools that extend human capabilities in important ways. They are fundamentally different from human cognition in ways that resist our inherited conceptual frameworks. They reveal unexpected things about language—its capacity for semi-autonomous function, its separability from immediate experience. And they force new questions about what kind of society we’re building when linguistic actors can participate in culture without possessing the experiential grounding that previously defined such participation.
What we might call the grammar of truth is changing. Not because truth itself has changed, but because we share communicative space with entities that operate according to different principles. We have moved from a world where all language users were experiential beings, through a world where such beings built institutions to manage abstraction, into a world where autonomous systems process culture itself.
Making sense of that new ecology—understanding which aspects of human linguistic practice can be delegated to autonomous systems and which depend irreducibly on experiential grounding—may well be the central intellectual challenge of our moment. And it begins, strangely enough, with a question that would be perfectly comprehensible to a Pirahã speaker: On what grounds do you speak?
* * *
How Did I Use ChatGPT and Claude?
The process involved three phases: 1) a dialog with ChatGPT, 2) the generation of an initial text by Claude, and 3) the production of a final text by Claude at my direction.
I started by having a conversation with ChatGPT. That conversation began with evidentiality, an obscure and specialized topic within linguistics. From there I moved on to truth and bullshit, and then to hallucinations. Finally I moved on to AGI and superintelligence. The conversation ran to roughly 8,500 words and was in the Chatster’s (annoying) bulleted-list style.
That dialog was not suitable for a public-facing document. The moment I started thinking about writing such a document myself, I stopped: “No! Let Claude do that.”
I took that conversation, uploaded it to Claude, and asked Claude to read it and then write an essay suitable for a sophisticated general audience. Let me repeat that: I explicitly characterized my intended audience. Had I told it to write for cognitive scientists or for high school students, Claude would have produced appropriate texts.
I liked the essay, which introduced some useful details that were not in my ChatGPT conversation. It also brought out the significance of social institutions. That was there in the conversation, but it struck me more in Claude’s article. In fact, I asked Claude to revise the article to bring that aspect out even more, and to mention the work of Alison Gopnik on large language models as cultural technology. That yielded a second version. I then asked for two more texts, each targeted at a slightly different audience. In the end I liked that second version best, but I wanted it in a style more like mine.
How could I do that? Simple. I uploaded the final draft of my book, Beethoven’s Anvil: Music in Mind and Culture, and directed Claude to rewrite that second version in the style of that book. Piece of cake. I made a few minor changes to that text before including it in this piece.
The Credit Problem for Hybrid Texts
Why did I sign the included article, the one about evidentials, bullshit, and institutions, with three names? Given that I didn’t even write the article, why is my name on it at all?
That text used another text as its basis, an 8,500-word text created through a dialog between me and ChatGPT, and I directed that dialog. My direction is one reason I’m entitled to sign my name to the article. Without me there’d have been no dialog, and without that dialog there’d have been nothing for Claude to work with.
Let me be more specific. I started that dialog with ChatGPT because I saw a connection between the linguistic phenomenon of evidentials and the LLM phenomenon of hallucination. The purpose of the dialog, then, was to explicate that connection. That’s the key idea, and I believe it is an original one.
There’s more. Returning to the article that Claude generated, it’s the authority I have through active participation in the world that connects that text to the world. Claude and ChatGPT have no connection with the world, hence no authority.
Given these considerations, all three names must be associated with the article. But the names alone are not sufficient. It is common for multi-authored technical papers to have a footnote or an endnote indicating the nature of each author’s contribution. These indicators are short, not remotely as elaborate as my previous section on how the included article was created. But then this kind of article is quite new, while those other kinds have been around for a while, and so the conventions governing attribution are understood within the relevant discursive communities.
By way of comparison, consider the credits attached to movies. The end credits seem to scroll on forever. The most important ones – director, writer, actors, producers, and a few others – come first, often at the beginning of the movie as well as in the end credits. Each kind of credit entails specific tasks and responsibilities in the film-making. We all understand more or less what writers, directors, and actors do. Producer roles are a bit more obscure. But does anyone outside the movie industry know what a gaffer or a best boy does?
[As an aside: While the movie credit system is extensive and complex, it’s not perfect. Consider the 1974 movie Chinatown. When the movie opens there is a gorgeous, haunting trumpet solo on the soundtrack, redolent of a melancholy nostalgia that sets up the movie perfectly. The composer, Jerry Goldsmith, is credited; composers generally are, but the individual musicians are not. Without Uan Rasey’s mastery, the melody wouldn’t have mesmerized the audience. I know Rasey played that solo because I’m a trumpet player and that solo is well-known among trumpeters.]
How then should we succinctly characterize the roles of those three agents attached to the included article, one human and the other two artificial? I am going to leave that open, perhaps as an exercise for the reader. The emergence and characterization of such roles is one aspect of the institutionalization of large language models.
* * *
