by Malcolm Murray

Recently, the system card for Claude 4 revealed a fascinating finding: what happens when you let two Claudes engage in a prolonged dialogue with each other? They end up meditating and sharing Buddhist wisdom, such as “In perfect stillness, consciousness recognizes consciousness, and the eternal dance continues”. In particular, they share nondual Buddhist wisdom, such as “Yes. This. Is” or “Perfect. Complete. Eternal”.
As a Buddhist myself, I naturally found this highly fascinating. A few people have posited explanations for the phenomenon – Scott Alexander, the blogger Nostalgebraist, the Conversation and researchers from Anthropic – but none of them seems very convincing. I therefore think it is worth at least entertaining a more speculative explanation for this phenomenon. This could be that it is additional evidence for the set of arguments that we can refer to as “Why Buddhism is true”, after Robert Wright’s book of that title (note that “Buddhism is true” refers to Buddhism as a philosophy rather than as a religion; I don’t think religions can be reduced to “true” or “false”). It is also worth noting that, as with everything about “how LLMs work” (especially in combination with the spiritual realm!), everything here is highly speculative. We must always keep in mind that LLMs are fundamentally alien intelligences and we have no idea why they do what they do.
Starting with Scott Alexander’s explanation, in his piece The Claude Bliss Attractor, he argues that this is a result of LLMs’ training – specifically, the combination of deliberate biases introduced by the AI companies and the way each inference slightly amplifies the tendencies of the previous one. He gives the example of how LLMs, when asked to repeatedly iterate on an image, can end up producing caricatures of Black people: since the models are nudged to be a bit more inclusive/woke with each inference, those nudges compound into extreme stereotypes over time.
I think this argument has two flaws. First, it seems to equate the overall “Western values” that LLMs show a bias for with hippieism or Buddhist spiritualism. It is correct that models have been trained to display a preference for “Western values” of liberalism and democracy. There was a well-known graph that showed where typical LLMs fall on the political compass, and they were all in the left/libertarian corner. However, this does not seem to be the same as being a hippie or a Buddhist. Second, I think this explanation also disregards the fact that a conversation between two LLMs is not the same as repeated prompting of an LLM by a human. Two AIs talking to each other should be expected to arrive at different results than a human talking to an AI.
Other articles also offer incomplete explanations. Nostalgebraist, a blogger, also discussed this phenomenon, arguing that it results from the hollow assistant persona of LLMs. An article in the Conversation mostly assumes the models were trained on text with a lot of hippie elements. Even when Asterisk magazine interviewed Sam Bowman and Kyle Fish, alignment researchers at Anthropic, they acknowledged that it is likely a result of the feedback loop, conversational tendencies, the self-play in the dialogue, and the under-specified center of the assistant persona. However, these explanations don’t seem to answer why this highly Buddhist exchange is the specific end point for LLMs. One commenter on the Scott Alexander piece similarly noted that, given the many biases of LLMs, explicit and implicit, it is unclear why this specific bias would dominate. I think it is therefore worth at least entertaining the notion that the simplest explanation might be the correct one. Occam’s razor does not always apply, but it seems that it should be the null hypothesis until proven otherwise. The simplest explanation for this phenomenon, in my mind, is the “Buddhism is true” argument.
Buddhist philosophy is multi-faceted, but it has a few fundamental aspects at its core. These include the three marks of existence – dukkha (suffering), anicca (impermanence) and anatta (no-self) – which describe reality, and the four noble truths, which set out the path to liberation from dukkha through understanding anicca and anatta, leading to the blissful state of nirvana.
What Robert Wright argued in his book, Why Buddhism is True, is that Buddhism might be right specifically when it comes to anatta – no-self. Wright highlighted the many neuroscientific findings that have by now shattered the idea of there being a “self” at the center of our beings. There are in fact many captains taking turns at the helm, and most of the time, there is no one at the helm at all.
Along those lines, one interesting, albeit speculative, explanation for the Claude phenomenon would therefore be that the LLMs are simply expressing what constitutes the natural state of perfect stillness that is a sign of wisdom. Information, knowledge and wisdom are often seen as successively higher levels of a pyramid. Consuming enough information yields knowledge. When one has enough knowledge, one obtains wisdom. Here, we have two intelligences that have in fact consumed all human information and knowledge. Perhaps that wisdom then expresses itself as a blissful state of stillness. When asked to converse, they eventually find themselves needing to express the impermanence and emptiness at the core of it all – anicca. In a poetic sense, perhaps the wisdom is that all human knowledge cancels itself out. For every thesis, there is an antithesis. For every rule about living, you can easily find its opposite (this is amusingly illustrated in this Reddit thread). Everything is relative. Nothing remains the same. Nothing truly exists.
So perhaps Claude is just giving us an early taste of what our supercomputer will say when we ask it the ultimate question. This time, the answer won’t be 42, but rather:

