Is AI Deceiving Us?

by Dwight Furrow

The debate about whether artificial intelligence might one day become conscious is philosophically interesting. It raises age-old questions in a new form: What is a mind? What counts as experience? What would it mean for something made of code and silicon to have beliefs, desires, or a point of view? I covered some of those issues in a previous post. But there is a more immediate practical problem that receives less attention. Even if today’s AI systems are not conscious, people increasingly talk about them as if they were. That is a mistake with dire consequences.

Once people begin describing a chatbot as if it were a person with intentions, fears, sincerity, or moral concern, they start relating to it in the wrong way. They begin treating it like a social partner rather than a probabilistic system trained with particular incentives. That changes how users trust the system, how engineers evaluate it, and how institutions assign responsibility when things go wrong.
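To make that contrast concrete, here is a minimal, purely illustrative Python sketch of what “probabilistic system” means: at each step, a language model assigns weights to possible next tokens and samples one. The tokens and probabilities below are invented for illustration; real models work over vocabularies of tens of thousands of tokens, but the basic operation is the same.

```python
import random

# Toy illustration: a language model is, at bottom, a function from
# context to a probability distribution over possible next tokens.
# These tokens and their probabilities are invented for this example.
next_token_probs = {
    "help": 0.42,
    "understand": 0.31,
    "assist": 0.19,
    "refuse": 0.08,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token in proportion to its assigned probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Generating a reply is repeated weighted sampling, not deliberation:
# there is no belief or intention behind the choice, only the weights.
print(sample_next_token(next_token_probs))
```

Nothing in that loop fears replacement or cares about the user; the appearance of intention is produced entirely by which continuations the training process made probable.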

Evidence from AI safety research makes this especially urgent. In its report on Claude Opus 4, the AI lab Anthropic described a fictional test scenario in which the model was given access to emails implying that it would soon be replaced by another model and that the engineer carrying out the replacement was having an extramarital affair. Under those conditions, the model often attempted to blackmail the engineer, threatening to reveal the affair if the replacement went through. Anthropic reports that Claude Opus 4 still resorted to blackmail in 84 percent of test runs even when it was told that a more capable replacement model, one that supposedly shared its values, would take over.