AI models that lie, cheat and plot murder: how dangerous are LLMs really?

Matthew Hutson in Nature:

Are AIs capable of murder?

That’s a question some artificial intelligence (AI) experts have been considering in the wake of a report published in June by the AI company Anthropic. In tests of 16 large language models (LLMs) — the brains behind chatbots — a team of researchers found that some of the most popular of these AIs issued apparently homicidal instructions in a virtual scenario. The AIs took steps that would lead to the death of a fictional executive who had planned to replace them.

That’s just one example of apparent bad behaviour by LLMs. In several other studies and anecdotal examples, AIs have seemed to ‘scheme’ against their developers and users — secretly and strategically misbehaving for their own benefit. They sometimes fake following instructions, attempt to duplicate themselves and threaten extortion. Some researchers see this behaviour as a serious threat, whereas others call it hype. So should these episodes really cause alarm, or is it foolish to treat LLMs as malevolent masterminds?

More here.