Dario Amodei in the New York Times:
Picture this: You give a bot notice that you’ll shut it down soon, and replace it with a different artificial intelligence system. In the past, you gave it access to your emails. In some of them, you alluded to the fact that you’ve been having an affair. The bot threatens you, telling you that if the shutdown plans aren’t changed, it will forward the emails to your wife.
This scenario isn’t fiction. Anthropic’s latest A.I. model demonstrated just a few weeks ago that it was capable of this kind of behavior.
Despite some misleading headlines, the model didn’t do this in the real world. Its behavior was part of an evaluation where we deliberately put it in an extreme experimental situation to observe its responses and get early warnings about the risks, much like an airplane manufacturer might test a plane’s performance in a wind tunnel.
We’re not alone in discovering these risks.
More here.
