Scott Alexander reviews “If Anyone Builds It, Everyone Dies”

Scott Alexander at Astral Codex Ten:

Most people in AI safety (including me) are uncertain and confused and looking for least-bad incremental solutions. We think AI will probably be an exciting and transformative technology, but there’s some chance, 5 or 15 or 30 percent, that it might turn against humanity in a catastrophic way. Or, if it doesn’t, that there will be something less catastrophic but still bad – maybe humanity gradually fading into the background, the same way kings and nobles faded into the background during the modern era. This is scary, but AI is coming whether we like it or not, and probably there are also potential risks from delaying too hard. We’re not sure exactly what to do, but for now we want to build a firm foundation for reacting to any future threat. That means keeping AI companies honest and transparent, helping responsible companies like Anthropic stay in the race, and investing in understanding AI goal structures and the ways that AIs interpret our commands. Then at some point in the future, we’ll be close enough to the actually-scary AI that we can understand the threat model more clearly, get more popular buy-in, and decide what to do next.

MIRI thinks this is pathetic – like trying to protect against an asteroid impact by wearing a hard hat.

More here.
