Boaz Barak at LessWrong:
The following statements seem to be both important for AI safety and are not widely agreed upon. These are my opinions, not those of my employer or colleagues. As is true for anything involving AI, there is significant uncertainty about everything written below. However, for readability, I present these points in their strongest form, without hedges and caveats. That said, it is essential not to be dogmatic, and I am open to changing my mind based on evidence. None of these points are novel; others have advanced similar arguments. I am sure that for each statement below, there will be people who find it obvious and people who find it obviously false.
- AI safety will not be solved on its own.
- An “AI scientist” will not solve it either.
- Alignment is not about loving humanity; it’s about robust reasonable compliance.
- Detection is more important than prevention.
- Interpretability is neither sufficient nor necessary for alignment.
- Humanity can survive an unaligned superintelligence.
Before going into the points, we need to define what we even mean by “AI safety” beyond the broad sense of “making sure that nothing bad happens as a result of training or deploying an AI model.” Here, I am focusing on technical means for preventing large-scale (sometimes called “catastrophic”) harm as a result of deploying AI. There is more to AI safety than these technical means.
More here.