Corrigibility (a tendency for AIs to let humans change their values) is important

Scott Alexander at Astral Codex Ten:

Evolution was a “training process” selecting for reproductive success. But humans’ goals don’t entirely center around reproducing. We sort of want reproduction itself (many people want to have children on a deep level). But we also want correlates of reproduction, some direct (eg having sex), some indirect (dating, getting married), and some counterproductive (porn, masturbation). Other drives are even less direct, aimed at targets that aren’t related to reproduction at all but which in practice caused us to reproduce more (hunger, self-preservation, social status, career success). On the fringe, we have fake correlates of the indirect correlates – some people spend their whole lives trying to build a really good coin collection; others get addicted to heroin.

In the same way, a coding AI’s motivational structure will be a scattershot collection of goals – weakly centered around answering questions and completing tasks, but only in the same way that human goals are weakly centered around sex.

More here.