AI Chatbots Don’t Care About Your Social Norms

Jacob Browning and Yann LeCun in Noema:

With artificial intelligence now powering Microsoft’s Bing and Google’s Bard search engines, brilliant and clever conversational AI is at our fingertips. But there have been many uncanny moments — including casually delivered disturbing comments like calling a reporter ugly, declaring love for strangers or rattling off plans for taking over the world.

To make sense of these bizarre moments, it’s helpful to start by thinking about the phenomenon of saying the wrong thing. Humans are usually very good at avoiding spoken mistakes, gaffes and faux pas. Chatbots, by contrast, screw up a lot. Understanding why humans excel at this clarifies when and why we trust each other — and why current chatbots can’t be trusted.

Getting It Wrong

For GPT-3, there is only one way to say the wrong thing: by making a statistically unlikely response to whatever the last few words were. Its understanding of context, situation and appropriateness concerns only what can be derived from the user’s prompt. For ChatGPT, this is modified slightly in a novel and interesting way. In addition to saying something statistically likely, the model’s responses are also reinforced by human evaluators: the system outputs a response, and human evaluators either reinforce it as a good one or not (a grueling, traumatizing process for the evaluators). The upshot is a system that is not just saying something plausible, but also (ideally) something a human would judge to be appropriate — if not the right thing, at least not offensive.
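The contrast the authors draw — pure statistical likelihood versus likelihood shifted by human judgments — can be illustrated with a toy sketch. Everything below is invented for illustration: the candidate replies, the log-likelihoods, and the "human score" (a stand-in for a learned preference reward) are made-up numbers, not any real model's output or OpenAI's actual method.

```python
# Conceptual sketch only. Toy numbers stand in for a base language model's
# probabilities and for a reward model trained on human ratings; this is
# an illustration of the idea, not a real implementation of RLHF.

candidates = [
    {"text": "You look terrible today.", "log_likelihood": -2.0, "human_score": -5.0},
    {"text": "I love you, stranger.",    "log_likelihood": -2.5, "human_score": -3.0},
    {"text": "Happy to help with that.", "log_likelihood": -3.0, "human_score": +4.0},
]

def plain_lm_choice(cands):
    """GPT-3-style: pick the statistically likeliest continuation."""
    return max(cands, key=lambda c: c["log_likelihood"])

def rlhf_choice(cands, reward_weight=1.0):
    """ChatGPT-style: likelihood shifted by a learned human-preference score."""
    return max(cands, key=lambda c: c["log_likelihood"] + reward_weight * c["human_score"])

print(plain_lm_choice(candidates)["text"])  # likeliest reply, but an offensive one
print(rlhf_choice(candidates)["text"])      # less likely, but judged appropriate
```

The point of the sketch is the same as the essay's: the first selector has no notion of appropriateness at all, while the second only inherits whatever the human evaluators happened to reward.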

But this approach makes visible a central challenge facing any speaker — mechanical or otherwise. In human conversation, there are countless ways to say the wrong thing…

More here.