How Do Machines ‘Grok’ Data?

Anil Ananthaswamy in Quanta:

In January 2022, researchers at OpenAI, the company behind ChatGPT, reported that these systems, when accidentally allowed to munch on data for much longer than usual, developed unique ways of solving problems. Typically, when engineers build machine learning models out of neural networks — composed of units of computation called artificial neurons — they tend to stop the training at a certain point, called the overfitting regime. This is when the network basically begins memorizing its training data and often won’t generalize to new, unseen information. But when the OpenAI team accidentally trained a small network way beyond this point, it seemed to develop an understanding of the problem that went beyond simply memorizing — it could suddenly ace any test data.

The researchers named the phenomenon “grokking,” a term coined by science-fiction author Robert A. Heinlein to mean understanding something “so thoroughly that the observer becomes a part of the process being observed.”

More here.