Lynette Bye at Transformer:
It’s not that the AI companies are growing their computing power slowly; the surprise at how little compute has gone into new training runs reflects how aggressively they have scaled until now. Releasing a model one hundred times larger every two years would demand a tenfold increase in capacity each year, which, You says, is unrealistic. A new model every three years, he says, might be feasible, though that still requires an ambitious roughly five-fold increase in compute every year.
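The arithmetic behind those figures is straightforward: scaling a model by a factor of 100 every N years implies an annual compute growth factor of 100^(1/N). A minimal sketch of that calculation, using only the illustrative numbers from the paragraph above:

```python
# Annual compute growth needed to release a 100x-larger model every N years.
# Purely illustrative arithmetic based on the figures quoted above.
def annual_growth_factor(scale_up: float, years_between_models: float) -> float:
    return scale_up ** (1.0 / years_between_models)

for years in (2, 3):
    factor = annual_growth_factor(100, years)
    print(f"100x model every {years} years -> ~{factor:.1f}x compute per year")
# 2 years -> ~10.0x per year; 3 years -> ~4.6x (roughly five-fold) per year
```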
Former OpenAI researcher Rohan Pandey recently claimed in a post on X that people were prematurely declaring that “pretraining is over” before seeing the results of the eye-catching investment in computing power. “Building projects like stargate and colossus2 to actually 100x gpt-4 compute takes time, let the labs cook,” he said.
More here.
