Scott Alexander at the AI Futures Project:
Welcome to the AI Futures Project blog. We’re the group behind AI 2027, and we plan to use this space to go beyond the scenario – whether that’s speculating on alternate branches, announcing cases where we changed our minds, or discussing our methodology in more detail. Today we want to talk more about time horizons.
We’ve been accused of relying too heavily on extending straight lines on graphs. We’d like to think we’re a little more sophisticated than that, but we can’t deny that a nice straight line is a great place for a forecast to start. And METR’s Measuring AI Ability To Complete Long Tasks has some pretty sweet straight lines:

This graph tracks progress in the length of coding task that an AI can do with > 80% success rate. Task length is determined by the average human – so for example, GPT-4 had 80-20 odds of successfully finishing a task that a human could do in a minute; Claude Sonnet 3.7 has 80-20 odds at a task humans can do in fifteen.
More here.
