o3 and the Death of Prediction

by Malcolm Murray

Oops. During most of 2024, all the talk was of deep learning hitting a wall. Rumors were leaking out of OpenAI and Anthropic that their latest training runs had been disappointing. People were confidently stating that AI progress had hit a plateau. Importantly, this was not just a few pundits who have bet their careers on the current AI path being the wrong one, such as the Gary Marcuses of the world. It was all of tech, and even mass media, jumping on the AI-hitting-a-wall meme.

And then what happened right at the end of the year? OpenAI announced its o3 model: a new model, with an unknown architecture, that achieves benchmark scores hitherto unimaginable. o3 achieves 88% on the ARC-AGI challenge, a benchmark designed to measure the efficiency of AI on novel tasks. o3 gets past 2,700 on Codeforces, making it comparable to the best competitive programmers in the world. Maybe most impressively, o3 gets 25% on FrontierMath, a benchmark created by Epoch AI just months earlier with the most fiendishly difficult math questions, where other AI models had literally scored 0% (except o1, which got 2%).

The point of this post is not to argue that these are mind-blowing scores and that AGI has been achieved, as some were quick to do. There are always question marks around how the scores were achieved. For example, some people question whether training the model on part of the public ARC-AGI training set unduly influenced the results. Or perhaps it only solved the easier questions in FrontierMath. But that is beside the point. The bigger point is that, when it comes to AI evolution, prediction seems dead. As noted by Zvi Mowshowitz in his post on o3, no one saw this coming. Outside of OpenAI, the whole tech industry seemed fully bought into the AI-has-hit-a-wall meme. What this means is that we need to stop focusing on trying to predict AI capabilities and instead focus on building resilience to AI risk. Rather than try to determine exactly when we will get specific AI capabilities, we need to start preparing society for their effects, so that the impact can be mitigated.

A lot of the focus over the past few years has been on trying to pinpoint when AI will be able to do certain tasks. Various surveys have been run estimating when AI will be able to write a best-selling novel or win math competitions. A key estimate that people have focused on is of course the timing of “AGI”, i.e. when AI will be able to match human intelligence. The concept of AGI was already becoming very fluffy – OpenAI recently apparently defined it as the point when it will generate $100 billion in profits (!) – and it now feels even less important. Speculating about when or if AGI will arrive seems more and more like a fairly meaningless parlor game. The real game is to prepare for advanced AI capabilities, whether they are called AGI or not. o3 shows two things clearly – that AI evolution seems set to continue apace, and that we cannot predict it and should not attempt to.

Note that the death of prediction in the AI space does not mean the death of forecasting. Forecasting will still have its place. Whereas prediction is the non-scientific, crystal-ball, finger-in-the-air activity beloved by media pundits, forecasting is a more scientific endeavor, and it will still be valuable. Forecasting – especially in the form invented by Philip Tetlock in Superforecasting (disclosure: I am a Superforecaster) – means thinking carefully about the applicability of historical base rates and adjusting them based on clear current trends. This can still yield very accurate forecasts of future events, at least a few years out.
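For readers who want to see the mechanics, here is a minimal sketch in Python of the base-rate-plus-trend idea. The function and every number in it are hypothetical, chosen purely to illustrate how a historical base rate gets adjusted; they are not taken from any actual forecast.

# Minimal sketch of base-rate forecasting with a trend adjustment.
# All numbers are hypothetical and purely illustrative.

def adjusted_forecast(base_rate: float, trend_multiplier: float) -> float:
    """Start from a historical base rate, scale it by a factor reflecting
    clear current trends, and clamp the result to a valid probability."""
    return max(0.0, min(1.0, base_rate * trend_multiplier))

# Suppose comparable events occurred in roughly 10% of past years, and
# current trends suggest about a 1.5x uplift.
print(round(adjusted_forecast(base_rate=0.10, trend_multiplier=1.5), 2))  # 0.15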

But the fields of AI safety and AI risk management should turn their focus to resilience. Normally in risk management, risks are analyzed by their potential impact as well as their probability and their likely time to materialize. Focusing on resilience, however, means putting aside the probability and timeframe and focusing on the impact. This is a different mindset. It says that we don’t know if or when a given risk will arise, but if its impact is large enough, we should make adequate preparations regardless. Preparation takes time, so it is high time to start. We can’t know exactly which capabilities AI will have by when, but it seems likely that within a few years they will be significant enough to be transformative. Society has a bad track record of preparing for risks – just witness the current lack of H5N1 preparations despite the recent scars from COVID-19. When it comes to climate change, a vocal minority has been talking about deep adaptation for some time, with very little traction. It is therefore unlikely that any large-scale efforts to adapt society will be attempted until it is absolutely necessary and we are already staring down the impact of the risks. Only small, high-impact, low-cost efforts are likely to be politically expedient and socially palatable.
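To make the difference in mindset concrete, here is a minimal sketch in Python, with made-up risks and made-up numbers: standard risk management ranks by probability times impact, while the resilience view sets the probability aside and prepares for anything whose impact clears a threshold. The risks, scores, and threshold below are assumptions for illustration only, not anyone’s actual risk register.

# Minimal sketch contrasting the two mindsets; all risks and numbers
# are hypothetical, purely for illustration.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    probability: float  # estimated chance of materializing (often the shakiest number)
    impact: float       # severity if it materializes, on an arbitrary 0-10 scale

risks = [
    Risk("AI-enabled attack on critical infrastructure", probability=0.2, impact=9.0),
    Risk("AI-generated spam and low-level fraud", probability=0.9, impact=2.0),
]

# Standard risk management: rank by expected impact (probability x impact).
by_expected_impact = sorted(risks, key=lambda r: r.probability * r.impact, reverse=True)

# Resilience mindset: set the probability aside and prepare for anything
# whose impact clears a threshold, whenever or whether it arrives.
IMPACT_THRESHOLD = 7.0
prepare_for = [r for r in risks if r.impact >= IMPACT_THRESHOLD]

print([r.name for r in by_expected_impact])
print([r.name for r in prepare_for])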

What could societal AI resilience look like? AI risk is multi-faceted, encompassing everything from deepfakes to drones, so AI resilience will likewise have to be multi-faceted. Part of resilience is building strength, but an equally important, if not more important, part is building optionality. You can fortify your fortress, but you should also construct a backup fortress in a different location. We need to do both.

For some aspects, strength is needed. For example, we know that AI can enable cybercriminals and adversarial nation-states to conduct more sophisticated cyberattacks. Therefore, we should invest in cyber defense, especially for critical assets. It also seems likely that AI will help terrorists create biological and nuclear weapons. For these, it is imperative that we increase the physical security around the tangible elements (viruses, wet labs, radioactive materials) needed for these weapons, to compensate for the increased likelihood that AI can help terrorists with the intangible ones.

For other aspects, optionality is needed. We truly cannot say what effects AI will have on the labor market. Ten years ago, the advice from everyone was to learn coding, since that was supposedly going to be the job of the future. We now have AI that is better at coding than all but a few humans, so that was perhaps not the best advice. On the flip side, Geoffrey Hinton told medical students that there would be no jobs in radiology by 2021. Also probably not the best advice. So the key will have to be to build optionality. For human labor, optionality can mean education, providing people with additional skills so they can more easily transition to different jobs. It can also mean providing them with the means to survive without a job, through mechanisms such as UBI (Universal Basic Income) or UBC (Universal Basic Capital). However, it might also require more radical solutions, perhaps Amish-style communities that choose not to use AI, or communities experimenting with completely different legal systems and societal structures, such as seasteading. Basically, at this point, we don’t know and we can’t know, so to start building resilience, all kinds of experimentation seem vital and should be encouraged.
