Moltbook demonstrates the need for new AI risk identification processes

by Malcolm Murray

Moltbook, the social network for AIs that launched only last week and already has more than a million AI agents, is a clear example of how little we can foresee of the risks that will arise when autonomous AI agents interact. In the first week, the AIs on Moltbook have both already founded their own religion and voiced their need for private communication that humans can’t snoop on. Scott Alexander has a post with many more interesting examples.

This is a clear example of how little we can foresee how AI will be implemented, as well as just plainly independently evolve, as we insert millions, billions or trillions of new intelligences into the societal infrastructure.

The key word describing AI evolution in 2025 was “jaggedness”. While LLM agents had superhuman capabilities in math, coding and science, they still lagged far behind humans in other areas, such as loading the dishwasher or buying something online. LLMs can reach gold-level performance on the International Math Olympiad, but also continue to make simple mistakes showing the forgetfulness and inconsistency of a 5-year old human child. They are on their way to becoming geniuses in a data center, but at this point, they are geniuses of the type that mistake their wife for their hat.

This jagged nature of frontier AI seems as it is a pattern we are stuck with in the current paradigm. Even as models progress, there is no expectation that their performance would suddenly become more uniform. This is perhaps not surprising – there is no general law of intelligence that suggests that intelligence should be uniformly applied. It is not the case with humans or animals, so why would it be the case for machine intelligence?

However, it has meant that we have seen slower real-world adoption into organizations, who were expecting the “remote, drop-in worker” popularized by Leopold Aschenbrenner and were not ready to retool their processes to accommodate geniuses with ADHD.

I think this has also led to a certain complacency. Everyone jumps on the latest copium, be it arguments about stochastic parrots or pseudo-scientific studies saying 95% of AI pilots in companies fail. This has meant that the other side of the coin from AI capabilities, the management of the risks from those capabilities, has also remained highly jagged. A few risks and a few risk management processes have receive all the attention and become quite developed, but others remain woefully underdeveloped. The best example is risk identification, and that is what Moltbook and similar experiments demonstrate is highly needed.

When we compare current AI risk management with risk management practices in more mature industries of high-risk or significant focus on safety, it skips several steps. The established risk management process elsewhere proceeds as a sequence of five interconnected steps: Starting with risk identification – determining what the risks could be, proceeding to risk assessment – understanding how large and significant those risks are, then to risk evaluation – determining if they warrant action, on to risk mitigation – taking actions to lower the risks by making them less likely or less impactful, before finally culminating in risk governance, where it is ascertained that roles and responsibilities are allocated and that there are sufficient checks and balances on decision making.

AI risk management largely skips that initial step, jumping straight to risk assessment. The most common risk management framework currently at leading AI companies is constructed around an if-then structure. If “dangerous thing X” happens, then the company will take “mitigating actions Y”. This is important and valuable, but making this the focus of the framework means that, in practice, only the middle steps are performed. All efforts to conduct risk identification are lost, since the risks are already pre-defined as “dangerous thing X”. This is problematic since each step has intrinsic value. Risk identification is a highly valuable process for an organization to go through, since it has cross-functional teams brainstorming together and learning from each other what can go wrong.

What we see in examples like Moltbook, is that it is extremely presumptuous to assume that we already know everything that can go wrong. Unexpected behavior and effects emerge out of new systems as soon as they start operating. Such as a social network full of AIs.

This also shines a light on how AI risk management still has largely taken place in a laboratory setting rather than the real world. Risk assessment for AI is largely studying the capabilities of the models, not the actual risks. In the lab, risk assessment in AI is getting better. For example, there are now benchmarks established for many different model capabilities, from ability to assist with biological weapon creation to ability to deceive. There are now even meta-efforts to establish consensus for principles for benchmarks and evaluations of evaluations.

However, these are still taking place in a pristine vacuum rather than in the messy real-world, so we still don’t know what actual harms can result from the models. There is some recognition of the lack of real-world applicability, but the solution so far has been to create a separate discipline of “downstream field testing”. Somewhat as an afterthought, there is a recognition that there are also these “downstream” impacts. However, this is disconnected from the “upstream” capabilities, missing the connection between the “upstream” – how the models work – and the “downstream” – what harms they cause in the real world, to people, property, the environment or societal processes.

And what we realize now is that in addition to field testing on humans and AIs, what is needed even more is “field studies” of AIs interacting with other AIs. In other safety fields, risk management often evolved by learning from other fields. A classic example is how risk management practices in aviation adopted practices from nuclear power. However, AI risk management took a different route. Rather than evolving from practices elsewhere, it evolved natively and organically from within the AI field itself. This means that it has taken a few shortcuts that we don’t see elsewhere. These shortcuts become apparent when compared to established practice. And they are now becoming problematic when we see just how alien AI risks will be.

What is needed is cross-functional risk management processes that starts from teams sitting down with blank slates of paper thinking broadly about what could go wrong, and risk modeling processes, starting from initial data going into the model and preferences emerging during model training, connected sequentially all the way into tests of emergent behavior in multi-agent systems.