The Strawman Factory in the AI Risk Debate

by Malcolm Murray

There is a strawman factory smack in the middle of the AI risk debate. Given the complexity of AI risk, we are seeing a lot of weak arguments that focus on only one aspect of AI risk or AI risk management. The topic has become a bit of a “strawman factory” precisely because it is so complex: it is very easy to zero in on one aspect and knock it down, while neglecting the rest.

The debate over the California AI bill SB-1047 showed how easily these strawmen take over. Andreessen Horowitz was particularly effective at churning out strawmen, such as the idea that the benefits of AI are so great (correct) that we therefore have a moral reason to ignore the risks (incorrect).

In order to fend off all these strawmen walking around, we can make use of three underappreciated aspects of AI risk and risk management – “the Spectrum”, “the System” and “the Stack”. Let me explain what I mean by those.

The “Spectrum”

Given the complexity of AI risk, it is easy to zero in on one aspect and point out that an AI risk assessment technique would not work there. But this neglects the wide spectrum of AI risks.

The risks to society from AI fall along a wide spectrum, and people tend to underestimate how wide it is. The spectrum can be visualized in terms of velocity and uncertainty, i.e. how long it will take a risk to materialize and how uncertain we are about its effects. At one end, we have risks that are already here and about which we have high certainty, such as bias and discrimination. We know that, since AI models have been trained on the sum total of human knowledge, and that knowledge is inherently biased, the models have biased data in their training sets by default.

At the other end, we have risks such as loss of control. These are further out, and we have much less certainty about how they will impact us. Here, we are forced to make assumptions based on patterns in human society that might not necessarily hold true for other forms of intelligence. An example is the assumption that power-seeking would be a dominant instrumental strategy for almost any final goal. Then, in the middle of the spectrum, we have the risks of misuse and accidents, such as the use of AI to create biological weapons, cyberattacks or disinformation campaigns.
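One way to make this spectrum concrete is to sketch it as two dimensions in code. The placements and the 0-to-1 scales below are purely illustrative assumptions, not estimates.

```python
# A rough sketch of the two-dimensional risk spectrum described above.
# "velocity" is how soon the risk materializes (1.0 = already here, 0.0 = far out);
# "uncertainty" is how unsure we are about its effects (0.0 = well understood, 1.0 = highly uncertain).
# The placements are illustrative assumptions, not measured values.

risk_spectrum = {
    "bias and discrimination":   {"velocity": 0.9, "uncertainty": 0.2},
    "AI-enabled disinformation": {"velocity": 0.7, "uncertainty": 0.4},
    "AI-enabled cyberattacks":   {"velocity": 0.6, "uncertainty": 0.5},
    "AI-assisted bioweapons":    {"velocity": 0.4, "uncertainty": 0.6},
    "loss of control":           {"velocity": 0.2, "uncertainty": 0.9},
}

# Print the spectrum from the near-term, well-understood end to the speculative end.
for risk, pos in sorted(risk_spectrum.items(), key=lambda kv: -kv[1]["velocity"]):
    print(f"{risk:28s} velocity={pos['velocity']:.1f}  uncertainty={pos['uncertainty']:.1f}")
```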

When focusing on only one part of this spectrum, it is very easy to put up strawmen that can justify inaction. For example, many commentators note that it is very hard to quantify the risks of bias and discrimination. This is correct. But it should not be taken as an argument that all AI risks are hard to quantify. When it comes to the risks of AI-enabled cyberattacks, for example, there is much more data and a much greater ability to quantify. We are currently conducting a study assessing the level of AI cyber risk quantitatively.
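To give a sense of what quantification can look like in practice, here is a minimal frequency-severity Monte Carlo sketch. The attack rate, loss distribution and all parameters are placeholder assumptions for illustration, not findings from that study.

```python
import random

# Minimal frequency-severity Monte Carlo sketch of a quantified cyber risk.
# All parameters are placeholder assumptions for illustration only.

ANNUAL_ATTACK_RATE = 3.0    # assumed mean number of serious AI-enabled attacks per year
MEAN_LOSS_USD = 2_000_000   # assumed mean loss per successful attack
N_TRIALS = 100_000

def simulate_annual_loss(rng: random.Random) -> float:
    """One simulated year: Poisson-like attack count, exponential loss per attack."""
    # Approximate a Poisson count by summing exponential inter-arrival times.
    count, t = 0, rng.expovariate(ANNUAL_ATTACK_RATE)
    while t < 1.0:
        count += 1
        t += rng.expovariate(ANNUAL_ATTACK_RATE)
    return sum(rng.expovariate(1.0 / MEAN_LOSS_USD) for _ in range(count))

rng = random.Random(0)
losses = sorted(simulate_annual_loss(rng) for _ in range(N_TRIALS))
expected_loss = sum(losses) / N_TRIALS
p95_loss = losses[int(0.95 * N_TRIALS)]
print(f"Expected annual loss: ${expected_loss:,.0f}")
print(f"95th percentile annual loss: ${p95_loss:,.0f}")
```

Even a toy model like this forces the assumptions (how often, how bad) into the open, which is exactly what the strawman claims cannot be done.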

The “System”

Early efforts at AI risk management focus on the capabilities of models. We have benchmarks that measure how good a model is at a set of tasks, “evals” that test directly how good a model is at a certain objective, and “model cards” that describe the capabilities of a model.

However, these are largely technical assessments. AI risk is socio-technical, not purely technical. A full assessment of the risk needs to consider the social context in which the model is deployed. A risk assessment should take into account the initial hazard and then the system around it. In this case, the initial hazard is the model’s capability. But in order to understand the actual risk, that capability needs to be put into the context of the wider system, which heavily influences how likely and how impactful an actual risk event would be. It is not surprising that early efforts in AI risk assessment are technical in nature; AI is a highly technical field. What is surprising, however, is how, with every new technology, there is a perceived need to reinvent the wheel.

Other fields, such as nuclear power, also started out deeply technical, but over time learned the lesson that they need to integrate human factors and take a systems approach. The dominant risk assessment approaches in other safety-critical industries now take a holistic, system-level view. Risk assessment techniques that have been developed over decades in other fields, such as environmental risk assessments, enterprise risk management, and even financial risk assessments, can be applied to AI risk. They will need to be customized, but that is not a reason to reject them wholesale.

An example strawman here is that since AI is a black box, AI risk must also be a black box that we cannot even attempt to model. But that does not need to be the case. There are many aspects of AI risk that we can shed light on through proper risk modeling, since so many of the factors of AI risk lie outside of the LLM itself. For example, the risk of a terrorist developing a biological weapon using an LLM has a multitude of factors that are LLM-agnostic.
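A minimal sketch of such a risk model, decomposing the scenario into a chain of conditional steps, might look like the following. The step names and probabilities are entirely illustrative assumptions; the point is only that the model-dependent term is one factor among many.

```python
# Illustrative decomposition of the "terrorist builds a bioweapon using an LLM"
# scenario into a chain of steps, each probability conditional on the previous one.
# All numbers are made-up placeholders; only one step depends on the model at all.

steps = [
    ("group forms intent to pursue a biological weapon",  0.01),  # LLM-agnostic
    ("LLM provides meaningful uplift over open sources",  0.10),  # the only model-dependent step
    ("group acquires pathogens and lab equipment",        0.05),  # LLM-agnostic
    ("group succeeds at synthesis and weaponization",     0.05),  # LLM-agnostic
    ("group evades detection until deployment",           0.20),  # LLM-agnostic
]

p_scenario = 1.0
for description, p_conditional in steps:
    p_scenario *= p_conditional
    print(f"{description:52s} p = {p_conditional:.2f}")

print(f"\nOverall probability of the full scenario: {p_scenario:.1e}")
```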

The “Stack”

The third “S” that gets forgotten is the “stack”. What I refer to as “the risk management stack” is the whole set of tools we have at our disposal in risk management. Risk assessment is only one component; it is what helps you understand what kind of risk you are actually dealing with. After the risk is assessed, there is risk mitigation, i.e. what we do about the risk. Finally, there is risk governance, i.e. ensuring we have the right roles and responsibilities in place for decision-making regarding AI risk.

Strawmen here tend to disregard the stack. For example, on the more speculative side of the risk spectrum that I mentioned above, there are the risks of loss of control. It is correct to point out that we have no idea what will happen when an entity as intelligent as humans tries to achieve its goals. However, this is often made into an argument for not attempting AI risk management at all. It is true that for speculative AI risks like these, we should likely not even try to measure the probability or impact at this time. But that point misses the wider stack. For those sorts of risks, we can put less emphasis on risk assessment, but this does not mean we need to forego risk mitigation and risk governance. Risk mitigation in this case might mean, for example, applying the principles of Defense-in-Depth, which are applied successfully in most other industries. We may not know the risk exactly, but that is all the more reason to analyze how we can put in place as many safeguards as appropriate.
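As a sketch of the Defense-in-Depth logic, here is how layered safeguards reduce the chance that everything fails at once, even when each layer is individually imperfect. The layers and their failure probabilities are illustrative assumptions.

```python
# Defense-in-Depth sketch: several independent safeguard layers, each imperfect.
# The layers and their failure probabilities are illustrative assumptions only.

safeguard_layers = [
    ("training-time safety measures",             0.3),  # p(layer fails to catch the problem)
    ("pre-deployment evaluations",                0.3),
    ("deployment-time monitoring",                0.2),
    ("access controls and rate limits",           0.2),
    ("incident response and shutdown procedures", 0.1),
]

p_all_fail = 1.0
for layer, p_fail in safeguard_layers:
    p_all_fail *= p_fail
    print(f"{layer:45s} p(fail) = {p_fail:.1f}")

# Assuming the layers fail independently, the chance that every safeguard
# fails at once shrinks multiplicatively. That is the core argument for
# layering defenses even when the underlying risk is hard to estimate.
print(f"\nP(all layers fail simultaneously) = {p_all_fail:.5f}")
```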

Risk governance also retains its importance. It might arguably even be more important in the absence of a strong risk understanding. As we have seen over the past year at OpenAI, governance is tricky to get right. OpenAI’s initial structure was an interesting governance experiment, but ended up failing in the face of corporate incentives. This again shows the potential folly of reinventing the wheel. There are existing structures that can be used, and that is what we now see OpenAI reverting to.

Keeping in mind that there is a spectrum of AI risks, that AI risk needs to be analyzed as a system, and that we have a whole stack of risk management tools at our disposal will help fend off unwanted strawmen from the AI risk debate.

PS. I didn’t realize this until I had finished writing the post, but I now think the idea of all these strawmen walking around must have been somewhat influenced by a post that Étienne Fortier-Dubois wrote at his wonderful Atlas of Wonders and Monsters blog, so I wanted to make sure I included a link to that post.
