If you hang around AI policy discussions, you’ll have noticed a sudden uptick in calls for “evidence-based policy”. Everywhere you look, academics, company executives, and politicians demand policies that are “informed by an empirical trajectory analysis” and grounded in “scientific understanding”.
One implication, of course, is that current policy efforts, SB 1047 in particular, are not evidence-based. Many of those beating this drum are critics of SB 1047 and notably sceptical of the idea that AI models might someday pose catastrophic risks. The subtext, occasionally made explicit, is that because there is currently no evidence that models might pose such risks, “evidence-based” policy ought not to address them until we learn more.
While one might quibble with the idea that we have no evidence for such risks, the claim is broadly true: we are yet to see AI systems help amateurs design bioweapons or carry out catastrophic cyberattacks. And we certainly don’t know that we will build superintelligent AI systems, let alone whether, if we do, we might lose control of them.
The problem, of course, is that ideally we will never have evidence of such catastrophes happening in the wild. Much as we hope to never endure the consequences of a nuclear winter, we’d rather not find out first-hand what an AI disaster would look like. As Gavin Newsom recently said, “we cannot afford to wait for a major catastrophe to occur before taking action to protect the public”. What we want is to regulate proactively: anticipating future risks and preventing companies from releasing systems that could do serious damage.
How should we square this circle? In a recent paper about “evidence-based policy”, various luminaries offer a solution. What we need, they say, is “a forward-looking design, or blueprint, for AI policy that maps different conditions that may arise in society (e.g. specific model capabilities, specific demonstrated harms) to candidate policy responses.”
If you think this sounds familiar, you’re not alone. The approach bears a striking resemblance to “if-then commitments” or “responsible scaling policies” — concepts pioneered over the past year by METR, Paul Christiano, Holden Karnofsky and the UK government, among others. The idea, as Karnofsky outlines, is to develop commitments like the following: If an AI model has capability X, then risk mitigations Y must be in place.
Consider, for instance, an AI model’s ability to help an amateur develop bioweapons. If a future model can “serve as a virtual substitute for an expert adviser on chemical or biological weapons production”, Karnofsky writes, that’s probably not a model we want freely available (at least with our current biodefense capabilities). The capability, then, ought to trigger a policy tripwire. For such a model, we might require companies to deploy it only if they can ensure a “determined actor would reliably fail to elicit such advice from it”, and to store it “such that it would be highly unlikely that a terrorist individual or organisation could obtain the model weights”.
This is just an example: one could quibble over the details of the tripwires or the appropriate mitigations, and a big part of getting this right will be developing realistic, well-specified threat models that pin down which capabilities would actually be dangerous. The core idea, however, is for experts, companies, and governments to pre-commit to implementing restrictions if we see certain concerning capabilities.
Such an approach is “evidence-based” because no rules kick in until we have evidence that they should. The only imposition on companies in the meantime is to (a) develop a plan (unless the government develops one for them) and (b) regularly evaluate their models to see whether any tripwire has been triggered. It’s a worldview-agnostic approach. If dangerous capabilities never appear, the tripwires will never be triggered and AI development can continue unabated. If they do appear, we’ll already have protections in place to reduce the relevant risks.
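To make the shape of these commitments concrete, here is a minimal, purely illustrative sketch in Python of what a tripwire-to-mitigation mapping and the accompanying evaluation check could look like. The capability name, threshold, and mitigation labels are hypothetical placeholders of my own, not anything specified by Karnofsky, the paper’s authors, or any company’s published policy.

```python
from dataclasses import dataclass

@dataclass
class IfThenCommitment:
    """One 'if capability X, then mitigations Y' rule (illustrative only)."""
    capability: str    # name of the tripwire capability being evaluated
    threshold: float   # eval score at or above which the tripwire is triggered
    mitigations: list  # mitigations that must be in place before deployment

# Hypothetical commitment loosely following the bioweapons example above.
BIO_UPLIFT = IfThenCommitment(
    capability="expert-level bioweapons advice",
    threshold=0.5,  # placeholder; a real threshold would come from a well-specified eval
    mitigations=[
        "robust refusal: determined actors reliably fail to elicit such advice",
        "weight security: highly unlikely a terrorist could obtain the model weights",
    ],
)

def triggered_mitigations(eval_scores: dict, commitments: list) -> list:
    """Return every mitigation whose tripwire a model's eval scores have crossed."""
    required = []
    for c in commitments:
        if eval_scores.get(c.capability, 0.0) >= c.threshold:
            required.extend(c.mitigations)
    return required

# Made-up evaluation result that crosses the tripwire.
print(triggered_mitigations({"expert-level bioweapons advice": 0.7}, [BIO_UPLIFT]))
```

The point is not the code but the structure: the “if” side is an evaluation result, the “then” side is a pre-committed set of mitigations, and the whole mapping can be written down before any tripwire is ever crossed.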
The big question, though, is how these if-then commitments should be enforced. Some believe they ought to be voluntary commitments made by companies (indeed, almost all major AI companies have promised to develop them). Others believe governments should force companies to develop and adhere to such commitments, but not mandate what they should contain. (This is what SB 1047 did.) Still others believe governments, rather than companies, should write the if-then commitments themselves and force companies to comply with them.
I find myself in the middle camp. As recent reports of rushed safety testing at OpenAI show, if companies are not made to stick to their promises, they will ignore them — especially in the face of intense competitive pressure. And, given the pace of AI progress, it seems prudent to establish a framework for AI regulation now: if we wait until one of the tripwires is crossed, and legislation takes another 24 months from that point, we may act too late. But it seems premature to enshrine particular if-then commitments in law: we simply don’t know enough about which dangerous capabilities to look for, how to evaluate models for them, or how to mitigate them. Offering guidance, rather than mandating specifics, seems a sensible step forward given the significant uncertainty at play.
If-then commitments are not flawless, especially in their current form. We need to get much better at designing dangerous capability evaluations. Threat modelling efforts also need to ramp up, so we can better predict which specific capabilities we should worry about in the first place. This will not happen overnight. But, done right, if-then commitments are a neat, evidence-based way of tackling a thorny problem. They allow us to bypass debates about whether or when certain capabilities will arise and jump straight to discussing what to do if they do. Rather than letting uncertainty paralyse us, they embrace it — the perfect tool for a world where consensus is sorely lacking. Better still, they already have significant traction, thanks to the UK and Korean governments’ push to get companies to develop them.
With the recent interest in evidence-based AI policy, if-then commitments deserve more attention. Academics and policymakers must work out, fast, how future AI harms might materialise, what evidence would warrant action, and what that action might look like, providing a framework around which to build if-then commitments.
Those who are truly committed to evidence-based regulation ought to offer such concrete triggers for action. As for those who refuse to do so, choosing instead to wield the “evidence-based” buzzword as a way to avoid addressing emerging risks altogether? One would be wise to question whether their interest is really in evidence-based regulation — or merely in forestalling any regulation at all.