It’s been five years since Jack Clark and his colleagues at OpenAI launched GPT-2. Reflecting on the occasion, Clark (who now runs Anthropic’s policy department) recently put out an interesting article on the launch and the warnings that accompanied it.
With the benefit of hindsight, Clark admits that many of those warnings haven’t aged well, writing that the risks they foresaw were “further away than we thought and, I think at least so far, less significant than we thought”. He adds that we have a “poor record” of figuring out when bad things might happen, at what scale, and how severe they might be.
As someone who’s spent the last couple of years embedded in AI safety circles, I agree. While most people in this space are pretty good at expressing uncertainty, there have been some notable misfires. Worries that Gemini and GPT-4 would be existential threats, for instance, have aged like milk. Even concerns about the risks of open-source models like Llama 2 seem overblown in retrospect. A little more humility would go a long way: not only is “hugely uncertain” the right approach, it’s also the one that will age best (you can only cry wolf so many times before everyone stops listening).
Building on this call for humility, Clark argues that we should rethink our approach to AI regulation. “I’ve come to believe that in policy ‘a little goes a long way,’” he writes, suggesting that it's better to advocate for a few ideas that are “robustly good in all futures” than to bet confidently on a suite of policies tailored to a specific, highly uncertain future risk model. The latter approach particularly worries Clark because of the risk of empowering regulators. As he notes, “history shows that once we assign power to governments, they're loath to subsequently give that power back to the people”. Regulating too aggressively or prematurely could be regrettable — especially if our risk assessments turn out to be misguided.
I share many of Clark’s concerns here, particularly around the risks of governmental overreach. My perspective isn’t the typical libertarian one, but rather that of a brown kid who grew up in post-9/11 Britain. In the wake of those attacks, governments worldwide were granted sweeping new powers in the name of preventing future terror threats. Unsurprisingly, they often misused those powers, subjecting innocent people to unwarranted surveillance and making air travel a nightmare for people like me, more than two decades on from 9/11. Richard Dawkins overstated the case when, after having his “little jar of honey” confiscated at airport security, he declared that “Bin Laden has won” — but he wasn't entirely off-base.
Clark is right that we should set a very high bar for granting governments new powers, especially when the risks are so uncertain and the proposed powers so far-reaching. The spectre of a fascist president with unilateral control over all large compute clusters is a chilling one. While such a “kill switch” might help mitigate some AI risks, it would also create a host of new dangers — dangers that, as Clark points out, often seem to be under-appreciated in today’s AI safety community.
But I worry that Clark may have overcorrected. The actual policy proposals he endorses, like more rigorous evaluations of AI systems and greater government visibility into their development, are relatively modest. They’re all well and good, but I fear they’re insufficient, particularly if the pace of AI progress accelerates.
One way to balance the imperatives of avoiding unnecessary governmental overreach and proactively addressing AI risks is through “conditional regulation”. This would essentially be a legally binding version of the responsible scaling policies that companies like Anthropic have already put forward. Under this framework, companies would be required to continuously test their models for specific dangerous capabilities, with stringent safety standards automatically kicking in only when those capabilities are detected. If your model can’t design a bioweapon or autonomously replicate itself, consider yourself unregulated. But if and when it can, that’s when strict rules come into force.
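To make that trigger mechanism concrete, here’s a toy sketch of the conditional logic in Python. Everything in it (the capability names, scores and thresholds) is made up for illustration; real responsible scaling policies define their own evaluations and safety tiers.

```python
# Toy illustration of "conditional regulation": capability evaluations are run
# continuously, and stricter requirements apply only once a model crosses a
# predefined threshold. All names and numbers here are hypothetical.

from dataclasses import dataclass

@dataclass
class CapabilityEval:
    name: str          # e.g. "bioweapon_design", "autonomous_replication"
    score: float       # result of the evaluation, scaled to 0.0-1.0
    threshold: float   # level at which stringent rules would kick in

def required_safety_tier(evals: list[CapabilityEval]) -> str:
    """Return the regulatory tier implied by the latest evaluation results."""
    triggered = [e.name for e in evals if e.score >= e.threshold]
    if not triggered:
        return "baseline"  # no dangerous capability detected: light-touch rules
    return "stringent (triggered by: " + ", ".join(triggered) + ")"

# A model that stays below the bioweapon threshold but crosses the
# autonomous-replication one would trip the stricter tier:
results = [
    CapabilityEval("bioweapon_design", score=0.12, threshold=0.5),
    CapabilityEval("autonomous_replication", score=0.61, threshold=0.5),
]
print(required_safety_tier(results))  # stringent (triggered by: autonomous_replication)
```

The point isn’t the code, of course, but the shape of the rule: nothing binds until an evaluation trips a threshold, at which point the stricter obligations apply automatically.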
This approach is, in Clark’s words, “robustly good in all futures”, precisely because it plays out differently in different futures. In a world where catastrophic risks from AI never materialise, companies won’t be bound by burdensome rules. In a world where they appear in ten years, we won’t have prematurely stifled innovation. And in a world where they appear tomorrow, we’ll have measures in place to contain them.
Companies seem to like this approach, too. Anthropic, OpenAI and Google DeepMind have all put out policies promising to do something like this, and other companies (including Meta) have pledged to put out their own frameworks soon. But right now these are just voluntary pledges, which companies can break at any time. We’re relying on companies to keep their commitments, and when profit incentives push against society’s interests, that’s a dangerous place to be. OpenAI, in particular, is run by a man whose own board tried to fire him for being a liar.
Trusting these companies to do the right thing is a fool’s errand: the only way to ensure they behave appropriately is to force them to.
Such regulation wouldn't be costless. It would necessitate rigorous testing by companies and verification by independent third parties. There’s also the ever-present risk that future governments could lower the risk thresholds in a bid to expand their own power. But given the profound uncertainty we face about the nature and timing of AI risks, crafting rules that embrace the uncertainty seems like the most prudent way forward.