The quest to build better defenses against AI risks
'Societal resilience' measures might offer some protection against the harms of proliferating dangerous AI capabilities
On a chilly Friday morning, the Nucleic Acid Observatory office is full of wastewater samples. After using metagenomic sequencing to map out the viral RNA in those samples, the team deploys algorithms to search for the first signs of pathogens that other systems would miss entirely. It’s hardly glamorous work, but it might be our best bet for stopping an AI-enhanced pandemic.
AI companies are increasingly warning that their tools could soon lower the bar for creating biological threats. Last month, OpenAI’s Deep Research system card warned that the company’s models were “on the cusp of being able to meaningfully help novices create known biological threats,” while Anthropic said there’s a “substantial probability” its next model could be misused to catastrophic effect.
While these companies will hopefully try to prevent their models from being misused by terrorists or other malicious actors, there is currently no regulatory mechanism forcing them to implement adequate safeguards, and the safeguards they do use might not be as robust as needed. It seems unlikely, in other words, that we’ll be able to guarantee AI systems can’t be misused.
That is fueling interest in “societal resilience” — the set of defenses that keeps harm in check even if malicious actors gain powerful tools. But while many organizations are trying to develop and roll out such defenses, one question looms large: can we get ready in time?
Societal resilience comes from overlapping defenses, often described as “Swiss cheese layers,” where each layer adds more protection against some risk. Even though each layer has holes, taken together the layers greatly reduce the overall risk. Traffic lights alone provide only some protection against car accidents; add in speed limits and road markings, and you’ve got a pretty good safety system. In the case of AI, we already have some protections against potential harms such as cyberattacks, phishing scams, and bioweapons. But AI may soon amplify all of these harms, so we need sturdier safeguards to handle the heightened threat. After all, a highway safety railing designed to stop a van wouldn’t be enough to stop a tank.
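To make the layered-defense intuition concrete, here is a toy sketch (my illustration, with invented failure rates rather than figures from this article): if each layer fails independently, the chance that an attack slips through all of them is the product of the per-layer failure probabilities, which shrinks quickly as layers are added.

```python
# Toy "Swiss cheese" model of layered defenses. The failure rates are
# made up for illustration, and the layers are assumed to fail independently.
from math import prod

layer_failure_rates = {
    "DNA synthesis screening": 0.2,       # attacker finds a non-screening supplier
    "wastewater early detection": 0.3,    # pathogen spreads before being flagged
    "rapid PPE and countermeasures": 0.5, # response is too slow or too patchy
}

p_breach = prod(layer_failure_rates.values())
print(f"Chance an attack slips through every layer: {p_breach:.0%}")  # -> 3%
```

Real layers are correlated rather than independent, so the true number would be less favorable, but the compounding effect is the point: stacking imperfect defenses can still drive overall risk down sharply.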
Building such defenses is a promising way to tackle AI risks. Model capabilities are inherently dual-use, meaning that safeguards against malicious misuse can also restrict beneficial uses: preventing a model from helping a terrorist make a bioweapon might also mean stopping it from helping a scientist develop a new vaccine. By making society more resilient to the damage a bad actor can inflict, we can mitigate risks with fewer restrictions on the potential upsides of AI.
Not all threats are equally urgent. As society starts seeing harms appear, like scams and disinformation, we can ramp up defenses iteratively in response. “A handful of startups are already using [language models] to detect and patch vulnerabilities online,” says Jamie Bernardi, co-author of a recent paper on societal adaptation to advanced AI. Startups XBOW and Harmony Intelligence, for instance, are building AI-powered cyber defense systems. If scams and cyberattacks become more common, companies like these can scale to meet the growing demand. Over time, that natural feedback loop of “harm, then defense” might work for smaller risks. “The more I've looked into this topic, I felt somewhat encouraged that a lot of adaptation happens naturally,” Bernardi adds.
But this natural adaptation process breaks down when confronted with catastrophic risks. For the most dangerous threats — like engineering a new pandemic — waiting for the problem to manifest could be disastrous. Ajeya Cotra, a senior analyst at Open Philanthropy who researches the risks from AI, explains that some threats such as hacking involve thousands of criminals each making attempts on hundreds or thousands of victims, making it easier to spot patterns and respond before they cumulatively cause catastrophic harm. Conversely, she says, there are (fortunately) very few people who ever attempt rarer, more deadly crimes such as creating a biological weapon, but a single successful attack could kill millions. The omicron COVID variant infected a quarter of Americans and half of Europeans within 100 days of first being detected; a stealthier pathogen could do worse. For catastrophic threats, in other words, the cost of responding after the fact could be unconscionably high.
That’s where organizations like the Nucleic Acid Observatory come in. According to NAO project co-lead Jeff Kaufman, “Our primary concern at the NAO is how, over time, it is becoming easier for people to engineer pathogens, and especially engineer stealth pathogens that could potentially spread very widely before being detected.” The worry is that a deadly pathogen could be engineered to avoid our normal detection systems: it could, for instance, hide its symptoms for a long time, giving the virus time to spread widely before anyone realizes it exists. The NAO hopes that scaling its meticulous wastewater screening will allow it to detect a novel bioengineered pathogen before it infects 1% of the US population, providing an early warning of one of the most deadly threats humanity could face.
And wastewater monitoring is only one of the many resilience-increasing interventions available. DNA synthesis screening, which prevents a malicious actor from just ordering smallpox DNA in the mail, is already implemented for 80% of the DNA synthesis market — we just need the last 20% of suppliers to join. We know how to make reusable respirators and air filters, and new technologies, like glycol vapors for reducing indoor transmission of respiratory illnesses, may soon be ready for broader commercial deployment. In many cases, “the technology is already there,” Rocco Casagrande — who leads Deloitte Consulting’s biorisk work — told Transformer. “It's not perfect. It could be improved, certainly, but we could just go out and buy it.”
So why haven’t we? The answer lies in political inertia. “The problem is, the demand signal hasn’t really been there,” says Casagrande. Despite AI companies’ warnings, some are still skeptical that AI systems do pose imminent catastrophic risks — and many more are skeptical that it’s worth paying for all the protections experts say we need. As Cotra puts it: “There are all these interventions we could do, but the bottleneck is going to be agreeing that the harm is actually imminent.” Jakob Graabak, Technology Foresight Director at the Centre for Future Generations, agrees, saying “It's just very hard to get definitive answers, because you cannot run a test pandemic and see, right?”
In the recent International AI Safety Report, experts dubbed this the “evidence dilemma”: the quandary of acting before you have sufficient evidence, or waiting for more and risking that the window of opportunity closes entirely, with the risk you’re worried about materializing before you can adapt.
Cotra thinks the way around this problem is to develop systems that can serve as a proxy for harm, such as evaluations that signal an AI tool is on the cusp of helping non-experts develop bioweapons. This won’t be easy: we’ll still need decision-makers to agree on which proxies are sufficiently predictive. But if we can do it, we might get months, or even years, of warning before AI tools are capable of designing a novel pandemic-grade pathogen. In an ideal world, we’ll use that window of opportunity to patch low-hanging vulnerabilities in society.
How much we can get done depends, of course, on how long that window is. The NAO thinks that within six months, it will be able to detect “a blatantly engineered virus before about 1% of people have been infected,” according to Kaufman. That’s fast enough to lessen the severity of a pandemic, but not fast enough to prevent one altogether, which would require yet more scaling. For things like personal protective equipment, building out supply lines will take longer. Most interventions, experts told Transformer, would take between six months and several years to scale. Even the fastest plans for rapid vaccine development aim to roll out within a hundred days of detecting a new pathogen — much too slow to stop the spread of a highly contagious disease like the omicron variant. Graabak thinks that if we wait until after a new pathogen is detected, “even the most ambitious plans we have are not swift enough in their response in the face of the most dangerous threats”.
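To illustrate why those timelines are so tight, here is a rough back-of-the-envelope sketch; the doubling time is my assumption (roughly omicron-like), not a figure from this article, and the toy model ignores immunity and behavior change.

```python
# Back-of-the-envelope sketch with assumed parameters (not from the article):
# how quickly an omicron-like pathogen could outgrow a 100-day response window.
US_POPULATION = 330_000_000
DOUBLING_TIME_DAYS = 3        # assumed doubling time, roughly omicron-like
DETECTION_PREVALENCE = 0.01   # NAO's stated goal: flag a pathogen before ~1% are infected

infected = US_POPULATION * DETECTION_PREVALENCE   # ~3.3 million at detection
days_since_detection = 0
while infected < US_POPULATION * 0.25:            # 25% infected, roughly omicron's 100-day reach
    infected *= 2
    days_since_detection += DOUBLING_TIME_DAYS

print(f"~{days_since_detection} days from detection (1% infected) to 25% infected, "
      "if spread continues unchecked")
```

On these toy numbers there would be roughly two weeks, not 100 days, between an early-warning signal and a quarter of the country being infected, which is why the experts quoted above want defensive layers in place before detection ever happens.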
That calls for acting now rather than waiting until we have even clearer evidence of AI biorisks: scaling up the NAO’s work, for instance. (It certainly helps that such work is useful even absent AI-enhanced biorisks: stopping normal pandemics is worth doing by itself.) Societal resilience won’t fix every AI misuse scenario. But it might tip the odds back in humanity’s favor, buying enough time and safety margin to handle the next wave of AI-powered threats.
Disclosure: Transformer is in part funded by grants from Open Philanthropy.