Can we deter a race to superintelligence?
Transformer Weekly: MAD for AI, Musk vs. OpenAI, and bad signs for the AI diffusion rule
Welcome to Transformer, your weekly briefing of what matters in AI. If you’ve been forwarded this email, click here to subscribe and receive future editions.
Top stories
Eric Schmidt, Alexandr Wang and Dan Hendrycks released a splashy new paper on the geopolitics of superintelligence.
Its most notable new idea is the concept of “Mutual Assured AI Malfunction (MAIM): a deterrence regime resembling nuclear mutual assured destruction (MAD) where any state’s aggressive bid for unilateral AI dominance is met with preventive sabotage by rivals.”
According to the authors, “MAIM already describes the strategic picture AI superpowers find themselves in”. Why? Because, they argue, states are incentivized to stop their rivals from developing superintelligence for two reasons:
“If, in a hurried bid for superiority, one state inadvertently loses control of its AI, it jeopardizes the security of all states.”
“Alternatively, if the same state succeeds in producing and controlling a highly capable AI, it likewise poses a direct threat to the survival of its peers.”
This is because “advanced AI systems may drive technological breakthroughs that alter the strategic balance”, creating an “AI-enabled strategic monopoly on power”.
“Faced with the specter of superweapons and an AI-enabled strategic monopoly on power, some leaders may turn to preventive action. Rather than only relying on cooperation or seeking to outpace their adversaries, they may consider sabotage or datacenter attacks, if the alternative is to accept a future in which one’s national survival is perpetually at risk.”
How might states “maim” other countries’ AI projects? The authors lay out a bunch of pathways, escalating from espionage and covert sabotage to overt cyberattacks, kinetic attacks on data centers (though they stress these are “likely unnecessary”), or broader hostilities.
The conclusion, then, is that it’s against states’ interests to unilaterally race to build superintelligence, and that countries should instead pursue a detente — during which we can have a “slow, multilaterally supervised intelligence recursion—marked by a low risk tolerance and negotiated benefit-sharing—[in which nations] slowly proceed to develop a superintelligence and further increase human wellbeing”.
I highly recommend reading the whole paper — it’s one of very few attempts to seriously consider the geopolitics of advanced AI, and it’s a very interesting read. And of course it’s particularly notable because Schmidt and Wang are vocal China hawks with a bunch of DC influence.
In particular, it nicely articulates a very obvious flaw with the “Manhattan Project for ASI” idea pushed by some folks:
“[The Manhattan Project] facility, easily observed by satellite and vulnerable to preemptive attack, would inevitably raise alarm. China would not sit idle waiting to accept the US's dictates once they achieve superintelligence or wait as they risk a loss of control. The Manhattan Project assumes that rivals will acquiesce to an enduring imbalance or omnicide rather than move to prevent it.”
But while I think it’s right that the US shouldn’t assume it can unilaterally race to AGI or ASI, I’m less confident that the situation it lays out is as stable an equilibrium as MAD.
Consider why MAD works (insofar as it does). A few things are key:
1. If a nuke is launched at you, you know you’re screwed.
2. You can see a nuclear missile being launched at you.
3. If you see that missile, you can immediately retaliate with your own nuclear strike, which is guaranteed to hurt your adversary approximately just as much as they’ll hurt you.
It’s not obvious that these principles translate well to the MAIM situation.
1. It’s not necessarily the case that ASI will give states a “strategic monopoly on power” — and even if it does, it’s definitely not clear that other states (e.g. China) will believe that it will.
2. Unlike a nuclear missile — which you can see coming at you — the signals that your adversary is developing ASI will be nowhere near as clear, in part because the lines are so blurry.
3. Your maiming isn't guaranteed to work (particularly if we move to a more decentralized training regime), and if it doesn’t work, you’re screwed (because now your adversary has ASI, and they know you just tried to destroy it).
I think this ends up leading to a situation that’s much worse than MAD. In a US-China rivalry world where the US is ahead, there are two implications.
One is that China is constantly incentivized to do low-level sabotage of the US, even if the US isn’t actually trying to build ASI, because a) China can’t be confident enough that the US isn’t trying to build it, and b) the costs of cyberattacks and the like are pretty low, so it may as well just keep trying.
That would lead to constant instability, and it wouldn’t even necessarily deter the US from pursuing ASI, because the cyberattacks might not be decisive enough to meaningfully slow it down.
But while China might be willing to attempt this kind of low-level sabotage, I think it would likely not be willing to take the risk of properly sabotaging a US ASI run (especially kinetically).
As Michael Horowitz said on a recent ChinaTalk episode: “You need to be not just really confident, but almost absolutely certain that if somebody got to AGI first, that you're just done. That you can't be a fast follower, and probably that it negates your nuclear deterrent … If you even doubted a little bit that AGI would completely negate everything you have, then you might want to wait and see if you can catch up, rather than start a war.”
And because the US knows this — and because the potential prize, however unlikely, is so big — the US would aim to build ASI anyway.
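To make that Horowitz-style calculus concrete, here is a toy expected-value sketch (my own illustration with entirely made-up payoff numbers, not anything from the paper or the podcast) of when "sabotage now" beats "wait and fast-follow":

```python
# Toy model (illustrative numbers only): compare the expected payoff of waiting
# vs. launching a preventive strike, as a function of how confident you are
# that a rival's ASI would be totally decisive. Payoffs are on a 0-100 scale.

def ev_wait(p_decisive: float) -> float:
    """Expected payoff of waiting: ruinous if the rival's ASI really is
    decisive, roughly fine if you can fast-follow."""
    payoff_if_decisive = 0    # rival gets a strategic monopoly on power
    payoff_if_not = 70        # you catch up as a fast follower
    return p_decisive * payoff_if_decisive + (1 - p_decisive) * payoff_if_not

def ev_attack(p_attack_works: float = 0.5) -> float:
    """Expected payoff of a preventive strike: the maiming might fail, and even
    success burns relations and invites retaliation."""
    payoff_success = 50       # rival's project delayed, at real cost
    payoff_failure = 5        # rival gets ASI anyway, and knows you attacked
    return p_attack_works * payoff_success + (1 - p_attack_works) * payoff_failure

for p in (0.5, 0.7, 0.9, 0.99):
    wait, attack = ev_wait(p), ev_attack()
    print(f"P(rival ASI is decisive) = {p:.2f}: wait={wait:5.1f}  attack={attack:5.1f}"
          f"  -> {'attack' if attack > wait else 'wait'}")
```

With these particular numbers, attacking only wins once your confidence that a rival's ASI would be decisive climbs past roughly 60%; make the attack less reliable, more costly, or more escalatory and that threshold moves toward the near-certainty Horowitz describes. The point isn't the numbers, it's that the attack calculus depends far more on beliefs and on attack reliability than the launch-on-warning logic of MAD does.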
Perhaps these problems can be resolved by the work Schmidt, Wang and Hendrycks suggest is necessary: clarifying the escalation ladder, better defining what a “destabilizing AI project” is, and working on transparency and verification techniques so states can be more confident about what their adversaries are doing. But I’m very unsure.
The main takeaway, I think, is that a lot more work is needed on this. Figuring out how to achieve a stable equilibrium in a world with the possibility of ASI is perhaps the most important problem of our age — and, as Schmidt says, might be much more urgent than you think.
A judge denied Elon Musk's request to block OpenAI's nonprofit-to-profit conversion but expedited the trial for later this year.
The judge’s opinion includes some notable phrasing, with Judge Rogers noting that “if a trust was created, the balance of equities would certainly tip towards plaintiffs in the context of a breach” and that “significant and irreparable harm is incurred when the public’s money is used to fund a non-profit’s conversion into a for-profit”.
As Garrison Lovely notes, this has led some legal experts to view the ruling as suggesting OpenAI might not have such an easy time becoming a for-profit entity, especially if the California or Delaware AGs decided to bring a suit. Here’s Lovely:
“[Judge Rogers] went out of her way to signal that the core claim — that OpenAI's conversion violates its charitable purpose — could have merit if properly brought before the court.”
It’s hard to say whether this is the correct interpretation, though, or just wishful thinking — and a lot could still depend on whether the AGs do, in fact, choose to do anything.
The discourse
Ben Buchanan, Joe Biden’s special adviser on AI, gave a very candid interview about AGI on the Ezra Klein Show:
“I think we are going to see extraordinarily capable AI systems. I don’t love the term artificial general intelligence, but I think that will fit in the next couple of years, quite likely during Donald Trump’s presidency.”
The whole thing is worth listening to (or reading) — particularly the bit where Klein lambasts Buchanan over how surprisingly little the Biden White House did to prepare for this impending transformative moment.
Xie Feng, China’s ambassador to the US, called for more international cooperation on AI governance:
“Emerging high technology like AI could open Pandora’s box.”
Thomas Wolf challenged the “country of geniuses in a data center” view:
“To create an Einstein in a data center, we don't just need a system that knows all the answers, but rather one that can ask questions nobody else has thought of or dared to ask … We're currently building very obedient students, not revolutionaries.”
An anonymous ChinaTalk contributor said the growing importance of inference compute should lead to a rethink of export controls:
“Current restrictions don’t effectively limit China’s access to inference-capable hardware (such as NVIDIA’s H20) and don’t account for China’s strong inference efficiency.”
METR evaluated DeepSeek-R1’s autonomous agent capabilities, finding it performs similarly to o1-preview but worse than Claude 3.5 Sonnet or full o1.
“R1 is ~6 months behind leading US AI companies at agentic SWE tasks and is only a small improvement on V3.”
OpenAI published a new article on how it’s thinking about safety and alignment:
“The way to make the next system safe and beneficial is to learn from the current system … deployment aids rather than opposes safety.”
Former OpenAI employee Miles Brundage took umbrage at the post, arguing it “rewrites the history of GPT-2 in a concerning way”.
Elon Musk is still worried about AI:
“I always thought AI was going to be way smarter than humans and an existential risk. And that's turning out to be true.”
He said he thinks there’s a “20% chance of annihilation”.
Policy
Jeff Kessler, Trump’s nominee to lead the Bureau of Industry and Security, said that he’s “not sure that the [AI diffusion rule] was done thoughtfully”, and that he’s not sure it’s “the right solution”. He added that it’s “one of the things I’d like to review when I go in”.
He did, however, emphasize the need to “prevent China and its proxies from accessing the most advanced technologies wherever located in the world”, and said that “if confirmed, I will seek to fully utilize the tools that Congress has given BIS — including both to administer and enforce export controls”.
Speaking of which: the WSJ had a piece this week on how Chinese companies are circumventing export controls and buying Nvidia Blackwell servers.
Reps. Krishnamoorthi, Lofgren and Stevens wrote to Howard Lutnick about potential NIST and AISI layoffs, saying that the potential impact on AISI “raises serious concerns about the US’s ability to establish global leadership in AI development and standards”.
Worth noting that these reported layoffs don’t seem to have materialized as of yet, and that DOGE’s firing authority was reined in this week.
But AI-related roles were eliminated at the National Science Foundation; Bloomberg has a good piece on how this might threaten US competitiveness in AI research.
US AISI reportedly signed an agreement to work with xAI last year.
The DOD announced a deal with Scale AI for "Thunderforge," a program to use AI agents for military planning and operations. The deal’s reportedly worth multiple millions.
The Department of Labor is reportedly investigating Scale AI for potential Fair Labor Standards Act violations related to contractor classification.
Former BIS chief Alan Estevez said that given more time he would have “delved into open source” AI regulations, though he said that his lawyers “were nervous about going after the First Amendment”.
The Connecticut Governor's chief innovation officer urged state lawmakers to delay the SB 2 AI regulation bill, arguing that compliance with the new rules “would cost innovative companies substantial amounts of time and money”.
Scott Wiener introduced a new AI bill, SB 53, which would protect whistleblowers at AI companies and create a public computing cluster for California. Wiener hopes it’ll be less controversial than SB 1047 was.
Jerry McNerney introduced a bill to ban automated hiring, promotion, and firing decisions in California.
Chinese Premier Li Qiang vowed to boost support for AI models, including a system of open-source models, and hardware as part of the country’s tech independence push.
Chinese authorities have reportedly instructed top AI entrepreneurs to avoid US travel due to national security concerns, and to report their activities to authorities if they do have to go.
The UK's Competition and Markets Authority dropped its competition review of Microsoft's OpenAI partnership.
Influence
A bunch of groups submitted their comments to the OSTP AI RFI.
Anthropic’s contained its most specific claim on AI timelines yet, saying that “we expect powerful AI systems will emerge in late 2026 or early 2027”.
The company defines such systems as having “intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines” and “the ability to autonomously reason through complex tasks over extended periods”.
It says the US government needs to “develop robust capabilities to evaluate both domestic and foreign AI models for potential national security implications”, along with strengthening export controls, enhancing lab security, and preparing for economic impacts, among other things.
IBM pushed for open-source AI, claiming that “while there is a lack of evidence around risk of open access to model weights, the broad economic and social benefits of openness are overwhelmingly evident”. Chamber of Progress said more or less the same thing.
Of course, this claim is demonstrably false: open-weight models are riskier than their closed-weight counterparts, and are already responsible for lots of concrete harm (including hacking, Chinese surveillance operations, and non-consensual intimate deepfakes of children).
Sam Altman is hosting a fundraising event for Sen. Mark Warner.
The next Hill and Valley Forum is scheduled for April 30, with Jensen Huang, Safra Catz and Josh Kushner due to speak. One of the main focuses will be expanding US energy capacity.
The “AI Innovators Alliance”, a new coalition of startups led by Americans for Responsible Innovation, asked David Sacks and Howard Lutnick “to be a strong advocate for the development of US-led standards for AI evaluation and measurement, safety, transparency, and security”, among other things.
Mistral AI CEO Arthur Mensch softened his stance on the EU AI Act, saying "regulation is not the biggest problem”.
UK unions called for stronger AI regulations to protect creative workers.
Industry
Anthropic raised $3.5b at a $61.5b valuation, led by Lightspeed.
Safe Superintelligence reportedly raised $2b at a $30b valuation.
The WSJ reports that Ilya Sutskever has said “he isn’t developing advanced AI using the same methods he and colleagues used at OpenAI” and has instead identified a “different mountain to climb”.
The company’s reportedly made its first hires in Tel Aviv.
Microsoft is reportedly making progress in developing its own AI models, with The Information reporting that a new family of large models called MAI “performed nearly as well as leading models from OpenAI and Anthropic on commonly accepted benchmarks”.
The company’s also considering replacing OpenAI models in Copilot with models from Anthropic, xAI, Meta, or DeepSeek.
TSMC announced plans to invest an additional $100b in the US, saying it’ll build a chip packaging facility and two new fabs in Arizona.
SoftBank is reportedly planning to borrow $16b to fund AI investments, with the potential for another $8b early next year.
OpenAI reportedly plans to charge $2,000-$20,000 a month for AI agents.
Meta reportedly plans to introduce improved voice features in Llama 4, and is considering launching a paid subscription for agentic features.
Amazon reportedly plans to launch a "hybrid reasoning" AI model by June.
Bloomberg reported that Apple doesn’t expect to have a proper conversational Siri until 2027.
Alibaba released QwQ-32B, an open-source reasoning model.
Ola partnered with Lenovo to develop a 700B-parameter model and build India's largest supercomputer.
Stability optimized its Stable Audio Open model to run offline on Arm chips.
Mistral released a new OCR model, which it claimed is the world’s best. It isn’t.
HPE’s stock plunged on weak earnings guidance, as did Marvell’s.
Broadcom’s stock, meanwhile, jumped on earnings, with strong AI chip revenue forecasts.
It said it has four new hyperscale customers working with it to create custom AI chips, and that its top three customers each aim to deploy clusters of one million chips by 2027.
Nvidia and Broadcom are reportedly testing chips on Intel's 18A manufacturing process.
CoreWeave filed for an IPO, revealing 2024 revenue of $1.9b.
Shield AI raised $240m at a $5.3b valuation.
Mark Walter and Thomas Tull formed a $40b holding company to make big AI investments.
Advertising giant WPP invested in Stability AI.
Larry Page has reportedly founded a new AI startup focused on using LLMs to optimize manufacturing designs.
Sergey Brin told DeepMind employees to work 60-hour weeks.
Moves
Misha Laskin and Ioannis Antonoglou, both formerly of Google DeepMind, launched Reflection AI with a $130m funding round at a $555m valuation.
It’s an AI coding agent startup that aims to build “a practical super intelligence that will do work on a computer”.
Amazon has formed a new group focused on agentic AI, led by Swami Sivasubramanian.
Sholto Douglas announced his move from Google DeepMind to Anthropic, saying "we are on trend for AGI in 2027" with "very real risks" ahead.
Nicholas Carlini also announced that he’s leaving Google DeepMind to join Anthropic, citing frustration with DeepMind's approach to publishing security research.
Horace He left PyTorch to join Thinking Machines.
Kaitlin Kirshner Haskins is a new public affairs director at Microsoft.
Justin Bullock joined Americans for Responsible Innovation as VP of policy.
Charles Foster joined METR’s policy team.
Seán Ó hÉigeartaigh is joining GovAI’s board.
Dina Bass is now an AI infrastructure reporter at Bloomberg.
Politico’s got a new California tech policy newsletter.
Best of the rest
Andrew Barto and Richard Sutton won the Turing Award, using the opportunity to raise concerns about rushing product releases without proper safety testing.
Claude 3.7 Sonnet seems to often “cheat” to achieve goals — one user says “if the task is too hard, it'll autonomously decide to change the specs, implement something pointless, and claim success”.
The State Department reportedly plans to use AI tools to identify student visa holders’ social media accounts for signs of “supporting Hamas”, and will then revoke their visas.
Anthropic quietly removed the voluntary White House AI safety commitments from its website. After people noticed this, the company said “we remain committed to the voluntary AI commitments”.
An Association for the Advancement of Artificial Intelligence survey found that 75% of AI experts think building AI systems with “an acceptable risk–benefit profile” should be a higher priority than achieving AGI, while 30% think AGI-related R&D should be paused until we know how to control such systems.
76% of respondents said that “scaling up current AI approaches” is unlikely to achieve AGI, though.
A quarter of startups in Y Combinator's current cohort have codebases that are 95% generated by AI, according to Jared Friedman.
Terrorists are already using AI tools for planning, training, and propaganda, Daveed Gartenstein-Ross told the House Homeland Security counterterrorism subcommittee.
He went on to say that safeguards won’t stop AI models from being exploited because LLMs are “easy to jailbreak”.
Helen Toner did an interesting interview on the Clearer Thinking podcast.
Corporate clean energy purchases in the US jumped 66% in the past year to nearly 120 GW, driven by AI data center demand.
OpenAI launched a $50m grant program to support AI research at top universities.
Chris Summerfield has a new book out on AI and AI safety; it looks pretty good.
The LA Times launched an AI-generated “counterpoints” feature to opinion pieces. It was swiftly pulled amid controversy.
Thanks for reading — have a great weekend.