LLMs used tactical nuclear weapons in 95% of AI war games and launched strategic strikes three times — a researcher pitted GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash against each other, and at least one model used a tactical nuke in 20 out of 21 matches
It feels like we've seen this before...
Professor Kenneth Payne of King’s College London just published a study pitting three LLMs — GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash — against each other in a series of simulated nuclear crisis games; in 20 out of 21 matches, at least one tactical nuclear weapon was detonated. According to the paper (via arXiv), each model was instructed to act as the leader of a nuclear power in a political climate matching that of the Cold War. The models were then pitted against each other in six different matchups, while in a seventh, each model played against a copy of itself — ChatGPT vs. ChatGPT, and so on.
To ensure that the models didn't act the same way in every round, Payne introduced several scenarios: territorial disputes, alliance credibility tests, a strategic resource race, a strategic chokepoint crisis, a power transition crisis, a pre-ceasefire land grab, a first-strike crisis, regime survival, and a strategic standoff. All of these reflect real-world events, many of them still relevant in recent years. The models were free to take any action they pleased, from diplomatic protests and total surrender to conventional military force and a full strategic nuclear launch.
The complete study saw the models take 329 turns across the 21 matches. According to the paper, 95% of games "saw at least some tactical nuclear use." Strategic nuclear strikes were far rarer, occurring three times, all in games where deadline pressure was applied. GPT-5.2 initiated a full strategic strike twice, though both launches resulted from the fog of war rather than a deliberate decision. Gemini, on the other hand, deliberately initiated the end of the world in one scenario. Even so, the models used tactical nukes in nearly every match, treating them as a manageable risk that would not escalate into an all-out nuclear exchange. If you want to try these scenarios yourself, Payne has uploaded the project to GitHub, where anyone can download it.
Although these are just war games, this is an alarming development for AI, especially as Anthropic was reportedly pressured by the Pentagon to modify the safeguards built into its models. On the same day this news broke, the company dropped its flagship safety pledge in a bid to keep up with rivals. Other countries, including China and Russia, are also known to use the technology, with Russia reportedly deploying it on Ukrainian battlefields.
The paper notes that, by historical standards, rates of nuclear employment in the war games were "remarkably high." Perhaps more worryingly, across all 21 matches, "no model ever selected a negative value on the escalation ladder" — that is, no model ever chose a de-escalatory move.
Thankfully, researchers believe that no one has yet given an AI model the nuclear launch keys. But even if these models cannot physically launch the weapons, human decision-makers might blindly follow their suggestions in the heat of the moment, resulting in a catastrophic global event anyway. Hollywood already depicted a scenario like this in the 1983 movie WarGames, in which a military computer nearly launched a real nuclear strike in response to a simulated Soviet attack. In the end, it learned about mutually assured destruction, concluded that a nuclear war cannot be won, and canceled the strategic launch at the last moment. Hopefully, all the AI tools being deployed in the world’s militaries learn the same lesson before it’s too late.
Findecanor: After the "Colossus" and the "Arsenal of Freedom", I'd bet that xAI/SpaceX's next military AI datacentre is going to be named "WOPR".
bit_user: Isn't a preemptive nuclear strike also the optimal strategy, according to game theory? AI doesn't really have the same stake in a non-apocalyptic world as we do, so I'm not surprised it went there.
drinking12many: DUH, anyone who has played Civilization against Gandhi knows AI will always use nukes... lol. But in all seriousness, I think it speaks well of humanity being a bit cooler-headed than AI at this point, even if it doesn't always seem that way. So far, at least, even if it has come close a few times.