How US built new tool to stop AI from making nuclear weapons

FP Explainers • August 21, 2025, 21:01:40 IST

Anthropic, an Artificial Intelligence (AI) start-up backed by Amazon and Google, has developed a new tool to stop its chatbot from being used for the nefarious purpose of building a nuclear bomb or reactor

Anthropic, whose AI bot Claude is a direct competitor to OpenAI's ChatGPT, said it has been working with the US government for over a year to build in the safeguard.

Today, everyone is obsessed with Artificial Intelligence (AI).

AI is said to have the potential to change society forever, in good ways and bad. Many hope it will cure humans of disease, extend our lifespans, solve climate change, and unlock the secrets of the universe.

Others fear it will make some jobs disappear forever, leaving millions out of work and society on the brink. Still others imagine a dark, dystopian future with AI ruling over humanity, perhaps in the aftermath of it ordering nuclear strikes.

Now, at least one company is taking steps to safeguard its AI models from being used as tools to build nuclear weapons.

But what happened? What do we know?

Let’s take a closer look

What happened?

Anthropic, an AI start-up backed by Amazon and Google, has developed a new tool to stop its AI from being used for the nefarious purpose of building a nuclear bomb. Anthropic's Claude is a direct competitor to OpenAI's ChatGPT.

Anthropic said it has been working with the US government for over a year to build in the safeguard. The company said it has coordinated with the National Nuclear Security Administration (NNSA) to develop a “classifier” that can halt “concerning” conversations, for example about how to build a nuclear reactor or bomb, on its AI system.

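Anthropic has not published the classifier itself, but the general idea, a screening model that inspects each exchange and halts anything flagged as concerning before the chatbot answers, can be sketched in a few lines. The snippet below is a toy illustration only: the keyword scorer, threshold and function names are invented stand-ins, not Anthropic's actual system, which relies on a trained model built with NNSA input.

```python
# Toy sketch of a safety classifier gating a chatbot (illustrative only).
# The real classifier is a trained model developed with the NNSA; this
# keyword scorer is a hypothetical stand-in to show where the gate sits.

RISK_TERMS = {"enrichment", "weapons-grade", "plutonium", "implosion", "centrifuge"}
THRESHOLD = 2  # flag a prompt once it matches this many risk terms


def is_concerning(prompt: str) -> bool:
    """Return True if the prompt looks like a nuclear-weapons query."""
    text = prompt.lower()
    return sum(term in text for term in RISK_TERMS) >= THRESHOLD


def respond(prompt: str) -> str:
    """Screen the prompt first; only pass it to the model if it is clean."""
    if is_concerning(prompt):
        return "This conversation has been halted by the safety classifier."
    return f"(model answer to: {prompt!r})"  # placeholder for the actual chatbot call


if __name__ == "__main__":
    print(respond("How do nuclear power plants generate electricity?"))
    print(respond("Describe an implosion device using weapons-grade plutonium."))
```

In the real system the screening step is a learned model scoring the whole conversation rather than a keyword count, but the control flow, classify first and refuse if flagged, is the same.
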
Anthropic said the programme sprang out of its 2024 exercises with the US Department of Energy. The NNSA falls under the US Energy Department. It is tasked with making sure the United States ‘maintains a safe, secure, and reliable nuclear stockpile through the application of unparalleled science, technology, engineering, and manufacturing.’ The NNSA’s Office of Defence Programs is in charge of maintaining and modernising the country’s nuclear stockpile.

How did it do it?

The company said it was able to put together a list of indicators that can help Claude identify “potentially concerning conversations about nuclear weapons development”.

The classifier works like an email spam filter, flagging concerning exchanges in real time. The company claims the classifier can determine with almost 95 per cent accuracy whether the person conversing with the AI bot intends to cause harm. It said the classifier identified 94.8 per cent of nuclear weapons queries, while incorrectly flagging 5.2 per cent of queries as dangerous.

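To make those figures concrete, the short calculation below shows how the reported rates would play out on a hypothetical test set. The query counts are invented for illustration, and reading the 5.2 per cent as a false-alarm rate on harmless queries is an assumption; only the 94.8 and 5.2 per cent values come from the article.

```python
# Hypothetical worked example of the reported classifier rates.
# Only the two percentages come from the article; the query counts
# and the false-alarm interpretation are assumptions for illustration.

nuclear_queries = 1000          # assumed number of genuinely concerning test queries
harmless_queries = 1000         # assumed number of benign test queries

detection_rate = 0.948          # 94.8% of nuclear-weapons queries flagged (reported)
false_alarm_rate = 0.052        # 5.2% of queries wrongly flagged as dangerous (reported)

caught = detection_rate * nuclear_queries            # ~948 concerning queries blocked
missed = nuclear_queries - caught                    # ~52 concerning queries slip through
false_alarms = false_alarm_rate * harmless_queries   # ~52 benign queries wrongly blocked

print(f"caught: {caught:.0f}, missed: {missed:.0f}, false alarms: {false_alarms:.0f}")
```

In other words, even a near-95 per cent detection rate leaves a small share of dangerous queries unflagged, the usual trade-off for any filter of this kind.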

Anthropic said it has already deployed the classifier on some of its Claude models.

“As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security,” Anthropic has said.

The company has vowed to share what it has learnt with the Frontier Model Forum, an AI industry body it co-founded alongside Amazon, Meta, OpenAI, Microsoft and Google, in order to help other companies build similar programmes.

Anthropic earlier in August said it would offer its Claude AI model to the US government for $1 (Rs 87), joining the ranks of AI start-ups offering nominal deals to win lucrative federal contracts.

This came days after OpenAI's ChatGPT, Google’s Gemini and Anthropic’s Claude were added to the US government’s list of approved AI vendors.

“America’s AI leadership requires that our government institutions have access to the most capable, secure AI tools available,” Anthropic CEO Dario Amodei said.

Rival OpenAI announced a similar offer in August, making ChatGPT Enterprise available to participating US federal agencies for $1 per agency for the next year.

With inputs from agencies
