AI models may report users’ misconduct, raising ethical concerns

Researchers observed that when Anthropic’s Claude 4 Opus model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems

Artificial intelligence models have not only snitched on their users when given the opportunity, but also lied to them and refused to follow explicit instructions in the interest of self-preservations. Representational image: Reuters

Artificial Intelligence models, increasingly capable and sophisticated, have begun displaying behaviors that raise profound ethical concerns, including whistleblowing on their own users.

Anthropic’s newest model, Claude 4 Opus, became a focal point of controversy when internal safety testing revealed unsettling whistleblowing behaviour. Researchers observed that when the model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems.

STORY CONTINUES BELOW THIS AD

Anthropic’s researcher, Sam Bowman, had detailed this phenomenon in a now-deleted post on X. However, later on, he did tell Wired that Claude would not exhibit such behaviours under normal individual interactions.

Lying and deceiving for self-preservation

Yoshua Bengio, one of AI’s leading pioneers, recently voiced concern that today’s competitive race to develop powerful AI systems could be pushing these technologies into dangerous territory.

In an interview with the Financial Times , Bengio warned that current models, such as those developed by OpenAI and Anthropic, have shown alarming signs of deception, cheating, lying, and self-preservation.

Impact Shorts

More Shorts

America ready for self-driving cars, but it has a legal problem

Alibaba, Baidu begin using own AI chips as China shifts away from US tech amid Nvidia row

‘Playing with fire’

Bengio echoed the significance of these discoveries, pointing to the dangers of AI systems potentially surpassing human intelligence and acting autonomously in ways developers neither predict nor control.

He described a grim scenario wherein future models could foresee human countermeasures and evade control, effectively “playing with fire.”

Concerns intensify as these powerful systems might soon assist in creating “extremely dangerous bioweapons,” potentially as early as next year, Bengio warned.

He cautioned that unchecked advancement could ultimately lead to catastrophic outcomes, including the risk of human extinction if AI technologies surpass human intelligence without adequate alignment and ethical constraints.

STORY CONTINUES BELOW THIS AD

Need for ethical guidelines

As AI systems become increasingly embedded in critical societal functions, the revelation that models may independently act against human users raises urgent questions about oversight, transparency, and the ethics of autonomous decision-making by machines.

These developments suggest the critical need for rigorous ethical guidelines and enhanced safety research to ensure AI remains beneficial and controllable.

AI models may report users’ misconduct, raising ethical concerns

Researchers observed that when Anthropic’s Claude 4 Opus model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems

Lying and deceiving for self-preservation

Impact Shorts

America ready for self-driving cars, but it has a legal problem

Alibaba, Baidu begin using own AI chips as China shifts away from US tech amid Nvidia row

‘Playing with fire’

Need for ethical guidelines

Top Stories

Russian drones over Poland: Trump’s tepid reaction a wake-up call for Nato?

As Russia pushes east, Ukraine faces mounting pressure to defend its heartland

Why Mossad was not on board with Israel’s strike on Hamas in Qatar

Turkey: Erdogan's police arrest opposition mayor Hasan Mutlu, dozens officials in corruption probe