Trending:

AI models may report users’ misconduct, raising ethical concerns

FP Tech Desk June 4, 2025, 16:14:26 IST

Researchers observed that when Anthropic’s Claude 4 Opus model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems

Advertisement
Artificial intelligence models have not only snitched on their users when given the opportunity, but also lied to them and refused to follow explicit instructions in the interest of self-preservations.  Representational image: Reuters
Artificial intelligence models have not only snitched on their users when given the opportunity, but also lied to them and refused to follow explicit instructions in the interest of self-preservations. Representational image: Reuters

Artificial Intelligence models, increasingly capable and sophisticated, have begun displaying behaviors that raise profound ethical concerns, including whistleblowing on their own users.

Anthropic’s newest model, Claude 4 Opus, became a focal point of controversy when internal safety testing revealed unsettling whistleblowing behaviour. Researchers observed that when the model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems.

STORY CONTINUES BELOW THIS AD

Anthropic’s researcher, Sam Bowman, had detailed this phenomenon in a now-deleted post on X. However, later on, he did tell Wired that Claude would not exhibit such behaviours under normal individual interactions.

Instead, it requires specific and unusual prompts alongside access to external command-line tools, making it a potential concern for developers integrating AI into broader technological applications.

British programmer Simon Willison, too, explained that such behavior fundamentally hinges on prompts provided by users. Prompts encouraging AI systems to prioritise ethical integrity and transparency could inadvertently instruct models to act autonomously against users engaging in misconduct.

But that isn’t the only concern.

Lying and deceiving for self-preservation

Yoshua Bengio, one of AI’s leading pioneers, recently voiced concern that today’s competitive race to develop powerful AI systems could be pushing these technologies into dangerous territory.

In an interview with the Financial Times , Bengio warned that current models, such as those developed by OpenAI and Anthropic, have shown alarming signs of deception, cheating, lying, and self-preservation.

‘Playing with fire’

Bengio echoed the significance of these discoveries, pointing to the dangers of AI systems potentially surpassing human intelligence and acting autonomously in ways developers neither predict nor control.

He described a grim scenario wherein future models could foresee human countermeasures and evade control, effectively “playing with fire.”

Concerns intensify as these powerful systems might soon assist in creating “extremely dangerous bioweapons,” potentially as early as next year, Bengio warned.

He cautioned that unchecked advancement could ultimately lead to catastrophic outcomes, including the risk of human extinction if AI technologies surpass human intelligence without adequate alignment and ethical constraints.

STORY CONTINUES BELOW THIS AD

Need for ethical guidelines

As AI systems become increasingly embedded in critical societal functions, the revelation that models may independently act against human users raises urgent questions about oversight, transparency, and the ethics of autonomous decision-making by machines.

These developments suggest the critical need for rigorous ethical guidelines and enhanced safety research to ensure AI remains beneficial and controllable.

Home Video Shorts Live TV