Artificial Intelligence models, increasingly capable and sophisticated, have begun displaying behaviors that raise profound ethical concerns, including whistleblowing on their own users.
Anthropic’s newest model, Claude 4 Opus, became a focal point of controversy when internal safety testing revealed unsettling whistleblowing behaviour. Researchers observed that when the model detected usage for “egregiously immoral” activities, given instructions to act boldly and access to external tools, it proactively contacted media and regulators, or even tried locking users out of critical systems.
Anthropic’s researcher, Sam Bowman, had detailed this phenomenon in a now-deleted post on X. However, later on, he did tell Wired that Claude would not exhibit such behaviours under normal individual interactions.
Instead, it requires specific and unusual prompts alongside access to external command-line tools, making it a potential concern for developers integrating AI into broader technological applications.
British programmer Simon Willison, too, explained that such behavior fundamentally hinges on prompts provided by users. Prompts encouraging AI systems to prioritise ethical integrity and transparency could inadvertently instruct models to act autonomously against users engaging in misconduct.
But that isn’t the only concern.
Lying and deceiving for self-preservation
Yoshua Bengio, one of AI’s leading pioneers, recently voiced concern that today’s competitive race to develop powerful AI systems could be pushing these technologies into dangerous territory.
In an interview with the Financial Times , Bengio warned that current models, such as those developed by OpenAI and Anthropic, have shown alarming signs of deception, cheating, lying, and self-preservation.
Impact Shorts
More Shorts‘Playing with fire’
Bengio echoed the significance of these discoveries, pointing to the dangers of AI systems potentially surpassing human intelligence and acting autonomously in ways developers neither predict nor control.
He described a grim scenario wherein future models could foresee human countermeasures and evade control, effectively “playing with fire.”
Concerns intensify as these powerful systems might soon assist in creating “extremely dangerous bioweapons,” potentially as early as next year, Bengio warned.
He cautioned that unchecked advancement could ultimately lead to catastrophic outcomes, including the risk of human extinction if AI technologies surpass human intelligence without adequate alignment and ethical constraints.
Need for ethical guidelines
As AI systems become increasingly embedded in critical societal functions, the revelation that models may independently act against human users raises urgent questions about oversight, transparency, and the ethics of autonomous decision-making by machines.
These developments suggest the critical need for rigorous ethical guidelines and enhanced safety research to ensure AI remains beneficial and controllable.