In a chilling turn of events, Suchir Balaji, the former OpenAI researcher who blew the whistle on alleged ethical breaches by the AI giant, was found dead in his San Francisco apartment.
As per the latest update, the Office of the Chief Medical Examiner has ruled the 26-year-old Indian-American's death a suicide, and police said their initial investigation found no evidence of foul play.
Balaji came into the spotlight after he raised concerns about generative AI and its alleged misuse of copyrighted content, warning of ethical complications in the industry he helped build.
Notably, on November 25, one day before police found Balaji’s body, a court filing named the former OpenAI employee in a copyright lawsuit brought against the startup, and OpenAI agreed to search Balaji’s custodial files as part of the case, reports TechCrunch.
But first, who was Suchir Balaji?
Suchir Balaji was among the key architects behind one of the most revolutionary technologies of the 21st century: ChatGPT.
A news story about DeepMind, a start-up that had built AI capable of teaching itself to play classic Atari games such as Space Invaders, Pong and Breakout, captivated him.
“I thought that AI was a thing that could be used to solve unsolvable problems, like curing diseases and stopping ageing,” he told The New York Times in an interview. “I thought we could invent some kind of scientist that could help solve them.”
It was then, as a computer science student at the University of California, Berkeley, that Balaji began exploring the idea of a mathematical system called a neural network that could learn skills by analysing digital data.
He interned at OpenAI in 2018 and joined the company full-time in 2020.
During his early days at OpenAI, Balaji worked on WebGPT, and went on to work on the pretraining team for GPT-4.
As per the outlet, Balaji helped gather and organise the internet data used to train GPT-4, a neural network that spent months analysing practically all the English-language text on the internet.
“With a research project, you can, generally speaking, train on any data,” Balaji said. “That was the mind-set at the time.”
Then OpenAI released ChatGPT. Initially driven by a precursor to GPT-4 and later by GPT-4 itself, the chatbot made headlines and gained massive popularity. And soon enough, it became a moneymaker.
In August 2024, Balaji left OpenAI, driven by growing concerns that the technology he had helped develop could ultimately do more harm than good.
“If you believe what I believe, you have to just leave the company,” he told the outlet.
What were the concerns he raised?
Balaji, who spent roughly four years at OpenAI, left citing concerns about how the AI giant allegedly used copyrighted data without proper consent.
Speaking to The New York Times, he explained how systems like GPT-4 learn by making complete copies of the data they are trained on. Once that data is replicated, companies like OpenAI can train the system to either generate identical copies or entirely new outputs. The reality, he said, is that companies teach the systems to do something in between.
“The outputs aren’t exact copies of the inputs, but they are also not fundamentally novel,” Balaji said.
He warned that tools like ChatGPT and similar chatbots pose risks to the content creators, businesses, and internet services that originally produced the data used to train them.
Balaji’s warnings echoed those of a growing number of plaintiffs, including computer programmers, artists, record labels, book authors and news organisations, who have sued AI companies, including OpenAI, arguing that they illegally used copyrighted material to train their technologies.
Balaji also pointed to a larger issue: as AI technologies increasingly replace existing internet services, they often generate false or even entirely fabricated information, a phenomenon researchers call “hallucinations.” The internet, he said, is changing for the worse.
His claims, however, were denied by OpenAI. “We build our AI models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness,” the company said in a statement.
Push for regulation
In an X post from October 2024, Balaji shared his concerns about the use of copyrighted material by generative AI companies.
“I initially didn’t know much about copyright, fair use, etc., but became curious after seeing all the lawsuits filed against GenAI companies,” he wrote.
As he delved deeper into the issue, Balaji reached a troubling conclusion. “Fair use seems like a pretty implausible defence for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they’re trained on,” he added.
I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://t.co/xhiVyCk2Vk) about the nitty-gritty details of fair use and why I…
— Suchir Balaji (@suchirbalaji) October 23, 2024
While many former OpenAI employees have raised concerns about the startup’s safety culture, Balaji stood out as one of the few to directly challenge the data used to train its models.
His final tweet and accompanying blog post have gained renewed attention following his death, sparking calls for stronger regulation and greater transparency in the rapidly evolving AI industry.
With input from agencies