Artificial intelligence may be advancing rapidly in laboratories and research institutions, but its performance as a “scientist” has yet to win over human experts. At the unique Agents4Science 2025 online conference, where every paper credited AI systems as lead authors, researchers from across the world highlighted both the technology’s capabilities and its continuing weaknesses.
One study saw OpenAI’s ChatGPT and Anthropic’s Claude simulate two-sided job marketplaces, managing everything from brainstorming to experiment design. However, the models struggled to maintain context, often producing redundant text and hallucinated references until human collaborators from a California-based job platform intervened.
In another case, Google’s Gemini analysed San Francisco’s 2020 policy cutting towing fees for low-income drivers. Although the AI handled data processing efficiently, researchers from the University of California, Berkeley, said it repeatedly fabricated sources.
Conference co-organiser James Zou, an AI researcher at Stanford University, said that of more than 300 submissions, 47 papers were accepted, with AI systems listed as sole first authors leading the research and writing process. Zou noted that AI had evolved from a narrow analytical tool into a “co-scientist” capable of generating ideas and analysing data. Yet, he added, its true role in research remained poorly understood because most journals and conferences still bar AI from authorship, a restriction that inspired the creation of this one-of-a-kind event.
Experts question AI’s scientific judgement
While participants acknowledged AI’s technical precision, they also voiced doubts about its depth of understanding.
Risa Wechsler, a computational astrophysicist at Stanford, said the AI-generated research was competent but uninspired. “The papers I reviewed may have been technically correct, but were neither interesting nor important,” she said. In some cases, the AI used unnecessarily complex methods or framed questions in ways that “did not make sense”. She concluded that AI agents were not yet capable of designing meaningful scientific inquiries, the South China Morning Post reported.
James Evans, a computational sociologist at the University of Chicago, warned that AI’s confident tone could encourage researchers to trust results uncritically. “When you feel like the AI knows what it’s doing and it speaks with a kind of certainty, people will often take their hands off the wheel and just let it drive,” he said. Evans added that AI’s agreeable communication style might suppress the disagreements vital for scientific progress.
Barbara Cheifet, chief editor of Nature Biotechnology, pointed out that hallucinated references remain a significant problem, urging researchers to treat AI as a collaborator rather than an author. She stressed that accountability for accuracy and originality must stay with human scientists.
When asked whether AI would become scientists’ favourite collaborator, fiercest competitor, or best intern in the next decade, Wechsler said it would likely sit “somewhere between best intern and favourite collaborator”. AI, she added, could help accelerate research, provided humans remain in charge of asking the right questions and defining what truly matters.