A study published in the international journal BMJ in December 2024 highlights growing concerns about the reliability of artificial intelligence (AI) in medical diagnosis. While AI tools are increasingly relied upon for their speed and efficiency in analysing medical data, the research questions how well that reliability holds up over time.
The study, published on December 20, 2024, challenges the assumption that AI will soon replace human doctors. It notes that cognitive impairments seen in leading chatbots could undermine their reliability in medical diagnostics and erode patients’ trust in AI-based healthcare systems.
To assess the cognitive abilities of these AI tools, the researchers tested publicly available LLM-driven chatbots, including OpenAI’s ChatGPT, Anthropic’s Claude (Sonnet), and Alphabet’s Gemini, using the Montreal Cognitive Assessment (MoCA). The test, typically used to screen for cognitive impairment in older adults, was adapted to evaluate AI performance in areas such as attention, memory, language, spatial reasoning, and executive function.
A score of 26 out of 30 on the MoCA is considered normal in humans. The AI models’ scores varied, with ChatGPT 4o achieving the 26-point threshold, ChatGPT 4 and Sonnet scoring 25, and Gemini 1.0 scoring significantly lower at 16. One MoCA task, requiring attention to a series of letters, was adapted for the AI models by presenting the letters in written form and asking the models to identify a specific letter.
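To make that adaptation concrete, here is a minimal sketch of how such a letter-vigilance task might be posed to a chatbot in written form. It assumes the OpenAI Python SDK; the letter sequence, prompt wording, and scoring are illustrative, not taken from the study.

```python
# Illustrative sketch of presenting a MoCA-style attention task to a chatbot
# in written form, as the study describes. The letter sequence and prompt
# wording here are hypothetical, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# In the human MoCA, an examiner reads letters aloud and the subject taps at
# each "A". In the written adaptation, the model sees the sequence as text
# and is asked to identify every occurrence of the target letter.
letters = "F B A C M N A A J K L B A F A K D E A A A J A M O F A A B"
prompt = (
    "You will see a series of letters. Identify the position of every "
    f"occurrence of the letter A.\n\nLetters: {letters}"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

# Ground truth for this illustrative sequence, to check the model's answer:
expected = [i + 1 for i, ch in enumerate(letters.split()) if ch == "A"]
print("Expected positions:", expected)
```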
A key observation from the study was the AI models’ difficulty with visual abstraction and executive function tasks, which are crucial for accurate medical diagnosis. The researchers caution that their findings are observational and that direct comparisons between AI and human brains are not straightforward. However, the results indicate potential flaws in AI systems that could hinder their effectiveness in clinical settings.
The study suggests that AI should be viewed as a tool to assist, rather than replace, human physicians. While AI can process vast amounts of data quickly, human expertise is still needed for interpretation. The study also highlights the need for regular updates and retraining of AI models to maintain accuracy, as well as the potential for periodic “cognitive check-ups” to ensure ongoing reliability.