From Meditron To ChatGPT: The Race To Build Safe Medical AI As Patients Start Trusting Machines Over Doctors
Many chatbots still frequently misdiagnose, even as new tools like Meditron draw doctors worldwide to test and improve them.
Published: December 5, 2025 at 10:52 PM IST
By Surabhi Gupta
New Delhi: Around the world, millions of people are turning to chatbots not just to ask casual health questions, but to understand symptoms, interpret lab reports, and even self-diagnose conditions. As one assessment notes, “millions of people are using chatbots to try to better understand their health. And some are going further than just asking medical questions.” The surge represents a profound shift in how individuals interact with medicine, but experts caution that the accelerating adoption of medical AI brings enormous promise alongside serious risks.
At the centre of this debate is the growing evidence that “inaccurate information is a major concern; some studies have found that people without medical training obtain correct diagnoses from chatbots less than half the time.” As these tools become more accessible and powerful, questions around safety, reliability, and regulation have come to dominate global discourse.
Even technology companies acknowledge the tension. OpenAI, the maker of ChatGPT, stated that it has “extensive safeguards to protect its users’ private information.” A company representative emphasised that users can opt out of having their data used for model training and said the firm had “tested its systems against simulated attacks.” Yet critics argue that disclaimers and defensive measures alone cannot resolve the deeper issues that arise when AI, especially non-specialised, general-purpose chatbots, is used for medical decision-making.
Reverse Innovation And The Push For Open-Source Medical AI
While debates over safety continue, some of the most consequential innovations in medical AI are emerging far from the world’s richest hospitals. Professor Mary-Anne Hartley, Director of the Laboratory for Intelligent Global Health and Humanitarian Response Technologies, based jointly at EPFL, Ariadne Labs at the Harvard T.H. Chan School of Public Health, and Ashoka University, told ETV Bharat that transformational innovation frequently originates in the most resource-constrained environments.
“Many of the best medical AI tools were first developed in low-resource settings because working under tough constraints forces real innovation. These tools had to be robust, reliable, and effective in high-stakes environments, and only later did high-resource countries recognise their value. People call it ‘reverse innovation’; to me, it’s simply innovation.”
Hartley highlights a foundational challenge: “Less than 3% of PubMed research represents Africa, so we lack data from the places where these tools are needed most.” Instead of waiting years for representative datasets, her team lets clinicians use tools like Meditron with vigilance, gathers real-world data from that use, and feeds it back to “continuously retrain the models.” As she puts it, “Open source is essential. It ensures transparency, prevents commercial bias, and treats medical AI as a public good.”
Meditron, built on Meta’s Llama architecture, represents a major step toward AI designed specifically for clinical use. As Hartley explains, “Meditron was built on Meta Llama 2 and trained on carefully curated, high-quality medical data sources with continual input from clinicians and experts in humanitarian response.” Within months of release, Meditron was downloaded over 30,000 times.
Hartley argues that medical foundation models “have the potential to provide life-saving advice and guidance. Yet the lowest-resource settings have the most to gain and remain the least represented.” The goal, she says, is to offer tools that provide “evidence-based care, contextually aware recommendations, and professional standards,” not generic internet-derived answers.
She added, “We’re collaborating with the Indian Council of Medical Research, Ashoka, and anyone else who wishes to join us. Our goal is to scale these initiatives and adapt them to the regions that need them most. India is incredibly diverse, so ensuring strong local ownership is essential. This isn’t our project; it belongs to the public. We’re simply helping build the structure around it, and we’re truly excited to work on this together.”
Dr Suvrankar Datta, Group Lead at the Centre for Responsible Autonomous Systems in Healthcare (CRASH Lab) at the Koita Centre for Digital Health, Ashoka University, said their research shows major gaps in how current AI systems handle Indian medical realities. “We have been extensively testing models like ChatGPT and Gemini on Indian clinical cases, including complex examples that aren’t publicly available. What we consistently find is that when these models encounter cases they were never trained on, particularly Indian cases, their accuracy drops sharply,” he said.
He added that AI tools designed to transcribe doctor–patient conversations also struggle in India due to its linguistic diversity. Datta said the only sustainable solution is for India to build its own medical datasets and evaluation standards to ensure responsible and equitable AI adoption.
Quality Over Quantity And The Technical Challenge Of Building Medical AI
Funding, Hartley stresses, remains a significant barrier. “Funding can be a major challenge for anyone, but particularly for groups working in humanitarian and low-resource settings.” To operate sustainably, her team takes a conservative, quality-focused approach: “experiments started on the smaller Llama 2 7B to narrow down optimal pre-training data mixtures and parameters for the scale-up to 70B.”
The curation process was intensive. “This focus on quality over quantity also meant the team spent most of its time carefully curating medically validated textual documents. Continued pretraining minimised the risk of contamination and bias from the general text corpus on which Llama was trained.”
And because training massive models is technically demanding, “the team integrated the Llama architecture into a high-performing distributed trainer, Megatron-LM,” and made sure to open-source its adapted version of Megatron.
Global Validation, Doctor-led Feedback, And The MOOVE Initiative
Since Meditron’s launch, medical professionals, especially those working in low-resource regions, have been asking it challenging questions and critically evaluating its answers so the team can adapt the model accordingly. These real-world conversations have now become part of Meditron MOOVE (Massive Online Open Validation and Evaluation), a global system where doctors help validate and improve the AI. “The fact that busy doctors are giving their time shows how valuable they think this is,” Hartley said. “We now have a chance to use all this feedback to build a better model, and we hope funders see why this open-source project deserves support.”
Datta, one of the collaborators on the Indian version of MOOVE and formerly a doctor at AIIMS, said that most medical AI is trained on data from large private hospitals in cities, which means rural and underserved communities are often left out. “If we deploy these systems without fixing this gap, they will work well for some patients but fail the very people who need them most,” he said. His team’s study, “Radiology’s Last Exam,” highlighted the risk clearly: even the best AI systems performed much worse than trained radiologists at detecting diseases on scans. Yet more and more patients are uploading X-rays or CT scans to AI apps and trusting those results over their doctor’s advice.
What Do Indian Experts Have To Say?
Even as specialised models like Meditron evolve, Indian medical leaders warn that AI should never be used as a substitute for clinical judgment. Dr Rohan Krishnan, President of FAIMA, stresses that AI is powerful, but only within limits: “AI models like ChatGPT or Meditron can sometimes outperform doctors in tightly controlled diagnostic tasks, but medicine is not just computation; it involves context, comorbidities, socioeconomic factors, patient preferences, and ethical judgment. Assuming consistent superiority of AI across all real-world cases is neither fair nor safe. AI must remain an assistive tool, not an autonomous clinician.”
His concern extends to India’s overloaded healthcare system: “Over-reliance is a genuine concern, particularly among young doctors working under fatigue. India urgently needs a national regulatory framework that defines legal accountability, mandates human verification, ensures audit trails, and restricts AI from issuing final clinical decisions independently.”
Cyber law expert Karnika A. Seth warns that people often mistake information for advice: “Just like AI apps can give legal information but not advice, health-related information is available using these apps, but not medical advice. Relying on such information to take medication can be risky and should be avoided at all costs.”
At the same time, clinicians acknowledge AI’s benefits. Dr Tarun Kumar, Associate Director, Medanta Moolchand Heart Centre, says, “AI plays a vital role in healthcare by enhancing diagnostics, personalising treatment plans, and streamlining administrative and clinical workflows… It improves accuracy, speeds up decision-making, and can lower costs.”
Dr Meet Ghonia, National General Secretary, FORDA, echoes this caution: “AI may excel in controlled tests, but real patients are far more complex. AI is excellent for pattern recognition, triage support, and speeding information flow. But it still lacks the contextual understanding, intuition, and bedside assessment that shape a physician’s decisions.”
He warns that “AI advice must always be cross-checked. We need clear disclaimers, clinician oversight, and strict protocols.”
Young Doctors Are Already Using AI Daily, Often Without Clear Guidelines
Reality, however, is moving faster than regulation. Data presented at AIMed25 shows that 43% of medical residents use generative AI tools daily, and two-thirds use them weekly. Researcher Jesse P. Caron found that residents most often turn to GPT tools for studying (78%), followed by research (49%), documentation (34%), case preparation (34%), and even patient communication (32%).
Yet, according to Caron, “ChatGPT is not meant for clinical use, but it is the most prevalent GPT model used by residents. There are blurred lines regarding acceptable or unacceptable AI use.”
Residents trust AI, but also fear it. According to Caron’s data:
- 70% worry about inaccurate information
- 51% about overreliance
- 31% about violating privacy rules
- 31% about academic dishonesty
As medical AI grows more sophisticated, from general-purpose chatbots to highly trained clinical models, experts agree on one point: the global health system must evolve alongside it.
Hartley argues that open-source public-good models are essential to fair access. Indian experts insist AI must remain assistive, never autonomous. And young doctors are already reshaping clinical practice through daily AI use, with or without institutional guidelines.
The question is no longer whether AI will reshape medicine. It already has. The challenge now is ensuring that the transformation improves safety, equity, and trust, rather than undermining them. In an era where millions seek medical answers from a machine, the world must urgently decide what responsible, ethical, and effective clinical AI should look like, and who gets to shape it.