It cannot be called superintelligence / AGI if it forgets a medical case and jumbles things up after you chat with it about multiple cases for half an hour in the same context window.
Or if you have to reset its memory every so often and explain the patient's symptoms and labs again and again to follow up on a case.
For comparison, a single oncology resident at Tata Hospital, Mumbai sees 150 patients in one day in the outpatient clinic, and I have seen these residents maintain the same composure, kindness and diagnostic acuity at 5 pm as when they started at 8 am. They also handle admissions, treatment strategies and plans for those patients throughout their hospital stay.
No LLM today can do that! Until these issues are solved, LLMs will remain apps and tools for doctors to use. I came across DSPy and GEPA because of your posts,
@DrDatta_AIIMS, and you may be on to something here: using them so that doctors can give natural language feedback to AI models about the mistakes the models made. This could in turn lead the model to update its prompt, or feed a pipeline where the model's weights are fine-tuned on doctors' feedback. Until continuous learning and memory are solved for AI, it cannot be called superintelligence / AGI.
Most people misunderstand what superintelligence really is. Recent papers (including a big announcement from a frontier tech company) have even started claiming “medical superintelligence.”
Umm, sorry to burst the hype… training a model on the entire internet and then comparing it against doctors who were denied the same resources is not superintelligence; it's flawed science!
So what would true medical superintelligence look like? To me, it means that, given the same patient data, the same tools, and the same constraints, the system reliably outperforms trained clinicians across diverse cases, with reproducibility and calibrated uncertainty.
This is hard, because medical reasoning isn't just pattern matching. Look deeper and you find that clinicians flexibly combine hypothetico-deductive reasoning, illness scripts, and dual-process cognition (fast and intuitive vs. slow and analytical).
Current autoregressive models get stuck in one mode at a time, and that brittleness shows in controlled evaluations where even trainees still outperform them (as we showed recently with our Radiology's Last Exam benchmark).
Where AI does look "superhuman" is in discovering hidden signals that predict age, sex, or disease risk from X-rays and fundus images, signals that humans can't consciously perceive. But correlation ≠ competence, and generalization, bias, and safe integration remain unresolved (something our lab is actively working on).
If you ask me, a real test would be something like a Same Data, Same Tools (SDST) trial: say, 10k studies with full metadata and identical resources for both doctors and models, measuring accuracy, calibration, and patient impact. Only if the model still wins should we call it "superintelligent." That is difficult to do in real life, though.
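To make the SDST scoring idea concrete, here is a minimal, hypothetical Python sketch. It assumes each reader (clinician or model) gives, for every study in the identical case set, a probability that a binary finding is present. The function names and the "model wins" rule (higher accuracy, lower Brier score, and no worse expected calibration error) are illustrative assumptions, not a published protocol, and patient impact is not captured at all.

```python
from typing import Dict, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int]) -> float:
    # Threshold each probability at 0.5 and count matches with the 0/1 label.
    return sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)


def brier(probs: Sequence[float], labels: Sequence[int]) -> float:
    # Mean squared error between predicted probability and outcome (lower is better).
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def ece(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    # Expected calibration error on top-label confidence: bin cases by how
    # confident the reader was in its chosen label, then compare that
    # confidence with how often the chosen label was actually correct.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == y))
    err = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            hit_rate = sum(ok for _, ok in b) / len(b)
            err += len(b) / len(labels) * abs(hit_rate - mean_conf)
    return err


def sdst_compare(labels: Sequence[int], clinician_probs: Sequence[float],
                 model_probs: Sequence[float]) -> Dict[str, object]:
    # SDST assumption: both readers score exactly the same studies.
    if not (len(labels) == len(clinician_probs) == len(model_probs)):
        raise ValueError("SDST requires both readers to score the identical set of studies")
    report: Dict[str, object] = {}
    for name, probs in (("clinician", clinician_probs), ("model", model_probs)):
        report[name] = {
            "accuracy": accuracy(probs, labels),
            "brier": brier(probs, labels),
            "ece": ece(probs, labels),
        }
    m, c = report["model"], report["clinician"]
    # Hypothetical rule: the model must beat the clinician on accuracy and
    # Brier score without being worse calibrated.
    report["model_wins"] = (
        m["accuracy"] > c["accuracy"] and m["brier"] < c["brier"] and m["ece"] <= c["ece"]
    )
    return report


if __name__ == "__main__":
    # Toy, made-up probabilities for four studies; not real data or results.
    labels = [1, 0, 1, 0]
    print(sdst_compare(labels, [0.9, 0.2, 0.6, 0.4], [0.8, 0.1, 0.4, 0.3]))
```

Scoring calibration alongside accuracy is the point: a reader that is right more often but badly overconfident would not pass this hypothetical rule.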
Until we achieve true medical superintelligence, the useful path is augmentation: error detection, triage, documentation, workflow automation, etc. That's where AI is already valuable today.
Superintelligence is not here yet. We are actively looking at ways to achieve this. Once we do, I will let you know.