Medical Oncologist | The Vancouver Clinic, Legacy Cancer Institute. Creator of Oncology AI Lab — stress testing AI to separate the signal from noise

Joined June 2020
Pinned Tweet
Community oncologist. I use AI to appraise clinical trials, then verify every finding against the primary data. The AI gets graded too. One trial, two readers, one bottom line. The Source Report on YouTube (in bio) and Substack. allenlimd.substack.com #onced #oncology #meded

Allen Li MD retweeted
Competition here is healthy. Credit first: independent study, public code, and the main benchmark uses real physician questions rated by blinded clinicians. That part is strong. Three issues. 1️⃣ (Others have flagged this.) The two sides were tested differently. Frontier models: API, temperature zero, search on. Clinical tools: queried by hand in their browsers, with hidden prompts and retrieval the authors could not control. They say so. And search gave the frontier models retrieval, the thing clinical tools are built on. Not specialized vs general. Two systems with retrieval, different doors. 2️⃣ The weight rests on one benchmark, not three. MedQA and HealthBench carry contamination risk. HealthBench was built by OpenAI, its model won it, and answers were graded by the same three models being scored. The authors call the clinician-rated benchmark primary, HealthBench supplementary. On MedQA, Claude tied both clinical tools. 3️⃣ Outperform means quality, not safety. No model was safer than another. The highest harmful response rate was a frontier model. The floor the clinical tools missed was a free search feature, not only purpose-built tools.
FDA approved Truqap (capivasertib) plus abiraterone for PTEN-deficient mHSPC. CAPItello281 hit its primary endpoint: 7.5 months added rPFS (HR 0.81). A real signal. In it, 74% had high-volume disease, about a quarter with visceral mets, exactly where triplet is NCCN-preferred. This trial had no chemo arm to compare the two. The chemo triplet trials before it, PEACE-1 and ARASENS, both showed an OS benefit. CAPItello281 has not shown one yet (HR 0.90). New Source Report: youtube.com/shorts/-H6isX04e… @FDAOncology @myESMO @ASCO @oncoalert #ProstateCancer #GUOnc #Oncology #PrecisionMedicine #capivasertib #CAPItello281
The case for AI in oncology is real: it holds every molecular finding, every trial, every approved option, all at once. No human can. But that was never the hard part. The hard part is knowing which of them is right for the patient in front of you. Whether a model ever masters that is the real question. Right now, it’s still ours.
PROTEUS in NEJM: perioperative apalutamide reports a metastasis-free survival win in high-risk prostate cancer, HR 0.80. The catch: as many have already pointed out, the endpoint was redefined mid-trial to add PSMA-PET. MFS by PSMA-PET is not yet a validated surrogate for survival. The version that is validated, conventional imaging, was not significant (HR 0.84). And OS so far favors placebo (HR 1.08, very immature). youtube.com/shorts/xmTPQogWK… #MedOnc #ProstateCancer #PROTEUS #EvidenceBasedMedicine #ASCO2026 #Oncology
In NEJM: daraxonrasib is the first RAS-targeted drug to extend survival in pancreatic cancer. In 2nd-line RAS G12 disease it roughly doubled median OS, 6.6 to 13.2 months (HR 0.40). A real win. Three things the abstract doesn’t foreground ↓ #MedOnc #PancreaticCancer #RASolute302 #EBM youtube.com/shorts/VgM_9sk-0…
Google’s healthcare AI, AMIE, “beat” medicine trainees and oncology fellows on breast cancer cases. Look closer: the items where AMIE scored highest had rubrics identical to the AI’s own prompt, word for word. It’s like grading a fellow on whether they followed a notecard you handed them. That measures instruction-following, not clinical judgment. (This is the preprint. Now published in NEJM AI with a larger dataset, which I’m checking next.) youtube.com/shorts/3jjBjvPoN… #MedTwitter #AIinMedicine #ClinicalAI #BreastCancer
Credit to the authors for including this in the supplement. It is this kind of academic integrity that will move the field of AI in medicine forward.
With today’s FDA approval of Dato-DXd for mTNBC, based on TROPION-Breast02, the trial is worth revisiting. It’s a good option for mTNBC. One important point: the OS benefit is region-dependent. In the US/Canada/Europe subgroup the HR is actually reversed!👇 youtube.com/shorts/fqJUehEGc… #OncTwitter #bcsm #datodxd #tropionbreast02
I believe AI will transform medicine for the better, and in many ways, it already has. But we won’t get there by cheerleading. We’ll get there by being honest about where these tools fall short today.
Liability issue aside, calling AI in oncology supporting roles “lower risk” may be an underestimate. Even a decision as routine as IV hydration between chemo can be consequential. Is the patient dehydrated or fluid overloaded? Do cardiac or renal comorbidities tip the calculus? A web interface, AI or human, often cannot see what is needed to decide well. Trust gets built the way it always has in medicine: prospective evaluation, prespecified endpoints, honest reporting of where the tool fails. One can simultaneously believe in the power and promise of AI, be skeptical and critical of its limits today, and hope to inform its potential for tomorrow. Agree that early engagement from the actual care team will be key. ascopost.com/issues/may-10-2…
Here is the paradox. If physicians refuse to give up any control, AI’s clinical role gets defined by everyone except clinicians: management, payers, vendors. If physicians give up control before trust is built, patients bear the risk. Trust needs data. Data alone may not be enough without personal experience. Personal experience requires giving up some control. So how does a clinician earn the experience needed to build the trust, without first giving up the control that experience requires? There is a small precedent here. When fax machines arrived, people did not trust that messages actually landed. The printed receipt bridged the gap. It let users build experience with the new tool in a way they could audit, until the receipt itself became unnecessary. AI may need its own version of that artifact: outputs that surface uncertainty and let clinicians check the reasoning, so the experience needed to earn trust can accumulate without giving up control blindly.
Side note: even today, I do not completely trust the fax machine, especially when the fax is sent from the all-in-one printer/copier/scanner that takes up the whole corner of the clinic office. Maybe this says more about me than the fax machine.
The coverage of the Science paper claims AI beats doctors in clinical reasoning. We need to be more critical of what “clinical reasoning” means in this publication. Take a look at how the AI “beats” physicians in this Science paper. AI is getting better every day and will be an important part of medicine. However, what this paper may have shown is that AI is better than physicians at generating a list of things, not at frank clinical reasoning. youtube.com/shorts/Io9aFmZbL… #OncTwitter #AIinMedicine #EvidenceBasedMedicine
The Brodeur Science paper has been the loudest AI-in-medicine story of the week. The eLetter version of my thoughts is now up at Science. Three methodological concerns: 1. Rubric structure rewards listing items. The Grey Matters Q1 rubric is purely additive: 19 points across 22 line items, no penalty for excess or wrong tests. AI lists everything; physicians write focused notes. The 89-vs-34 headline measures rubric enumeration. 2. Information gradient. In the paper's head-to-head ED experiment, AI's edge concentrates at triage with sparse data (67% vs 50–55%). By admission with the full workup, AI 81.6% vs Physician 1 78.9%, no longer statistically significant. Same patients, same model, same physicians; only information level changes. 3. Historical comparators. Five of the six experiments compare AI in 2024–2025 against physicians scored on different cases, by different graders, in earlier publications. The 55-percentage-point gap on Grey Matters cannot be cleanly attributed to model superiority. TL;DR: AI is better at listing things on a checklist. The Brodeur paper measures that very well. Whether it translates to better patient care is a different question. Headline-only readers are most at risk of being replaced by AI. Video: youtu.be/Rl2pJUwuTk0 eLetter at Science: science.org/doi/10.1126/scie… @VincentRK @HemOncFellows @OncBrothers @DrArturoAI @montypal @operationdanish @Papa_Heme @EricTopol @DrRishabhOnco @OncoAlert @OncoReporte @Larvol @OncologyBGLab @JavierDavidBen2 @csoncol @Timothee_MD @JCOOP_ASCO @TwoOncDocs @FCademartiri @doctorbhargav #OncTwitter #AIinMedicine #EvidenceBasedMedicine
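A toy illustration of concern 1, as a minimal sketch only. The rubric items, point values, cap, and both answers are hypothetical and are not taken from the Grey Matters rubric or the paper. It shows how a purely additive rubric with no penalty for extra or wrong items scores an exhaustive list above a focused note.

```python
def additive_score(answer_items, rubric, max_points):
    """Sum points for every rubric item the answer mentions.

    Items outside the rubric cost nothing, so there is no penalty
    for excess or wrong tests. The total is capped at max_points.
    """
    earned = sum(points for item, points in rubric.items() if item in answer_items)
    return min(earned, max_points)


# Hypothetical rubric: six creditable items, capped at five points
# (a cap below the item count, loosely echoing "19 points across 22 line items").
rubric = {"cbc": 1, "bmp": 1, "troponin": 1, "ecg": 1, "chest_xray": 1, "d_dimer": 1}
max_points = 5

# A focused note orders only what the reader judges relevant.
focused = {"troponin", "ecg", "chest_xray"}

# An exhaustive list names every rubric item plus unrelated tests.
exhaustive = set(rubric) | {"head_ct", "lumbar_puncture", "ana_panel"}

focused_score = additive_score(focused, rubric, max_points)
exhaustive_score = additive_score(exhaustive, rubric, max_points)

print(f"focused: {focused_score}/{max_points}")
print(f"exhaustive: {exhaustive_score}/{max_points}")
```

Under these assumptions the exhaustive list reaches the cap while the focused note does not, even though three of the listed tests are irrelevant. A rubric that subtracted points for unnecessary tests would narrow or reverse that gap.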
