Joined October 2014
862 Photos and videos
Pinned Tweet
Meta-pin👇
As usual, @hadleywickham has one of the cleanest, simplest, nicest explanations of things: in this case, the relationship between LLMs, harnesses, and agents. Better than a million Medium posts. tidydesign.substack.com/p/wh…

I’m admittedly quite close to the subject on this one, but I’ve been having a lot of Gell-Mann amnesia whiplash the last few days seeing a lot of really smart folks make breathless takes about AI trends largely based on: (1) head to heads of open web search LLMs vs closed specialty systems on questions (and rubrics!) that are easily googled (2) a lot of really entertainingly confident assumptions about what’s under the hood at OE. I appreciate the impulse, there’s a lot yet to be discovered for this field, and OE can and should and hopes to do more to support publishable evals of these tools. But this one imo isn’t it. If you have ideas for what the ideal clinically relevant eval looks like, hit me up. Can’t promise it’ll happen but can promise it’ll be considered. (COI: work with OE, potentially motivated reasoning but not intentionally so)
Rigorous evaluation of medical AI is good for everyone, and we welcome it. Counter to a half-dozen independent studies from institutions such as the Mayo Clinic that were highly positive on OpenEvidence—a lone paper now purports to show that generalized AI beats specialized clinical AI (@UpToDate, @EvidenceOpen). The paper has a massive undisclosed conflict of interest and irredeemable methodological flaws. Behind the scenes: The study authors run a competing in-house medical AI at their hospital, and asked OpenEvidence for an API to power it — including rights to build a "competing product" with OpenEvidence's own API. OpenEvidence declined. Then, this paper coincidentally appeared. Point-by-point, looking closely at the datasets used in the study, the disingenuous and fatal flaws become immediately apparent 🧵.
Surely DFW smiles down when this is shared... the conduit for religious experience himself (nytimes.com/2006/08/20/sport…) articulately exploring the themes by which -- through her own lack of introspection -- Tracy Austin broke his heart. gwern.net/doc/psychology/wil… cc @AndrewLBeam
I come back to this speech every once in a while: “in the 1,526 singles matches I played in my career, I won almost 80% of those matches … what percentage of points do you think I won in those matches? only 54%.”
I've always approached the 80k hours work with an optimistic prior. But it's exactly this sort of analysis that reflects -- for me -- significant issues with their approach, which tends toward "super coarse numbers go in, extremely consequential personal decisions come out".
Replying to @tomaspueyo
3. People think doctor is the best career to have an impact. When quantified, doctors save on average one life every 10 years. So it's good, but there are better ways. Conversely, making lots of 💰 and donating a share can get you to save 80 lives (20x more than doctor)
The reason "grade inflation" and all these other concepts don't really make sense to me is because I have a pretty simple, and I think appropriate, definition of what a grade is (consistent with Alex's).
As I've said before: a "grade" should mean a band within a dependency graph of skills. These are the skills the school considers "mastered". Imagine: "hey Mom, want to see me add fractions (or solve the quadratic formula)? Click on this node in the graph and it'll spin up 5 problems, and watch how effortlessly I get the right answer." Now *that* would be accountable schooling!
I've soured a bit on podcasts over the last few years but will probably buck that trend for this one as I've never heard a Michael I. Jordan take I didn't like.
Michael I. Jordan on the new MLST. Four things:
> AGI is a PR term. It confuses young people.
> Discourse is bipolar, either alarmist or exuberant, this is in his words "so demoralizing" for 20- and 25-year-old researchers.
> ML's methods came from statistics and operations research, NOT the AI tradition.
> Data markets are Stackelberg games, not optimisation problems. A lot of ML researchers have never computed an equilibrium.
Michael I. Jordan is a no-nonsense original gangster of the field and was described by Science magazine, back in 2016, as the most influential living computer scientist.
Sam Finlayson retweeted
hallucinated references will land you a 1-year ban from arxiv now. wow
Sam Finlayson retweeted
To elaborate: I disagreed with Hinton’s infamous 2016 prediction at the time, but I believe the bull case for radiology’s obsolescence based on CV trends in 2016 was stronger than the bull case for surgery’s obsolescence based on robotics in 2026 and I don’t think it’s close.
Not a gambling man, but I would strongly consider a formal bet on this one. (My position would be against the claim “surgery will be replaced by AI within 10 years”)
If you think surgery won’t be replaced by AI/robotics in the next decade, I think you’re actually nuts. 🤯 instagram.com/reel/DXuxGGRFU…
Sam Finlayson retweeted
when the patient getting awakened q1hr for days finally develops delirium:

[GIF: Mitchell and Webb, "Are we the baddies?"]

Hourly Neurological Examinations after Acute Brain Injury rarely detect actionable events after 48h and are associated with delirium. @NeurosurgeryCNS journals.lww.com/neurosurger…
AI news cycle in a nutshell
Sam Finlayson retweeted
Nuanced TLDR explainer on the import of the recent @ScienceMagazine article on LLM performance vs doctors.
🧵1/ Our new study on AI and physician reasoning just came out in @ScienceMagazine. As co-senior author, I'm excited about our findings, and I do think AI will reshape medicine. But after seeing some of the discussions, I'm also worried about how our findings may be misinterpreted.
Sam Finlayson retweeted
This is the quote I've been citing a lot recently.
you can outsource your thinking but you cannot outsource your understanding
You've got to remember that these are just simple jurors...The common clay of the new AI spring.
The state of AI, as captured by the jurors in the Musk v. OpenAI trial:
Sam Finlayson retweeted
Stoked for the day when the scientific standard for AI in Biology becomes “does it actually do the thing?” and not “does it point to a potential future in which it does the thing?”
Sam Finlayson retweeted
Impressive data set. Very relevant to pediatrics.
A new article introduces EchoNext-Mini, an open dataset of 100,000 electrocardiograms with curated structural heart disease labels and an accompanying convolutional neural network model for detecting structural heart disease from electrocardiogram data. nejm.ai/4sI5Iab
Interested in seeing what uses people come up with for this
Dot phrases are the EHR’s best-kept secret. Clinical shortcuts, templates, and workarounds, all encoded in fragments of text that only make sense to the person who wrote them. We built the AI-native version. Dotflows are reusable natural language prompts that customize how OpenEvidence responds. Type “.” in the search bar and the platform adapts to your style, your specialty, your thinking. Use .avs to generate a patient-facing after visit summary. .discharge for structured inpatient notes. .prior_auth to write an insurance appeal letter, because of course that’s one of the first things physicians automated. And .succinct, which compresses every answer into high-yield shorthand. Apparently we were being too thorough. Browse the community library to see what other clinicians have created and steal any you like. Or build your own.
Sam Finlayson retweeted
Apr 17
“Taste is the only moat” - VCs right before investing in gambling apps, AI to raise kids, and startups with 12,000 fake GitHub stars