Some brief thoughts on this study (longer reflections later) about how the frontier model vs domain-specific model conversation interacts with the broader notion of AI in clinical practice.
1. The ability to personally manipulate how models respond remains underrated
Any trainee who has gone through the medical system will know that the way information is communicated, analyzed, and acted on can differ vastly between physicians and specialties.
That aspect of communication -- what details to reinforce, what values to skip, how to communicate differential diagnoses in an objective manner -- is a function of many factors, one of which is trust in the presenter.
As each physician uses these tools to augment their own personal style of thinking, the ability to shape and structure the manner in which these tools convey answers will directly impact how effective and preferable using these tools will be.
To that extent, frontier LLMs, which offer at least some degree of steering over model outputs via custom instructions, projects, etc., will naturally have an advantage. Allowing clinicians to manipulate these workflows so that they can better intuit where the tools may fail vs be useful will be huge. [0]
[0]
x.com/samarthrawal/status/20…
2. LLMs are superhuman persuaders
"i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence" - Sam Altman
We have definitely crossed this point, at least when using LLMs to answer clinical questions: for any particular question, chances are there will be some paper on some data that affirms the bias implicit in that question.
And since models are remarkably capable of synthesizing complex information into a cohesive narrative, they will be able to give a very reasonable-sounding answer to a clinical question, whether or not it is truly applicable in that scenario (i.e., has reasonable external validity).
Relying on personal clinical "preference" to select an LLM tool of choice can therefore be quite dangerous, because a style of answer may not correlate with the applicability of the answer.
This is something I try to be cognizant of whenever I use LLMs, particularly in the clinical setting, to guard against this error mode.
([1] Related slides - How I use LLMs in Clinical Research & Practice -
x.com/samarthrawal/status/19…)
3. The boundary between answering a question and clinical reasoning is blurry
Even before the adoption of LLMs, there was always a bit of grey area between looking up information and clinical decision-making. For example: an article on UpToDate will give you basic facts and data about a particular condition, but also segue into next steps (thereby helping clinicians reason about problems). By itself, this is not a bad thing - in fact, it can be very valuable.
Now though, tools are remarkably able to synthesize facts, studies, empiric results, and clinical guidelines into a single, cohesive answer.
Where does answering a question stop, and making a decision based on the explicit and implicit phrasing of the clinical question begin? This undoubtedly reshapes how we think about clinical practice, and in my opinion, being aware of this process will be extremely important to clinicians moving forward.
(some more on this topic here - [2]
samrawal.com/blog/augmenting…)