Gemini powers our multimodal health research! 💙
In our new paper on multimodal AMIE, we're pushing conversational diagnostic AI beyond text to handle images such as skin photos, ECGs, and clinical docs, which provide crucial context in healthcare.
Blog:
goo.gle/42D0QcB
Paper:
gstatic.com/amie/multimodal_…
How do we make an AI reason like a clinician during a dynamic, multimodal conversation? One of our key contributions is multimodal state-aware reasoning, built on
@GoogleDeepMind Gemini 2.0 Flash.
Instead of just reacting turn-by-turn, AMIE maintains an internal "understanding" of the consultation:
✅ What is known about the patient?
✅ What are the likely diagnoses?
✅ What information (text or visual) is missing?
This internal state allows AMIE to:
👉 Intelligently guide the conversation through phases like history-taking & diagnosis.
👉 Strategically ask for relevant images (like skin photos or screenshots of ECGs/docs) when its internal state shows uncertainty.
👉 Accurately interpret multimodal data and weave the findings back into the ongoing dialogue and diagnostic process.
Essentially, it mimics the adaptive reasoning clinicians use, leading to a more structured and effective consultation.
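For readers who want a concrete picture, here is a minimal Python sketch of the state-aware loop described above. It is not AMIE's actual implementation: the `ConsultationState` fields, the `uncertainty` measure (1 minus the top diagnosis probability), the 0.5 threshold, and the action strings are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConsultationState:
    """Hypothetical internal state; all field names are illustrative."""
    known_facts: list[str] = field(default_factory=list)           # what is known about the patient
    differential: dict[str, float] = field(default_factory=dict)   # diagnosis -> estimated probability
    missing_text: list[str] = field(default_factory=list)          # history questions still unanswered
    missing_visual: list[str] = field(default_factory=list)        # images not yet provided


def uncertainty(state: ConsultationState) -> float:
    """Assumed uncertainty measure: 1 minus the top diagnosis probability."""
    if not state.differential:
        return 1.0
    return 1.0 - max(state.differential.values())


def next_action(state: ConsultationState, threshold: float = 0.5) -> str:
    """Pick the next step from the state rather than from the last turn alone."""
    # Ask for an image only when the state is uncertain and visual data is missing.
    if uncertainty(state) > threshold and state.missing_visual:
        return f"request_image: {state.missing_visual[0]}"
    # Otherwise keep taking the history while questions remain.
    if state.missing_text:
        return f"ask_question: {state.missing_text[0]}"
    # Nothing is missing, so move to the diagnosis phase.
    return "give_diagnosis"


state = ConsultationState(
    known_facts=["itchy rash on forearm for 2 weeks"],
    differential={"eczema": 0.4, "psoriasis": 0.35, "contact dermatitis": 0.25},
    missing_text=["any new soaps or detergents?"],
    missing_visual=["photo of the rash"],
)
print(next_action(state))  # uncertain and no image yet, so an image is requested

# After the (hypothetical) image is interpreted, the state is updated.
state.missing_visual.clear()
state.differential = {"eczema": 0.8, "psoriasis": 0.1, "contact dermatitis": 0.1}
print(next_action(state))  # confident enough; continues with the history question
```

The point of the sketch is that each decision reads from the accumulated state, so an image request is triggered by uncertainty rather than by a fixed script.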
We evaluated multimodal AMIE against primary care physicians (PCPs) in a demanding, blinded OSCE study using 105 diverse multimodal scenarios.
The results demonstrate clear progress: AMIE achieved similar or superior performance when compared to PCPs across a wide range of metrics, including diagnostic accuracy, empathy, and, critically, the handling of and reasoning about multimodal data.
While the OSCE results are very promising, it's important to remember this was a test environment with patient actors! Real-world care is more complex. Making sure AMIE is safe, reliable, and actually helpful in real clinical settings needs more work, starting with our upcoming study with Harvard BIDMC.
This work would not have been possible without an amazing team at
@GoogleAI,
@GoogleDeepMind:
@RyutaroTanno,
@alan_karthi,
@vivnat,
@AdamRodmanMD,
@timstro,
@taotu831,
@hardyshakerman,
@JanFreyberg,
@_cjpark,
@yasharmaa,
@apalepu13,
@arkitus,
@weballergy,
@valentinlievin,
@ckbjimmy,
@davidstutz92,
@dgtbarrett,
@yongcheng16,
@SaraM66905,
@dr2w,
@ymatias
Building on Articulate Medical Intelligence Explorer — AMIE, our research diagnostic conversational AI agent — today on the blog we share a first of its kind demonstration of a multimodal conversational diagnostic AI agent, multimodal AMIE. Learn more →
goo.gle/42D0QcB