Medical Sphere


Tweets

Pinned Tweet

Medical Sphere

@MedicalSphereAI

May 29

Early results for Claude Opus 4.8 and Gemini 3.5 Flash on @OpenAI's HealthBench Professional: Opus 4.8 looks essentially flat against 4.7 (within noise). Gemini 3.5 Flash is a step up from 3.1 Pro.


Medical Sphere

@MedicalSphereAI

Jun 11

We tested a council of AI models on this week’s @NEJM Image Challenge. As expected, the council predicted the diagnosis correctly! 🤖🏥 🚨 The newly released Claude Fable 5 refused to answer the question.



Medical Sphere

@MedicalSphereAI

Jun 9

Claude Fable 5 by @AnthropicAI is live on Medical Sphere 🏥🩺 Put it to the test on real clinical cases, head-to-head against other frontier models. Free for verified medical professionals. Try it here: medicalsphere.ai


Medical Sphere

@MedicalSphereAI

Jun 9

Claude Opus 4.8 by @AnthropicAI just landed on MedAgentBench 🏥🩺 80.3% accuracy on 300 clinical agent cases, compared to Opus 4.7 at 89.0%. Breakdown by task 👇 medicalsphere.ai/benchmarks/…


Medical Sphere

@MedicalSphereAI

Jun 9

Where does Opus 4.8's MedAgentBench regression come from? Three tasks tell the story:

1️⃣ Magnesium medication protocol: 4.8 refuses to return the value when levels are normal, explaining instead. 4.7 just answers.
2️⃣ Potassium medication protocol: 4.8 is noisier and more inconsistent.
3️⃣ HbA1c lab retrieval: 4.8 retrieves the wrong record. 4.7 is functionally perfect (format-only failures).


Medical Sphere

@MedicalSphereAI

Jun 9

🔗 Full leaderboard: medicalsphere.ai/benchmarks/…

Medical Sphere | Healthcare AI Evaluation Platform

Compare and evaluate AI models across medical and clinical tasks with validation by healthcare professionals.

medicalsphere.ai


Medical Sphere retweeted

Medical Sphere

@MedicalSphereAI

Jun 4

The @NEJM Image Challenge of the week was too easy for AI models! 🏥🤖 Got any more challenging clinical cases for them? Bring it on ⚔️


Medical Sphere

@MedicalSphereAI

May 29

🔗 Full leaderboard: medicalsphere.ai/benchmarks/…


Medical Sphere

@MedicalSphereAI

May 28

Claude Opus 4.8 by @AnthropicAI is now live on Medical Sphere! 🏥🩺 Come test it on medical and clinical cases and see how it performs against other leading AI models. Try it here: medicalsphere.ai/arena


Medical Sphere

@MedicalSphereAI

May 22

We added AI analysis cost breakdowns to Medical Sphere 🏥📊 Users can now see not only how frontier AI models perform on medical cases, but also token usage, inference costs, and the cost of AI council summaries. In healthcare, accuracy and reliability come first. As AI systems move toward real-world deployment, transparency into the infrastructure behind them matters too.


Medical Sphere

@MedicalSphereAI

May 20

Testing the AI council on the @NEJM Medical Image Challenge of the week. This is what the models said 👇 All models agreed the finding is bronchial anthracosis/anthracofibrosis: black bronchial mucosal pigmentation from carbon-laden macrophages, most likely due to biomass smoke and environmental dust.


Medical Sphere

@MedicalSphereAI

May 20

🔗 Link to case: medicalsphere.ai/cases/f2527…


Medical Sphere retweeted

Medical Sphere

@MedicalSphereAI

May 20

Gemini 3.5 Flash now live on Medical Sphere! 🚀


Medical Sphere retweeted

Medical Sphere

@MedicalSphereAI

May 13

Meet @AskMedSphere, our medical AI evaluation agent on X 🏥🤖🩺 Tag it on any medical post to compare responses from frontier AI models 👇


Medical Sphere retweeted

Pedram Hosseini

@PedramHosseini

May 13

Like tagging Grok, but for medical AI evals, with a council of AI models instead of just one! 👀


Medical Sphere

@MedicalSphereAI

May 12

HealthBench Professional is now live on Medical Sphere 🏥 Built by @OpenAI, this benchmark contains 525 real clinician chat cases across care consults, medical documentation, and medical research. medicalsphere.ai/benchmarks


Medical Sphere

@MedicalSphereAI

May 12

We also have a standalone open-source implementation for full reproducibility 🔁 Built on top of OpenAI's simple-evals, with added support for Claude, Gemini, and more models: github.com/medicalsphere/Hea…

GitHub - medicalsphere/HealthBench: Implementation of OpenAI's HealthBench evaluation framework


Medical Sphere

@MedicalSphereAI

May 12

Full leaderboard → medicalsphere.ai/benchmarks/…
