Health at Microsoft AI. Previously @GoogleHealth, @GoogleDeepMind, @KingsImaging. Paediatric doctor at @EvelinaLondon. Created NeoMate.co.uk

Joined March 2009
Chris Kelly retweeted
microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.
this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.
the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale.
let's look at all of this in this likely very long thread 🧵
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: microsoft.ai/news/building-a…
We've all long imagined a health companion that truly knows you - not a chatbot that gives generic advice, but something that accumulates understanding of your health over time, and gets smarter the more you use it. Today we're launching Copilot Health: a secure, separate space within Copilot where you can bring together your health records from 50,000 US provider orgs, data from 50 wearable devices, lab results, and health history all into one place. Copilot Health then applies medical intelligence to make sense of it all, giving you real agency over your own health. Your data, understood in context, patterns surfaced that you might not spot alone, alongside the confidence to walk into your doctor's appointment more informed, prepared, and empowered. We are at a unique inflection point in history - AI is reaching extraordinary capabilities in medicine, while health data is finally being mobilised...and almost everyone has a phone in their pocket (or knows someone with one). We're starting in the US, but ultimately want to bring medical expertise to the billions of people who have never had access to it. Read more here (microsoft.ai/news/introducin…) and you can sign up to the waitlist to help shape what comes next!
Really excited 🥳 to share two breast cancer AI papers from my time at Google, published jointly in Nature Cancer today! We set out in 2021 to answer a question that matters to millions of women: can AI safely improve breast cancer screening in the NHS? Five years, five organisations, and 125,000 women’s scans later, here's what we found.
1️⃣ Our first paper (nature.com/articles/s43018-0…) evaluated Google's mammography AI across five NHS screening services, with 39-month follow-up including interval next-round cancers:
→ AI achieved superior sensitivity to human readers (54% vs 44%, P<0.001) with non-inferior specificity
→ 25% of future interval next-round cancers detected = potential for earlier diagnosis
→ Reading time reduced by 32% while cancer detection increased by 18%
→ No systematic disparities across age, ethnicity, deprivation, or breast density
→ Prospective deployment at 12 sites confirmed feasibility but revealed distribution shift requiring recalibration - a critical lesson for implementation
2️⃣ Our second paper (nature.com/articles/s43018-0…) tackled what happens when AI becomes the second reader. When readers disagree today, a specialist panel "arbitrates". We studied 50,000 women's screens with 22 readers, with and without AI as the second reader:
→ End-to-end including arbitration, our AI-enabled arm was non-inferior to standard double reading (P<0.001)
→ Human reading workload reduced by 46%
→ AI flagged far more interval next-round cancers before arbitration, but many were overruled, even when the AI correctly localised the cancer
→ Future: better explainability, prior image integration, reader training, and new pathways to maximise AI success (e.g. supplemental imaging for high risk normal cases)
An editorial from Allan Hackshaw and Rosalind Given-Wilson (nature.com/articles/s43018-0…) covers this work really well - thank you!
Conclusion: The AI works, and it can find cancers earlier.
But how we integrate it into clinical workflows will determine whether that potential translates into better outcomes for women. This collaboration between @GoogleResearch, @imperialcollege, @RoyalSurrey, @stgeorgeshospital, St George's University Hospitals NHS Foundation Trust, and Imperial College Healthcare NHS Trust was funded by the NHS AI Award. We are deeply grateful to everyone involved. Thank you to @skourti_elena at Nature Cancer. Congratulations Lucy Warren, Marc Wilson, Jenny Venton, Ken Young, Mark Halling-Brown, Megumi Morigami, Lisanne Khoo, Deborah Cunningham, Richard Sidebottom, Reddy Mamatha, Hema Purushothaman, Delara Khodabakhshi, Lesley Honeyfield, Amandeep Hujan, Tsvetina Stoycheva, Andy Joiner, Reena Chopra, Aminata Sy, Dominic Ward, Lin Yang, Rory Sayres, Daniel Golden, Namrata Malhotra, Rachita Mallya, Lihong Xi, Della Ogunleye, Charlotte Purdy, Alistair Mackenzie, Jane Chang, Jonathan Dixon, Elzbieta Gruzewska, Emma Lewis, Marcin Sieniek, Shawn Xu, @DrSusanThomas, @shravyas, @fjg28_fiona, @Ara_Darzi, Hutan Ashrafian 🎉
Chris Kelly retweeted
🪩The one and only @stateofai 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
30 Jun 2025
New paper today! 🥳 How good is generative AI at diagnosis compared to human doctors? We introduce a novel, interactive medical benchmark (SDBench) for “sequential diagnosis”, and an orchestrator (MAI-DxO) that achieved over 4x higher diagnostic accuracy vs experienced physicians who played the benchmark. 🧵
Chris Kelly retweeted
Our recent journal article compares the effectiveness of objective vs. subjective labels for AI-based detection of fetal hypoxia from CTGs. Key takeaway? Objective cord pH labels demonstrate greater robustness to temporal shifts. 🔗 Read more: nature.com/articles/s44294-0… #MedicalAI
Chris Kelly retweeted
4 Feb 2025
The largest medical #AI randomized controlled trial yet performed, enrolling >100,000 women undergoing mammography screening, was published today @LancetDigitalH The use of A.I. led to 29% higher detection of cancer, no increase of false positives, and reduced workload compared with radiologists without A.I. thelancet.com/journals/landi…
Chris Kelly retweeted
13 Oct 2024
The tower has caught the rocket!!

Chris Kelly retweeted
Check out our blog post on developing deep learning models to predict fetal well-being during labor, using an open source dataset of time series signals consisting of fetal heart rate and uterine contractions and patient clinical data.
27 Sep 2024
Cardiotocography (CTG) is a technique used to monitor fetal well-being. Today we describe how ML models can assess CTGs to predict measures of fetal well-being to potentially assist healthcare providers, reducing burden & improving fetal outcomes. goo.gle/3BpfJnJ

Image description: We build a model development and evaluation pipeline to enable fetal well-being prediction that takes into account limited data, clinical metadata, and intermittent methods used in low-resource settings.

Chris Kelly retweeted
10 Jun 2024
We are excited to announce that we've raised $20 million in Series A funding led by New Enterprise Associates (NEA), with participation from Sequoia Capital, Blue Lion Global and Neo.
Chris Kelly retweeted
3 Jun 2024
A new era for Co:Helm as we introduce our new name and brand identity — Anterior. We are the AI company built by clinicians for clinicians to transform healthcare administration.
13 May 2024
Gross! Was trying to explain bacteria to my 6 year old son, so figured we should get some Agar plates… The results: eurgh. He is now asking me if I’ve washed my hands each time I pick up my phone 😂
13 May 2024
Turns out that putting the dishes on top of our @Sonos amp with some music quietly playing gives you a fairly steady 32°C for incubation, monitored by a @MEATERmade cooking thermometer 🤓