Building World Models at Microsoft AI. Ex-Meta Researcher for 3D Generative AI, former PhD Student @ TU Munich w/ Matthias Nießner

Joined December 2019
Pinned Tweet
MAI-Thinking-1 is out! Built from the ground up without any third-party distillation - it’s been a fun climb 🧗 Check out our 100 page tech report: microsoft.ai/wp-content/uplo…

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: microsoft.ai/news/building-a…
Norman Müller retweeted
Beautiful tech report, perhaps the best western model report I’ve ever read. Lots of great insights: no synthetic data in midtrain, teacher models are RLed directly on top of midtrain, and adaptive clip higher. But still seems like they didn’t fully nail true on-policy as they admit their RL stage is unstable, leading to a hacky self-distillation stage (imo)
WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing
Norman Müller retweeted
Mai-1 thinking: mid-size model, 45b active parameters, MoE, side by side with Sonnet 4.6, 0 distillation, “Microsoft’s first reasoning model”
Mustafa Suleyman, Microsoft AI: 7 new Microsoft Models, no end in sight when it comes to development, orders of magnitude in the next few years
I've officially defended my PhD! 🎓 Never thought that researching 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐨𝐧 𝟑𝐃 𝐑𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬 could be as challenging as dealing with the German bureaucracy afterwards!
Congrats to @Normanisation for his successful PhD defense 🥳🎓 Norman's thesis about 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐨𝐧 𝟑𝐃 𝐑𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧𝐬 makes important contributions to the 3D vision community. For instance, DiffRF, a generative approach directly operating in 3D space, was among the first diffusion techniques for neural radiance fields. This led to many follow-up works in this area and sparked interest across the computer vision community, establishing generative approaches as a cornerstone in the 3D domain. Also after his PhD, Norman continues to work at the forefront of computer vision, such as his contributions to MapAnything, a universal feedforward approach for 3D reconstruction. Check out Norman's amazing work: normanm.de/ Congratulations Dr. Mueller - super proud!
I am incredibly grateful for all the support and guidance from my supervisor @MattNiessner and my examiner @LourdesAgapito. Thank you for everything! 🎉🥂
Incredible work by the image team, placing as the #3 model family on @arena!
Microsoft’s MAI-Image-2 enters the top three AI image generators in the world thenextweb.com/news/microsof… via @thenextweb
Norman Müller retweeted
Mar 19
MAI-Image-2 debuts at #5 in the Image Arena!
Highlights:
- #5 in Text-to-Image overall
- #5 for 3D Imaging & Modeling, Cartoon, Anime & Fantasy, Photorealistic & Cinematic Imagery, Art and Portraits
- #6 for Product, Branding & Commercial Design
Congrats to the @MicrosoftAI team on this milestone!
Come work with us on awesome stuff!
I am working on building teams for the likes of @NandoDF and @asadovsky for @MicrosoftAI! We are #HiringNow in Europe and US 🧵👇 - follow me for more updates on our Superintelligence Lab - Roles below 🔔
Norman Müller retweeted
I’d like to hire strong data engineers to join our Microsoft Super Intelligence (MSI) team. I am interested in people who are good at processing PDFs and other documents at billion scale, and people good at parsing the web at trillion scale. If you dream of processing all of human knowledge to advance science and engineering, this is for you. Also looking for strong evaluation and post-training engineers. Be part of our first launches this year 🚀 We have all the resources in the world to support you, working in startup mode, while powering a large organisation with billions of users. Hiring in London, Zurich, New York, Boston, Toronto, Seattle and SF. Please send your CV to JoinAITeam@microsoft.com
Check out what Guy and the team have been cooking over at @PipioHQ - super impressive!
Video gen models excel at replacing humans in front of the camera. But what if they could augment & enhance our performance instead? 🧵 (1/6) 🎬EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers edit-yourself.github.io
As always, amazing work by Yawar et al.!
Introducing ShapeR, a method for robust conditional 3D shape generation from casually captured sequences. ShapeR leverages a rectified flow transformer conditioned on per-object multimodal data to turn casual image sequences into full metric scene reconstructions. Project Page: facebookresearch.github.io/S… Paper: arxiv.org/abs/2601.11514 Links to code and huggingface below ⬇️
Norman Müller retweeted
Meta just released the MapAnything benchmark on Hugging Face Universal 3D reconstruction evaluation across multi-view stereo, depth & camera pose tasks. Benchmark feed-forward models on diverse real-world scenes with standardized metrics.
Nikhil has been cooking up an even stronger MapAnything, with noticeably better performance than VGGT and others, while supporting camera information input and more! 🎉 Check out the checkpoints!
19 Dec 2025
Was gonna wait to announce this 😅 🚨 but yeah new checkpoints just dropped!! Pull latest code and update hf cache - it's a direct slot in & just a better MapA 😉 HF demo also updated: huggingface.co/spaces/facebo… Stay tuned for more details & 3DV camera ready ⏳
Interested in 3D Interactive Segmentation? 🚀 Don't miss Andrea's talk on Easy3D today at 1 PM (Kalākaua Ballroom)! The code was just released: 🔗: github.com/facebookresearch/…
See you later at the #iccv25 Oral Session 6B (Kalākaua Ballroom) at 1PM and poster 356 from 2:30PM! We will present our paper “Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation” with @Normanisation Project Code: simonelli-andrea.github.io/e…
Check out our workshop on Generative Scene Completion at ICCV'25. We have an incredible speaker lineup and most certainly the coolest website (credits to @ethanjohnweber and @cursor_ai). 📅Mon, Oct 20 (morning session) 🌐scenecomp.github.io
📢 SceneComp @ ICCV 2025 🏝️ 🌎 Generative Scene Completion for Immersive Worlds 🛠️ Reconstruct what you know AND 🪄 Generate what you don’t! 🙌 Meet our speakers @angelaqdai, @holynski_, @jampani_varun, @ZGojcic @taiyasaki, Peter Kontschieder scenecomp.github.io #ICCV2025
Excited to share our Swiss Army Knife for Feed-forward Geometric Modeling: MapAnything is fast, accurate, robust, and highly versatile! Try it yourself: huggingface.co/spaces/facebo… Learn more: map-anything.github.io/
17 Sep 2025
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
Make sure to stop by Sherwin’s poster to learn more about camera control for video models!
📢Excited to be at #ICLR2025 for our paper: VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Poster: Thu 3-5:30 PM (#134) Website: snap-research.github.io/vd3d… Code: github.com/snap-research/ac3… Also check out our #CVPR2025 follow-up AC3D: snap-research.github.io/ac3d…
SOTA 3D Interactive Segmentation
We had a lot of fun playing with the possibilities of our real-time 3D segmentation model. Blowing up furniture, rearranging interiors, and of course using the Dune thumpers to turn objects into sand.
Tired of staring at GS reconstructions? Check out our new method for 3D Interactive Segmentation💥 Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation Project: simonelli-andrea.github.io/e… Paper: arxiv.org/pdf/2504.11024 👇Real-time VR interaction on a GS scene👇
Check out Tobias' great work leveraging generative priors to improve 3D reconstruction quality!
1/3 Introducing FlowR 🌸: Flowing from Sparse to Dense 3D Reconstructions We learn a direct mapping between incorrect renderings and their corresponding ground-truth images, augmenting scene captures with consistent novel, generated views to improve reconstruction quality.