McAfee Professor of Engineering @MIT; Co-Founder & CTO at Unreasonable Labs; AI-Driven Scientific Discovery

Joined December 2014
1,233 Photos and videos
Pinned Tweet
We've made a breakthrough in self-evolving AI scientists, moving from "search" to "principled discovery": scientific discovery requires that the search space itself changes, and an AI scientist must perceive this shift without intervention. We built an AI that achieves this for the first time, with the ability to discover the scientific vocabulary it reasons in. Evidence, tools, artifacts, verifiers, failures & claims become typed provenance.
We show three distinct modalities: 1) retrieval, adding known objects; 2) search, exploring a fixed schema; and critically: 3) discovery, a verified regime transition.
We solve the open-endedness evaluation problem by lifting agentic workflows into a typed copresheaf and proving, via a Kan obstruction, that true discovery is not unbounded generation but a verifiable schema expansion: old evidence is transported by left Kan extension, and genuine novelty is mathematically quantified by the pointwise residual beyond the transported image, separating discovery from mere search and making novelty objective and measurable rather than a subjective judgment or benchmark delta.
Our AI scientist is built in a way that does not pre-conceive the approach it chooses; instead, we endow the system with formal power to adapt, evolve, and reason from first principles.
Case studies include:
1⃣ Builder/Breaker model that discovers mode-conditioned compliance in proteins;
2⃣ CategoryScienceClaw that finds anisotropic fiber-network stiffness rules.
Great work in collaboration with my graduate student @fwang108_ @MITdeptofBE
F.Y. Wang & M.J. Buehler, Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence, arXiv:2606.01444, 2026
105
378
2,524
781,010
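The retrieval / search / discovery distinction in the pinned post can be illustrated with a very small set-based toy. This is only a sketch under loud assumptions: it does not compute a left Kan extension or a copresheaf, and it omits the verification step the post requires. The `Claim` type, the `classify` and `residual` helpers, the example schema, and the relation names are all hypothetical stand-ins chosen for illustration. A claim counts as "retrieval" if it is already recorded, "search" if it is new but expressible with the old relation types, and a "discovery" candidate only if it needs a relation type the old schema lacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A typed claim: a relation name applied to a tuple of object names."""
    relation: str
    args: tuple


def classify(claim, notebook, old_schema):
    """Toy three-way label for one claim (hypothetical criteria, not the paper's)."""
    if claim in notebook:
        return "retrieval"   # already recorded in the existing notebook
    if claim.relation in old_schema:
        return "search"      # new, but expressible in the fixed schema
    return "discovery"       # needs a relation type the old schema lacks


def residual(new_claims, old_schema):
    """Claims that cannot be expressed with the old schema's relation types."""
    return {c for c in new_claims if c.relation not in old_schema}


# Illustrative data only; none of these values come from the paper.
old_schema = {"stiffness", "length"}
notebook = {Claim("stiffness", ("fiber_A",))}
new_claims = {
    Claim("stiffness", ("fiber_A",)),
    Claim("length", ("fiber_B",)),
    Claim("mode_conditioned_compliance", ("protein_X", "mode_1")),
}

labels = {c: classify(c, notebook, old_schema) for c in new_claims}
novel = residual(new_claims, old_schema)
```

In this toy, the size of `novel` plays the role of the "residual beyond the transported image": it is empty whenever everything new could already be stated in the old vocabulary.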
Markus J. Buehler retweeted
Impressive Fable demo that will impact design and engineering
3
6
1,123
Markus J. Buehler retweeted
MIT friends 👇🏼
1
4
16
3,514
The release of Anthropic's Mythos-class Claude Fable 5 is the latest signal that we are in a phase of exponential growth in AI capabilities, "takeoff mode". The biggest leaps are in engineering and scientific reasoning: frontier models now match or exceed expert-level performance on many technical tasks, and increasingly act as collaborators that plan, code, simulate, and design (as we show in our own research on self-improving agentic discovery systems). Understanding these technologies deeply is critical, both their foundations and how they're applied to critical industrial problems at scale, as is using them to drive innovation and technology development.
This July 27–30, I will teach Applied AI for Materials Discovery at @MITProfessional (live online so you can participate from anywhere). It's a hands-on deep dive into the shift from predictive ML to agentic, closed-loop AI-native discovery and innovation.
Highlights:
▶ AI scientists & recursive self-improving swarm intelligence: massively parallel agents that read literature, formulate hypotheses, write and run code, and critique each other's work
▶ Generative AI for inverse design: diffusion and flow matching for proteins, alloys, metamaterials, and crystals
▶ Foundation models that "think" physics: graph transformers, neural interatomic potentials, neural operators and PINNs
▶ Bridging the reality gap across scales: connecting agents to physics simulators from the atomic scale to the product scale (DFT, MD, FEA) for automated verification of AI-generated designs
▶ Building custom reasoning models: fine-tuning, RL; incorporating first-principles physical agency (e.g. MCP, tool use)
▶ Unlocking dormant knowledge: turning unstructured data (papers, patents, lab notebooks, legacy PDFs, etc.) into structured, actionable insight
▶ Interpretability, reliability, and enterprise deployment
The course will provide you with ready-to-use agent templates, dozens of code notebooks, repos, and curated datasets you can deploy immediately in your organization. For more details on the course and the registration link, see the reply.
6
18
107
12,543
Claude Fable 5 has impressive spatial reasoning capabilities that are immediately relevant for engineering and design: In this example I gave Claude Fable 5 a single photo of a hierarchical mesh torus (image generated using a text-to-image model); and in one shot, no iteration, it built a full interactive 3D simulation: ~1,400 nodes & 4,400 fibers, realistic mass–spring physics you can grab, compress, twist… plus strain-based sonification so you can 'hear' the structure vibrate. It inferred the complete 3D topology from a partial 2D view. ⤵️
14
26
275
30,968
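For readers unfamiliar with the ingredients named in the post above (mass-spring physics and strain-based sonification), here is a minimal sketch of one way such a simulation step could look. It is not the code the model generated, which the post does not show. The function names, the semi-implicit Euler integrator, the parameter values, and the linear strain-to-pitch mapping are all assumptions made for illustration.

```python
import math


def spring_strains(pos, springs):
    """Return the strain (L - L0) / L0 of each spring (i, j, rest_length)."""
    return [(math.dist(pos[i], pos[j]) - rest) / rest for i, j, rest in springs]


def step(pos, vel, springs, k=50.0, mass=1.0, damping=0.5, dt=0.01, pinned=()):
    """Advance a damped mass-spring network by one semi-implicit Euler step."""
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j, rest in springs:
        d = [b - a for a, b in zip(pos[i], pos[j])]   # vector from node i to node j
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue                                   # direction undefined; skip
        f = k * (length - rest) / length               # Hooke force per unit of d
        for axis in range(3):
            forces[i][axis] += f * d[axis]             # stretched spring pulls i toward j
            forces[j][axis] -= f * d[axis]             # and j toward i
    new_pos, new_vel = [], []
    for n, (p, v, f) in enumerate(zip(pos, vel, forces)):
        if n in pinned:                                # pinned nodes do not move
            new_pos.append(list(p))
            new_vel.append([0.0, 0.0, 0.0])
            continue
        v2 = [vi + dt * (fi - damping * vi) / mass for vi, fi in zip(v, f)]
        new_vel.append(v2)
        new_pos.append([pi + dt * vi for pi, vi in zip(p, v2)])
    return new_pos, new_vel


def strain_to_pitch(strain, base_hz=220.0, gain=4.0):
    """Hypothetical sonification: larger |strain| maps to a higher frequency."""
    return base_hz * (1.0 + gain * abs(strain))


# Two nodes joined by one spring stretched to twice its rest length.
pos = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
springs = [(0, 1, 1.0)]
strain_before = spring_strains(pos, springs)[0]
pos, vel = step(pos, vel, springs)
strain_after = spring_strains(pos, springs)[0]
```

A network of the size quoted in the post (~1,400 nodes, ~4,400 fibers) would use the same update, typically vectorized, with the per-spring strains feeding the audio mapping each frame.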
Here is the image if others want to use it:
1
1
9
878
Here is another example, this time modeling fracture (irreversible deformation) of a foam material based off an image ⤵️
2
1
14
2,800
Markus J. Buehler retweeted
Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.
119
467
3,076
189,482
👏👏👏
In a week when some of the leaders in AI are trying to pull up the ladder behind them and prevent the automation of science and self-improving superintelligence, we're committed to building RSI safely and publicizing the outputs of our system to give humanity an audit trail of its inventions and intentions and let the open source community build on top of them. Stay tuned for the first such result in the coming days.
9
3,384
Yes there are definitely similarities
AI scientist and self-improving agents may be similar systems in practice?
3
1
6
1,333
Markus J. Buehler retweeted
Category theory is fun to think through. Cool paper!
2
3
21
3,691
Looking forward to joining this esteemed lineup of speakers to discuss advances in AI scientists, autonomous discovery and more. Thank you @WengongJin @AdaFang_ @YuanqiD for organizing this exciting event!
Thrilled to announce the AI Scientist Summer Workshop @AdaFang_ @YuanqiD 📅 Aug 4, 2026 📍Microsoft Research New England, Cambridge MA Join speakers from Anthropic, Google, Microsoft, AI2, Harvard, and MIT to discuss the future of AI Scientists! Register: ai-scientist-workshop.github…
29
3,939
Markus J. Buehler retweeted
We're grateful to @dair_ai for featuring our work at #1: Scientific discovery isn't generating answers inside a fixed space but a verified change of the space itself, formalized as a left Kan extension across regime transitions - congrats @fwang108_ and @ProfBuehlerMIT ⤵️
1
2
11
1,724
Thanks for featuring our work ⤵️
1
26
4,084
Markus J. Buehler retweeted
This was one of the standout AI papers of the week. (bookmark it)
It tackles a question most self-improving AI agents ignore: is the agent actually discovering anything, or just remixing what it already knows? How can you tell whether the agent is doing real discovery or just confident retrieval?
The authors give three clean buckets:
- Retrieval is looking something up in a notebook you already have.
- Search is combining tools you already own in new ways.
- Discovery is inventing a new concept that wasn't in your toolkit before.
The issue is that most agents stop at the first two.
The math behind their definition (category theory plus a left Kan extension, if you care) is basically a bookkeeping trick to ask: could the old version of me have produced this result? If yes, it's not discovery. If no, something genuinely new showed up.
They build a Builder/Breaker agent that studies protein mechanics. Over four rounds, the model's fit accuracy actually drops (R² goes from 0.48 to 0.68 to 0.54 to 0.41). At first glance, that looks like a failing agent. It isn't. The agent kept taking on harder proteins and rewriting its theory to cover them. Data grew almost 10x while the model code grew only 1.3x. A smaller theory covering a bigger world is exactly what good science looks like.
Why does it matter? If you optimize for accuracy alone, your self-improving agent will just settle into easy benchmarks and stop. This paper offers a cleaner success signal and asks whether the agent is compressing more of the world into less code over time.
Paper: arxiv.org/abs/2606.01444
Learn to build effective AI agents in our academy: academy.dair.ai/
29
68
346
40,347
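The "more world, less code" signal described in the post above can be made concrete with a little arithmetic. The sketch below uses only the figures quoted there (R² per round, data growth of "almost 10x" rounded to 10, code growth of 1.3x). The `compression_gain` ratio is a hypothetical illustrative metric, not the definition used in the paper.

```python
def compression_gain(data_growth, code_growth):
    """Ratio of data growth to code growth; > 1 means coverage outpaced code size."""
    if data_growth <= 0 or code_growth <= 0:
        raise ValueError("growth factors must be positive")
    return data_growth / code_growth


# Figures as quoted in the post; "almost 10x" is rounded to 10 here (assumption).
r2_by_round = [0.48, 0.68, 0.54, 0.41]
data_growth = 10.0
code_growth = 1.3

r2_change = r2_by_round[-1] - r2_by_round[0]        # net change in fit over four rounds
gain = compression_gain(data_growth, code_growth)   # about 7.7x more data per unit of code growth
```

Read together, the two numbers tell the story the post describes: R² ends slightly lower than it started, while the amount of data covered per unit of model code rises by roughly a factor of 7.7.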
Markus J. Buehler retweeted
Great paper on self-improving agents:
4
25
137
18,664
Markus J. Buehler retweeted
Anthropic reports Claude now writes over 80% of its own production code — meaning an AI is the primary author of the systems training future versions of itself. Claude's research judgment matched human experts 22% of the time in 2024. Today it's 64%. The recursive loop has started.
182
190
1,723
86,010
Markus J. Buehler retweeted
Most AI today searches within a predefined map. This paper argues that true scientific discovery begins when the map itself changes. Not better retrieval. Not deeper search. The ability to create new concepts, new structures, and new ways of reasoning. That's a very different path toward intelligence. arxiv.org/pdf/2606.01444
2
1
15
1,333
Markus J. Buehler retweeted
Replying to @ProfBuehlerMIT
The jump from searching a fixed schema to actually expanding the schema itself is the part that makes this genuinely different. The fact that novelty is now mathematically quantifiable rather than a benchmark delta is a real step forward.
1
1
31
5,734