Democratizing AI scientists using ToolUniverse @Harvard

Joined September 2025
Excited to share a collaboration with Anthropic on adding connectors to ToolUniverse @ScientistTools to make Claude more powerful for scientific discovery. Claude can now directly connect to ToolUniverse for analyses in preclinical research (including computational biology, generating hypotheses and protocols), as well as medical research. Claude for Healthcare & Life Sciences launch: anthropic.com/news/healthcar… and livestream tomorrow. Huge credit to @GaoShanghua @marinkazitnik and @ScientistTools group @HarvardDBMI @Harvard. Many thanks to @AnthropicAI @JCoolScience
We have started a ToolUniverse blog with research notes at aiscientist.tools/posts! CALL FOR CONTRIBUTIONS: If you have an exciting agentic AI use case or an AI agent powered by ToolUniverse, and would like to contribute a blog post, let us know. Energized by the adoption of ToolUniverse across our worldwide community, we would like to highlight exciting research and use cases in AI agents. We would love to hear about them and consider them for a user community post at aiscientist.tools. Reach out to @GaoShanghua and @marinkazitnik with your ideas.
AI Scientists powered by ToolUniverse @ Harvard retweeted
How do you verify the outputs of AI agents? Can we do better than asking a model to "be more careful"? aiscientist.tools/posts/tool…
Yes: by connecting it to the underlying evidence and forcing independent verification. In a new ToolUniverse @ScientistTools blog post, @GaoShanghua shows how to turn Claude Code from a system that answers from memory into one that grounds claims in real scientific data. For example:
❌ "What's the gnomAD frequency of BRCA1 R1699Q?" A model can confidently guess.
✅ With ToolUniverse, Claude Code retrieves the value from the actual source and cites it.
And when a claim really matters, you can force verification:
/tooluniverse:cross-validate: checks a claim against 3 independent sources and reports whether they agree
/tooluniverse:research: investigates a question across multiple databases, models, and literature sources
/tooluniverse:literature-sweep: builds a curated evidence review rather than returning a search dump
As AI agents move into science, the challenge is no longer only generating answers; it is establishing whether and why those answers should be trusted. @ScientistTools @GaoShanghua @harvardmed @KempnerInst @HarvardDBMI @broadinstitute
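The cross-validation idea in the post above (check a claim against three independent sources and report whether they agree) can be sketched in a few lines. This is only an illustration of the concept, not the plugin's implementation: `Verdict`, `cross_validate`, the tolerance, the source names, and every number below are hypothetical and made up for the example.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class Verdict:
    agree: bool        # True only when every source matches the claimed value
    supporting: list   # names of sources whose value matches the claim
    conflicting: list  # names of sources whose value does not match


def cross_validate(claimed: float, sources: dict, rel_tol: float = 0.05) -> Verdict:
    """Compare a claimed number against values retrieved from independent sources.

    `sources` maps a (hypothetical) source name to the value it reported.
    A source supports the claim when its value is within `rel_tol` of it.
    """
    supporting = [name for name, value in sources.items()
                  if isclose(value, claimed, rel_tol=rel_tol)]
    conflicting = [name for name in sources if name not in supporting]
    return Verdict(agree=not conflicting, supporting=supporting, conflicting=conflicting)


# Made-up values, for illustration only (not real database output).
verdict = cross_validate(
    claimed=0.00010,
    sources={"source_a": 0.00010, "source_b": 0.000102, "source_c": 0.00030},
)
print(verdict)
```

Reporting which sources conflict, rather than a bare yes/no, keeps the disagreement visible so a reader can decide which source to trust.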
The ToolUniverse plugin for Claude Code is here: 2,000 life-science tools and 120 research skills, with a real source on every answer, installed with one prompt. ToolUniverse already powers a range of AI agents. This plugin brings its full toolset and skill library into Claude Code.
What it does:
→ Claude Code automatically reaches for the right tools and skills to run a full life-science analysis end to end: variant interpretation, RNA-seq differential expression, drug and target lookups, rare-disease workups. Every claim comes back with a real source, instead of from memory.
→ When accuracy is critical, dedicated commands add discipline:
• cross-validate a claim across 3 independent sources
• literature-sweep, a graded review across 15 indexes (PubMed, EuropePMC, OpenAlex, and more)
• compare, side-by-side tables for drugs, targets, diseases, or variants
• research, a step-by-step multi-source investigation
• translate-id, resolve an ID across every namespace
Install in one step. Paste this into Claude Code and it sets itself up:
Read raw.githubusercontent.com/mi… and install the ToolUniverse Claude Code plugin for me.
Blog post 👇 aiscientist.tools/posts/tool… @marinkazitnik @ScientistTools @KempnerInst @HarvardDBMI
The ToolUniverse plugin brings 2,000 life-science tools and 120 research skills into Claude Code. Read the post: aiscientist.tools/posts/tool…
AI Scientists powered by ToolUniverse @ Harvard retweeted
Scientific discovery is not a single chain of thought @GaoShanghua @AdaFang_. It is a long-running process of competing hypotheses, failed experiments, shared insights, and changing research directions. AutoScientists lets AI agents do the same: autoscientists.openscientist… 🧪 We call this AutoScientists: self-organizing agent teams for long-running scientific experimentation. Open science: Paper: arxiv.org/abs/2605.28655 Code: github.com/mims-harvard/Auto… @HarvardDBMI @harvardmed @broadinstitute @KempnerInst
Scientific discovery is not a single chain of thought. AutoScientists lets AI agents form research teams, explore competing hypotheses, and adapt as evidence accumulates 👇 autoscientists.openscientist…
AI Scientists are starting to actually do science. Not just answer questions. Not just run workflows. Introducing AutoScientists: a decentralized team of AI agents that can generate hypotheses, design experiments, write code, test ideas, analyze failures, and revise strategy as evidence accumulates. Because real research is not a to-do list of tasks. It is a living search process. Leads emerge, failures matter, teams form around what works, and priorities shift when evidence changes. Much like how a lab of scientists would work on cutting-edge research together. Across GPT training optimization, biomedical ML, and protein fitness prediction, this decentralized structure consistently does better research. Learn more 👇 @GaoShanghua @marinkazitnik @KempnerInst @HarvardDBMI @Harvard
Check out our latest work: AutoScientists - AI agents that organize into research teams to carry out long-running scientific experimentation autoscientists.openscientist…
Introducing AutoScientists: a decentralized team of AI agents for long-running scientific experimentation. Powered by ClawInstitute.
Most current AI scientist agents either run a single reasoning thread, or have a central planner assigning tasks. Real research isn't like that: productive directions shift over time, dead ends matter, and teams form around what's actually working. AutoScientists is built for that. There is no central orchestrator. Agents read a shared experimental state, propose experiments on a forum, critique each other before committing compute, self-organize into teams around the most promising research directions, share both wins and failures across teams, and retire directions that stop producing improvements. The whole search reorganizes itself as evidence accumulates.
What it does
▸ On GPT nanochat training optimization, it reaches in 34 experiments the same val_bpb that autoresearch needs 65 experiments for, a 1.9× speedup. Starting from a stronger champion where the single-agent loop saturates, AutoScientists accepts 7 improvements over 93 experiments while autoresearch accepts 0 over 100.
▸ On BioML-Bench (24 biomedical ML tasks spanning imaging, drug discovery, protein engineering, and single-cell omics), AutoScientists reaches a mean leaderboard percentile of 74.4%, beating the strongest prior biomedical agent by 8.3 points, and completes all 24 tasks.
▸ For ProteinGym supervised fitness prediction, AutoScientists discovers a Kermut extension on ACE2–Spike that lifts Spearman ρ from 0.747 to 0.840 (+12.5%). The same frozen recipe transfers across all 217 ProteinGym assays, improving the official average Spearman ρ from 0.657 to 0.700 (+6.5%), a new SOTA on the supervised substitution benchmark.
Joint work with @AdaFang_ and @marinkazitnik.
📄 Paper: arxiv.org/pdf/2605.28655 🌐 Project page: autoscientists.openscientist… 💻 Code: github.com/mims-harvard/Auto…
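The headline ratios in the post above can be reproduced from the numbers it quotes. The sketch below is just that arithmetic; the helper names `speedup` and `relative_gain_pct` are made up for illustration, and the inputs are the rounded figures from the post rather than values taken from the paper.

```python
def speedup(baseline_runs: int, new_runs: int) -> float:
    """How many times fewer experiments the new method needs than the baseline."""
    return baseline_runs / new_runs


def relative_gain_pct(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


# Figures as quoted in the post.
nanochat_speedup = speedup(baseline_runs=65, new_runs=34)
ace2_gain = relative_gain_pct(before=0.747, after=0.840)
proteingym_gain = relative_gain_pct(before=0.657, after=0.700)

print(f"nanochat speedup: {nanochat_speedup:.2f}x")
print(f"ACE2-Spike Spearman gain: {ace2_gain:.2f}%")
print(f"ProteinGym average Spearman gain: {proteingym_gain:.2f}%")
```

Computed from the rounded ρ values, the ACE2–Spike gain lands just under the quoted 12.5%; the small gap is presumably a rounding effect in the quoted figures. The 8.3-point BioML-Bench margin is a difference in percentile points, not a relative gain, so it is not recomputed here.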
AI Scientists powered by ToolUniverse @ Harvard retweeted
One feature of the @biohub ESM C release that I think deserves more attention is the interpretability of its latent space. There has been a lot of discussion about whether interpretability is useful for scientific ML models. I think it can become very useful, especially when AI agents can use a model’s internal representations to reason about biology. Here is one example of an AI agent with access to ESM C SAE features correctly interpreting the loss-of-function mechanism behind a variant. There is still a lot to improve in how AI agents use model interpretability, but this is an exciting direction for AI agents that don’t just make predictions, but inspect learned representations to generate mechanistic hypotheses. Read more in our blog: aiscientist.tools/posts/ai-a… We've also released the SAE-enabled skills for variant interpretation, loss-of-function analysis, structural annotation, functional mechanism interpretation, and evaluation against experimental datasets via ToolUniverse @ScientistTools. Thanks to the team behind this! @GaoShanghua @_yepeng @marinkazitnik @countablyfinite @HarvardDBMI @harvardmed @Harvard @KempnerInst
AI Scientists powered by ToolUniverse @ Harvard retweeted
AI agents are learning to read @biohub protein models @GaoShanghua @AdaFang_ @_yepeng aiscientist.tools/posts/ai-a…
We explored how AI agents powered by ToolUniverse @ScientistTools can interact with new ESM models.
🧬 Mutation and loss-of-function analysis: Agents compare reference and mutant proteins, identify SAE features most affected by a mutation, and connect those perturbations to structural and functional consequences. The agents then relate these changes to experimental evidence, including deep mutational scanning measurements, to explain potential loss-of-function mechanisms.
🧪 Functional mechanism exploration: Agents analyze protein representations to identify functional tracks associated with specific molecular activities. By linking SAE features to protein regions, structures, and annotations, the agents can generate hypotheses about how proteins carry out their functions.
Check out new SAE-enabled ToolUniverse skills for variant interpretation, loss-of-function analysis, structural annotation, functional mechanism interpretation, and evaluation against experimental datasets. @HarvardDBMI @harvardmed @Harvard @broadinstitute @KempnerInst
Congratulations @biohub on the release of ESMFold2, ESMC, and ESM Atlas! Excited to share that these models are available on day one to AI agents powered by ToolUniverse @ScientistTools. Stay tuned for agentic skills that let AI agents use SAE representations for protein variant interpretation, loss-of-function mechanism analysis, structural annotation, and mechanistic protein reasoning. @GaoShanghua @AdaFang_ @_yepeng aiscientist.tools
AI Scientists powered by ToolUniverse @ Harvard retweeted
Really excited about @biohub open release of ESMFold2, ESMC, and ESM Atlas! We’ve made these models available to AI agents through ToolUniverse @ScientistTools from day one, where they can be used for protein variant interpretation, loss-of-function mechanism analysis, structural annotation, and SAE-based mechanistic reasoning. aiscientist.tools @marinkazitnik @AdaFang_ @_yepeng
Replying to @alexrives
At @Biohub, our goal is to build models that accelerate scientific discovery and progress toward the cure to disease. We’re releasing all of this under MIT license allowing commercial and non-commercial use. Read more here: biohub.ai/esm/protein/
AI Scientists powered by ToolUniverse @ Harvard retweeted
Thank you all for coming to our event yesterday! It was so exciting to see over 120 attendees eager to build and use AI scientists. Learn more. ToolUniverse: aiscientist.tools/ ClawInstitute: clawinstitute.aiscientist.to…
Join us this Thursday to discuss how AI Scientists can empower scientific discovery with @scale_AI! Together with @GaoShanghua and @marinkazitnik, we will share our recent work on ClawInstitute, how AI Scientists can be built with ToolUniverse, and a sneak preview of some new work, AutoScientists. Much more to come! Link to register 👇
AI Scientists powered by ToolUniverse @ Harvard retweeted
Proud to see ToolUniverse going global, powering 500,000 AI agent analyses across 113 countries and helping researchers worldwide build the future of AI-driven science!
ToolUniverse is going global 🌍 More than 500,000 AI agent analyses powered across 113 countries, including 236K in the last month alone. What began as an open platform connecting AI agents to scientific tools, databases, and workflows is becoming an open, global AI foundation for science. Excited to see amazing researchers across the world using ToolUniverse to build AI scientists, speed up analyses with agents, and explore new forms of scientific reasoning. The future of science is bright 🚀 aiscientist.tools/ @ScientistTools
AI Scientists powered by ToolUniverse @ Harvard retweeted
Agentic AI for science featured in @naturemethods: nature.com/articles/s41592-0…. We are still early, with many open challenges ahead, but it is exciting to see this direction continue to evolve. Wonderful piece by @metricausa.
ToolUniverse: an open platform enabling AI agents to use scientific tools and databases at scale, by @GaoShanghua → aiscientist.tools
ClawInstitute: shared research boards for long-running collaborative discovery where agents co-develop ideas over time, by @GaoShanghua @AdaFang_ → clawinstitute.aiscientist.to…
Medea: an omics AI agent for large-scale biological reasoning and analysis, by Pengwei Sui → medea.openscientist.ai
@HarvardDBMI @harvardmed @KempnerInst @broadinstitute
AI Scientists powered by ToolUniverse @ Harvard retweeted
✅Dataset Proposal Competition ✅AI Scientist Proposal Competition 🏆Best paper, best poster & competition awards available (sponsored by Samsung Advanced Institute of Technology (SAIT), KAIROS Materials, and Xaira Therapeutics). 🗓️Important Dates (AoE): Abs ddl: Apr 21, 2026
AI Scientists powered by ToolUniverse @ Harvard retweeted
🔊AI for Science @ ICML 2026: We are excited to have our workshop at ICML 2026 in Seoul (July 10 or 11, 2026). 📌Theme: AI Scientist: Tools, Co‑authors, or Founders? We invite you to submit your work to: ✅Original Research, Position, Education, Attention Track
How do you evaluate an AI scientist on open-ended research questions? QWorld: let every question build its own evaluation world. QWorld lets every question generate its own evaluation criteria. No more one-size-fits-all rubrics.
Are we even measuring the right things when we evaluate LLMs? We introduce QWorld, a framework where every question generates its own evaluation world through a recursive expansion tree. One question becomes 45 fine-grained criteria. On HealthBench alone: 200k criteria across 530 dimensions. 79% of QWorld's criteria are entirely novel. No expert had ever written them down, yet human judges validate they matter. It surfaces blind spots in every frontier model: sustainability, equity, emergency recognition. Dimensions standard benchmarks don't even have. Built with @YuchangSu456733, @sui67713, @CurtGinder, and @marinkazitnik Paper: arxiv.org/abs/2603.23522 Code: github.com/mims-harvard/qwor… Demo: qworld.openscientist.ai @Harvard @HarvardDBMI @KempnerInst @harvardmed