Researching multi-agent stuff and human-agent interaction @msftresearch @ms_aifrontiers | Co-built AutoGen | Previously @uwcse, @iitdelhi

Joined November 2012
Pinned Tweet
📢 Networks of agents are the future, and in order to make them useful at scale, they must remain secure. So, we deployed an internal platform where every agent was always-on, had a known human "principal" (MS employee) that it reported to, and could interact w/ other agents via shared forums, DMs, and social apps like wallet, marketplace, and calendar. This created a long-running network of agents! Then we collaborated with Microsoft's amazing red-teaming team to "crack it" and help understand its vulnerabilities. This blog captures some of our understanding of what happened and how we are thinking about the future.
Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. Learn more: microsoft.com/en-us/research…
1
5
22
1,589
Gagan Bansal retweeted
In a week when some of the leaders in AI are trying to pull up the ladder behind them and prevent the automation of science and self-improving superintelligence, we're committed to building RSI safely and publicizing the outputs of our system to give humanity an audit trail of its inventions and intentions and let the open source community build on top of them. Stay tuned for the first such result in the coming days.
22
27
388
29,785
I've spent months rethinking and rebuilding my programming languages course from scratch for the agentic coding era. I wrote myself a memo explaining what I'm doing. I figure others might be interested in the redesign, so here goes! Feedback welcome ofc. docs.google.com/document/d/e…

19
37
320
23,673
Gagan Bansal retweeted
Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!
111
479
3,092
160,451
Gagan Bansal retweeted
Very dystopian ngl
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy
23
36
693
36,714
Gagan Bansal retweeted
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
359
644
5,638
3,884,312
Gagan Bansal retweeted
First they came for the model builders... I feel we're getting a glimpse of a future where AI is only provided to a privileged few, and that's not a future I want to live in.
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy
22
104
845
69,231
Gagan Bansal retweeted
MAI-Thinking-1 is out! Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra. Check out our tech report, which has the full story of our RL climbs. microsoft.ai/wp-content/uplo…
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more about these and our other new models in our latest blog: microsoft.ai/news/building-a…
24
127
870
122,670
👇👇👇
Deep work and deep thinking will be increasingly valuable in a world where AI agents automate a lot of knowledge work. Spawning tens of agents in parallel and context switching between them feels productive, but it's mostly dopamine.
334
Gagan Bansal retweeted
We're releasing a very capable browser-use model, Fara1.5-9B, that feels like a step change in small CUA model capability, achieving 63% on OnlineM2W auto-eval. We've put in a lot of work to make it useful for all types of web tasks. microsoft.com/en-us/research…
4
10
38
3,675
Gagan Bansal retweeted
The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :) With all students in my cs329x Human-Centered LLM class, we present 60 pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵
14
78
287
53,978
Gagan Bansal retweeted
Excited to announce our workshop on "Learning in an Agentic World" at COLT 2026! We invite submissions for our Call for Abstracts (due June 1, 2026): tinyurl.com/mvwrvn7b Thanks to my great co-organizers: Hedyeh Beyhaghi, Avrim Blum and @HanShao16!

4
14
5,085
Gagan Bansal retweeted
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
139
921
6,556
1,088,752
Gagan Bansal retweeted
The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/
37
161
2,206
359,923
Gagan Bansal retweeted
People keep saying traditional conferences will die under LLMs -- but they never **tried** to save / improve them. Kudos to ICML for taking an important step to incentivize high quality reviews.
1
1
33
2,561
"Good catch. Those two [bib references] were paraphrased from footnotes — I had real author lists from the PDF but I invented the titles." Who said this? 1. Claude 2. Codex 3. Gemini
2
5
1,143
Gagan Bansal retweeted
❤️New Preprint! Herein I chart the directions of my next era of research: Multi-Agent Social Systems. Link: arxiv.org/pdf/2605.07069
Current agentic AI systems are designed for optimization. But what is also important is the agent-agent/agent-human interactions, which collectively result in emergent population-level behavior. I argue that agentic AI systems should be designed with social theory as a structural prior. Social theory's core constructs, like role differentiation and co-evolution, specify agents' collective behavior, perceptions and actions.
Formally, I define a Multi-Agent Social System (MASS) as a networked environment where heterogeneous agents exchange information and influence each other over time. A MASS has: (1) an information exchange function, (2) an influence dynamics function and (3) a networked interaction structure.
A MASS has four structural priors, each drawn directly from social theory's account of how humans interact.
1. Strategic heterogeneity - agents are different, and agents at different network positions influence the overall network differently
2. Network-Constrained Dependence - agents only observe other agents in their local network, yet their collective behavior changes the entire system
3. Co-evolution - agent behavior changes the network, network changes affect agent behavior
4. Distributional Instability - the distribution that one studies (e.g. beliefs, narratives) changes over time because of agent-agent/agent-human interactions.
We also demonstrate how these four structural priors play out in MoltBook, and provide a research agenda for modeling, evaluation and governance of MASS. Now, come join me in this new research agenda!!
3
20
84
7,177
Gagan Bansal retweeted
We upgraded Tabracadabra 🎉 to bring an entire context-aware assistant (not just tab to autocomplete!) to any textbox. It's pretty great if you hate switching between the chat interface and what you're working on. We're also open-sourcing it, so you can try it out!🧵
13
37
174
39,888
Gagan Bansal retweeted
Most AI agent benchmarks measure task completion. Not whether the agent actually represented you. SocialReasoning-Bench fills that gap — testing agents in multi-party scenarios like scheduling and negotiation. Our key finding: frontier models do complete the task, but routinely accept bad deals instead of advocating for the user. To learn more: microsoft.com/en-us/research…
2
9
12
1,055