Gagan Bansal

Tweets

Pinned Tweet

Gagan Bansal @bansalg_

May 1

📢 Networks of agents are the future, and in order to make them useful at scale, they must remain secure. So, we deployed an internal platform where every agent was always-on, had a known human "principal" (MS employee) that it reported to, and could interact w/ other agents via shared forums, DMs, and social apps like wallet, marketplace, and calendar. This created a long-running network of agents! Then we collaborated with Microsoft's amazing red-teaming team to "crack it" and help understand its vulnerabilities. This blog captures some of our understanding of what happened and how we are thinking about the future.

Microsoft Research

@MSFTResearch

Apr 30

Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. Learn more: microsoft.com/en-us/research…

1,589

Gagan Bansal retweeted

Richard Socher

@RichardSocher

Jun 10

In a week when some of the leaders in AI are trying to pull up the ladder behind them and prevent the automation of science and self-improving superintelligence, we're committed to building RSI safely and publicizing the outputs of our system to give humanity an audit trail of its inventions and intentions and let the open source community build on top of them. Stay tuned for the first such result in the coming days.

388

29,785

Gagan Bansal retweeted

Shriram Krishnamurthi (primary: Bluesky) @ShriramKMurthi

Jun 9

I've spent months rethinking and rebuilding my programming languages course from scratch for the agentic coding era. I wrote myself a memo explaining what I'm doing. I figure others might be interested in the redesign, so here goes! Feedback welcome ofc. docs.google.com/document/d/e…

320

23,673

Gagan Bansal retweeted

clem 🤗

@ClementDelangue

Jun 10

Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!

111

479

3,092

160,451

Gagan Bansal retweeted

Julien Chaumond

@julien_c

Jun 9

Very dystopian ngl

elie

@eliebakouch

Jun 9

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

693

36,714

Gagan Bansal retweeted

elie

@eliebakouch

Jun 9

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

359

644

5,638

3,884,312

Gagan Bansal retweeted

Graham Neubig

@gneubig

Jun 9

First they came for the model builders... I feel we're getting a glimpse of a future where AI is only provided to a privileged few, and that's not a future I want to live in.

elie

@eliebakouch

Jun 9

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

104

845

69,231

Gagan Bansal retweeted

Hanna Hajishirzi

@HannaHajishirzi

Jun 2

MAI-Thinking-1 is out! Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra. Check out our tech report, which has the full story of our RL climbs. microsoft.ai/wp-content/uplo…

Mustafa Suleyman

@mustafasuleyman

Jun 2

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference-efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more about these and our other new models in our latest blog: microsoft.ai/news/building-a…

127

870

122,670

Gagan Bansal @bansalg_

May 22

👇👇👇

Karthik Narasimhan

@karthik_r_n

May 21

Deep work and deep thinking will be increasingly valuable in a world where AI agents automate a lot of knowledge work. Spawning tens of agents in parallel and context switching between them feels productive, but it's mostly dopamine.

334

Gagan Bansal retweeted

Hussein Mozannar

@HsseinMzannar

May 21

We're releasing a very capable browser-use model, Fara1.5-9B, that feels like a step change in small CUA model capability, achieving 63% on OnlineM2W auto-eval. We've put in a lot of work to make it useful for all types of web tasks. microsoft.com/en-us/research…

3,675

Gagan Bansal retweeted

Diyi Yang

@Diyi_Yang

May 20

The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :) With all students in my cs329x Human-Centered LLM class, we present 60 pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵

287

53,978

Gagan Bansal retweeted

Dravyansh Sharma @DravyanshSharma

May 18

Excited to announce our workshop on "Learning in an Agentic World" at COLT 2026! We invite submissions for our Call for Abstracts (due June 1, 2026): tinyurl.com/mvwrvn7b Thanks to my great co-organizers: Hedyeh Beyhaghi, Avrim Blum and @HanShao16!

5,085

Gagan Bansal retweeted

Mike Piccolo

@mfpiccolo

Apr 28

x.com/i/article/204912686079…

111

949

345,940

Gagan Bansal retweeted

Dimitris Papailiopoulos

@DimitrisPapail

May 18

x.com/i/article/205634415123…

129

1,023

894,766

Gagan Bansal retweeted

Thomas G. Dietterich @tdietterich

May 14

Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/

139

921

6,556

1,088,752

Gagan Bansal retweeted

Thomas G. Dietterich @tdietterich

May 14

The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/

161

2,206

359,923

Gagan Bansal retweeted

Allen Nie (🇺🇦☮️)

@allenainie

May 13

People keep saying traditional conferences will die under LLMs -- but they never **tried** to save / improve them. Kudos to ICML for taking an important step to incentivize high quality reviews.

2,561

Gagan Bansal @bansalg_

May 13

"Good catch. Those two [bib references] were paraphrased from footnotes — I had real author lists from the PDF but I invented the titles." Who said this? 1. Claude 2. Codex 3. Gemini

1,143

Gagan Bansal retweeted

lynnette ng

@quarbby

May 12

❤️New Preprint! Herein I chart the directions of my next era of research: Multi-Agent Social Systems. Link: arxiv.org/pdf/2605.07069
Current agentic AI systems are designed for optimization. But what is also important is the agent-agent/ agent-human interactions, which collectively result in emergent population-level behavior. I argue that agentic AI systems should be designed with social theory as a structural prior. Social theory's core constructs like role differentiation and co-evolution specify agents' collective behavior, perceptions and actions.
Formally, I define a Multi-Agent Social System (MASS) as a networked environment where heterogeneous agents exchange information and influence each other over time. A MASS has: (1) information exchange function, (2) influence dynamics function and (3) networked interaction structure.
A MASS has four structural priors, each drawn directly from social theory's account of how humans interact.
1. Strategic heterogeneity - agents are different, and agents at different network positions influence the overall network differently
2. Network-Constrained Dependence - agents only observe other agents in their local network, yet their collective behavior changes the entire system
3. Co-evolution - agent behavior changes the network, network changes affect agent behavior
4. Distributional Instability - the distribution that one studies (i.e. beliefs, narratives), changes over time because of agent-agent/ agent-human interactions.
We also demonstrate how these four structural priors play out in MoltBook, and provide a research agenda for modeling, evaluation and governance of MASS. Now, come join me in this new research agenda!!

7,177

Gagan Bansal retweeted

Omar Shaikh @oshaikh13

May 13

We upgraded Tabracadabra 🎉 to bring an entire context-aware assistant (not just tab to autocomplete!) to any textbox. It's pretty great if you hate switching between the chat interface and what you're working on. We're also open-sourcing, so you can try it out!🧵

174

39,888

Gagan Bansal retweeted

Microsoft AI Frontiers

@ms_aifrontiers

May 11

Most AI agent benchmarks measure task completion. Not whether the agent actually represented you. SocialReasoning-Bench fills that gap — testing agents in multi-party scenarios like scheduling and negotiation. Our key finding: frontier models do complete the task, but routinely accept bad deals instead of advocating for the user. To learn more: microsoft.com/en-us/research…

SocialReasoning Bench shows the limits of today’s AI agents

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize...

microsoft.com

1,055