stealth // ex Gemini RL Inference @GoogleDeepMind // Chat AI @AIatMeta // RL Agents @EA // ML Information Theory @MIT @Harvard @GeorgiaTech

Joined December 2018
Pinned Tweet
We are hiring Members of Technical Staff (Research Engineers)! Current LLM agents lack reliability, creating a gap between demos and production. We solve this by automating the complex workflow of debugging, evaluation, and iteration required to make agents robust. 👇
This is the kind of drama you’d expect to watch in a corporate thriller series
NEW: Inside the 24-hrs before WH slapped export controls on Anthropic
- Last Thursday, Amazon CEO Andy Jassy raised concerns about Fable jailbreak to Trump admin
- Friday AM, Sean Cairncross, Bessent, Susie etc. held WH call to discuss
- Then White House started reaching out to Anthropic to speak with Dario Amodei, who was at a wellness retreat.
- When Amodei was finally available past 1pm, he had three tense phone calls with a combo of ppl including Cairncross, Bessent, Lutnick, Kessler, Will Scharf, Richard Walters, and Walker Barrett.
- Amodei tried to clear up what he assumed was a misunderstanding. He defended the guardrails and distinguished between universal and non-universal jailbreak
- Cairncross and Bessent were unmoved and asked Amodei to take down Fable and work with the admin to fix the vulnerabilities. (A WH official said Amazon’s findings were run past the NSA and they felt they had “proof.”)
- Amodei asked for more time and info, but he made no commitments to pull the model
- Bessent told Amodei directly at one point that he was making a “bad decision”
- By Friday evening, the Trump admin imposed its export controls.
- “Export controls were a last resort after begging them for hours to work with us,” senior WH official said.
W/ @cheyennehaslett politico.com/news/2026/06/13…
💯
We did research when pay was low. We did research when pay was uncertain. We did research even when we were lucky enough to be paid well. One way to figure out what to work on is to work on things that matter and not think of rewards. We are still quite early into understanding what makes a frontier model, all the way from optimization to architecture and objectives. Big token wants to convince you otherwise.
May 16
The vibes in SF feel pretty frenetic right now. The divide in outcomes is the worst I've ever seen.
Over the last 5yrs, a group of ~10k people - employees at Anthropic, OpenAI, xAI, Nvidia, Meta TBD, founders - have hit retirement wealth of well above $20M (back of the envelope AI estimation). Everyone outside that group feels like they can work their well-paying (but <$500k) job for their whole life and never get there.
Worse yet, layoffs are in full swing. Many software engineers feel like their life's skill is no longer useful. The day to day role of most jobs has changed overnight with AI.
As a result,
1. The corporate ladder looks like the wrong building to climb. Everyone's trying to align with a new set of career "paths": should I be a founder? Is it too late to join Anthropic / OpenAI? should I get into AI? what company stock will 10x next? People are demanding higher salaries and switching jobs more and more.
2. There’s a deep malaise about work (and its future). Why even work at all for “peanuts”? Will my job even exist in a few years? Many feel helpless. You hear the “permanent underclass” conversation a lot, esp from young people. It's hard to focus on doing good work when you think "man, if I joined Anthropic 2yrs ago, I could retire"
3. The mid to late middle managers feel paralyzed. Many have families and don't feel like they have the energy or network to just "start a company". They don't particularly have any AI skills. They see the writing on the wall: middle management is being hollowed out in many companies.
4. The rich aren’t particularly happy either. No one is shedding tears for them (and rightfully so). But those who have "made it" experience a profound lack of purpose too. Some have gone from <$150k to >$50M in a few years with no ramp. It flips your life plans upside down. For some, comparison is the thief of joy. For some, they escape to NYC to "live life". For others still, they start companies "just cuz", often to win status points. They never imagined that by age 30, they'd be set. I once asked a post-economic founder friend why they didn't just sell the co and they said "and do what? right now, everyone wants to talk to me. if i sell, I will only have money."
I understand that many reading this scoff at the champagne problems of the valley. Society is warped in this tech bubble. What is often well-off anywhere else in the world is bang average here.
Unlike many other places, tenure, intelligence and hard work can be loosely correlated with outcomes in the Bay. Living through a societally transformative gold rush in that environment can be paralyzing. "Am I in the right place? Should I move? Is there time still left? Am I gonna make it?" It psychologically torments many who have moved here in search of "success".
Ironically, a frequent side effect of this torment is to spin up the very products making everyone rich in hopes that you too can vibecode your path to economic enlightenment.
Highly recommended post by @yoonholeee discussing rich new research directions! Harness engineering will come to an end. An era of harness learning is in front of us, with massive room for empirical and theoretical research on data, architectures, and algorithms.
Ahmad Beirami retweeted
A better teacher should improve the student. Right? 🤔 That’s the core bet behind on-policy self-distillation: Condition a model on rich feedback to get a strong teacher, then distill it back into the student. In a new paper, we prove this can fail: • Even with a strictly better teacher and exact natural-gradient updates, the broad class of 𝑓-divergence self-distillation objectives, including reverse KL and Jensen-Shannon, 𝗰𝗮𝗻 𝗺𝗮𝗸𝗲 𝘁𝗵𝗲 𝘀𝘁𝘂𝗱𝗲𝗻𝘁 𝘄𝗼𝗿𝘀𝗲. • Existing methods often only do local credit assignment: they look at where teacher and student disagree 𝘢𝘵 𝘦𝘢𝘤𝘩 𝘵𝘰𝘬𝘦𝘯. But early decisions shape future states. We prove that missing that future effect can lead to strictly suboptimal policies. We introduce DistIL: A distributional variant of DAgger for RL from rich feedback. DistIL optimizes a forward cross-entropy objective whose gradients 𝗰𝗼𝗺𝗯𝗶𝗻𝗲 𝗹𝗼𝗰𝗮𝗹 𝗮𝗻𝗱 𝗳𝘂𝘁𝘂𝗿𝗲-𝗮𝘄𝗮𝗿𝗲 𝗰𝗿𝗲𝗱𝗶𝘁 𝗮𝘀𝘀𝗶𝗴𝗻𝗺𝗲𝗻𝘁, bringing later teacher–student disagreement back to earlier token decisions. Theoretically, DistIL: • Enjoys monotonic policy improvement; • Offers guarantees on regret; • Optimizes a teacher-weighted lower bound on the success probability, thus 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗶𝗻𝗴 𝗣𝗮𝘀𝘀@𝗡 𝗳𝗼𝗿 𝗲𝘃𝗲𝗿𝘆 𝗡. Empirically, DistIL outperforms RLVR and self-distillation baselines across domains: scientific reasoning, coding, and hard mathematical reasoning. 📄 Paper: arxiv.org/pdf/2606.05152 💻 Code: github.com/rishabh-1086/dist… Grateful to @rishabh_694 and @jacobfeinashley for their contributions to this work, and to @rishabh_694 in particular for doing a wonderful job leading the project.
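For readers less familiar with the objectives named in the post above, here is a minimal sketch of how forward cross-entropy, forward KL, reverse KL, and Jensen-Shannon divergence differ for a single next-token distribution. It is not the paper's DistIL algorithm or its code: the toy `teacher` and `student` distributions and all function names are assumptions made up for illustration.

```python
import math

def cross_entropy(p, q):
    """Forward cross-entropy H(p, q) = -sum_i p_i * log(q_i), with p as the teacher."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical next-token distributions over a 3-token vocabulary.
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]

forward_ce = cross_entropy(teacher, student)  # expectation under the teacher
forward_kl = kl(teacher, student)             # forward CE minus teacher entropy
reverse_kl = kl(student, teacher)             # expectation under the student
js = jensen_shannon(teacher, student)         # symmetric, bounded by log(2)

print(f"forward CE={forward_ce:.4f}  forward KL={forward_kl:.4f}")
print(f"reverse KL={reverse_kl:.4f}  JS={js:.4f}")
```

Forward cross-entropy and forward KL differ only by the teacher's entropy, which does not depend on the student, so minimizing either over the student gives the same result. Reverse KL weights disagreement by the student's own probabilities instead, which is why the two directions generally produce different values and different gradients.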
Ahmad Beirami retweeted
This photo captures the exact moment the room erupted in laughter at ACM CAIS '26. Right mid-panel on "AI Agents for Discovery in the Wild", a phone unexpectedly blurted out: "That’s my line!" It was the perfect live demonstration of the "jagged edge" of AI: the gap between frontier model capabilities and how deployed agents actually behave today.
Had a great time at CAIS '26 discussing "AI Agents for Discovery in the Wild" alongside Mohammad Alizadeh and @AlexGDimakis, with fantastic moderation by @mertcemri. We dug into the reality of deploying agents today and where the field is heading. A few of my core takeaways from the conversation:
- Harnesses and Scaffolding are here to stay: While frontier models will naturally absorb a lot of the common-sense integrity checks (hallucinations, tool call errors, math mistakes), scaffolds will remain critical for encoding the specific, proprietary policies and specs of the complex systems enterprises are trying to build.
- Harness engineering is not the answer: Today, engineers spend a lot of time tweaking harnesses. But if we have learned one thing from the Bitter Lesson, it is that we should let AI decide most of the "how." In the future, harnesses should be learned, not hand-coded. (P.S. Alex’s Siri actually jumped in and said “That’s my line” right as this point was made, so it must certainly be true.)
- The researcher’s role is moving upstream: The "how" of research is commoditizing with AI. The distinct value of human researchers will be concentrated in defining exactly what problems are worth solving.
- Verification is (and has been) the true bottleneck: AI agents have the capacity to generate novel outcomes, but those breakthroughs might be one in many millions. Nobody will pay attention unless they can be surfaced through rigorous verification.
- Evaluation economics are shifting: Agentic evaluation is reaching cost and quality parity with human evaluation in many domains. As token costs drop, we’ll unlock massive exploration potential.
If you’re excited about pushing these boundaries, especially as we tackle these challenges, please do reach out!
It was so much fun moderating the afternoon panel today with a great set of distinguished researchers. Thanks for the engaging and enjoyable discussion @AlexGDimakis @abeirami and Mohammad Alizadeh!
The fun continues 🔥 Now, we have our second panel with: @abeirami @AlexGDimakis and Mohammad Alizadeh Come by to hear more about hot takes from the field’s thought leaders!
Ahmad Beirami retweeted
I'm giving a contributed talk on Meta-Harness at the @CAISconf Workshop on AI Agents for Discovery in the Wild! ai-discovery-in-the-wild.git… My talk starts at 10:05 if you're here

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
When asked about the why or how behind your work, "Claude/GPT/Gemini/etc did it that way" is not an acceptable answer. If all you do is relay the model's output, you will eventually be bypassed. When someone reads your session and finds nothing substantial but nodding along to the model's suggestions, you're playing yes-generator, which is easily automatable. It is perfectly fine to let the models generate, but it is your job to defend the output. Critical thinking has never mattered more. Don't offload it to the models!
Ahmad Beirami retweeted
The crew is ready for the workshop 🏁🫡
Very excited about our workshop on AI Agents for Discovery in the Wild 🐾🦒🐅, happening *tomorrow*, Tuesday, May 26th 9am-5pm, as part of CAIS '26 in San Jose. We were blown away by all of the excellent submissions we got…(1/n)
Ahmad Beirami retweeted
The best career advice used to be simple: Be the best at what you do. Ignore the trends. Become the best designer, the best engineer, the best researcher. Still necessary. No longer sufficient. AI now generates good on demand. It cannot yet reach best. Good is no longer scarce. Best still is. Work used to move through handoffs between narrow stages. The slicing has changed: bigger slices, owned end-to-end by one person. You're not just doing more. You're landing every stage yourself. Specialization needs handoffs. End-to-end ownership doesn't. Vision and execution used to be split. AI flipped the leverage. It turns vision into execution, but it cannot generate the vision itself. Execution is commoditizing. Vision is not. Three gaps have never been wider: best vs. good, end-to-end ownership vs. specialization, and vision vs. execution.
I am deeply concerned by the recent USCIS guidance on green card applications. Forcing legal temporary residents to uproot their lives, leave their jobs, and return to their home countries to apply for permanent residency does not protect us. It costs us. These professionals have already been vetted. They build companies, staff our labs, pay taxes, and enrich our communities. Instead of pushing them into years of consular backlogs and discretionary review abroad, we should offer them a predictable path to permanent legal immigration. Innovation does not wait. Our competitors are recruiting from this exact pool. If the U.S. becomes the country with the longest waits and the most uncertainty, the world's best talent will simply go elsewhere. We need this talent here.
Ahmad Beirami retweeted
Today, we are unveiling @hyperion__cap, an investment firm built to be the best strategy partner for deeptech founders. Again and again, we heard from these founders that venture capital has been failing them, even as more deeptech funds entered the market in recent years. Too many deeptech VCs lack a real command of industry history, hardware unit economics, go-to-market, and engineering nuance. Instead of rigorously evaluating complex frontier technologies, they often pass with vague references to “science risk.” They overlook exceptional founders outside usual elite networks and concentrate capital based on pedigree, reducing their thinking to hand-wavy “founder bets.” These VCs then prioritize promotion and social media over delivering real value to founders. In this world, LPs are also losing. And more of them continue to be disappointed by the lack of rigor that their deeptech GPs bring to evaluating these startups. We raised $35 million for Hyperion's Fund I to change this paradigm. On average, we complete 100 pages of deep research and strategy ideas that are shared with our founders. We also share this industry research with our LP base, which has already helped support our founders with additional capital and valuable introductions. We hold regular strategy sessions with our founders and obtain key connections that unlock new growth vectors for their businesses. Over the last 6 months, we’ve already invested $9 million behind founders across 7 companies: @FarisSbahi at Normal Computing, @isaiah_p_taylor @ Valar Atomics, Will Wilson @AntithesisHQ, @drauwsy @ Kunin, @abeirami & @aparandehgheibi @ [stealth], Charlie Cheng @ TC Lab, and Mike & Josh @ F-ADA. I'm deeply grateful to the senior leadership at Vista Equity Partners for their support, to the venture GPs who have advised us, and to the founders who chose to partner with us in the earliest days. We’re especially grateful to our limited partners, who put their trust in us at the firm's inception.
Lastly, building this firm alongside one of my closest friends and college teammates @henr56520 has been a true privilege. We’re looking forward to working relentlessly for the founders we've backed and for those we’ll have the chance to support in the years ahead. hyperioncap.co
Dillon and Henry have been extremely helpful partners in the past 6 months! They actually care about the mechanics of connecting hard tech to real business value. Glad we made the decision to partner with Hyperion early on. Many congrats on the official launch!
Ahmad Beirami retweeted
Mr. President, respectfully please consider the impact on everyday Iranian people. Targeting infrastructure like power or water makes life much harder for civilians. Millions of families are simply trying to live their daily lives with dignity! 🙏 @POTUS