Ahmad Beirami

Pinned Tweet

Ahmad Beirami

@abeirami

Jan 13

We are hiring Members of Technical Staff (Research Engineers)! Current LLM agents lack reliability, creating a gap between demos and production. We solve this by automating the complex workflow of debugging, evaluation, and iteration required to make agents robust. 👇

Ahmad Beirami

@abeirami

10h

This is the kind of drama you’d expect to watch in a corporate thriller series

Sophia Cai

@SophiaCai99

13h

NEW: Inside the 24 hrs before WH slapped export controls on Anthropic
- Last Thursday, Amazon CEO Andy Jassy raised concerns about Fable jailbreak to Trump admin
- Friday AM, Sean Cairncross, Bessent, Susie etc. held WH call to discuss
- Then White House started reaching out to Anthropic to speak with Dario Amodei, who was at a wellness retreat.
- When Amodei was finally available past 1pm, he had three tense phone calls with a combo of ppl including Cairncross, Bessent, Lutnick, Kessler, Will Scharf, Richard Walters, and Walker Barrett.
- Amodei tried to clear up what he assumed was a misunderstanding. He defended the guardrails and distinguished between universal and non-universal jailbreaks.
- Cairncross and Bessent were unmoved and asked Amodei to take down Fable and work with the admin to fix the vulnerabilities. (A WH official said Amazon’s findings were run past the NSA and they felt they had “proof.”)
- Amodei asked for more time and info, but he made no commitments to pull the model.
- Bessent told Amodei directly at one point that he was making a “bad decision.”
- By Friday evening, the Trump admin imposed its export controls.
- “Export controls were a last resort after begging them for hours to work with us,” senior WH official said.
W/ @cheyennehaslett politico.com/news/2026/06/13…

Ahmad Beirami

@abeirami

18h

💯

rohan anil

@_arohan_

May 16

We did research when pay was low. We did research when pay was uncertain. We did research even when we were lucky enough to be paid well. One way to figure out what to work on is to work on things that matter and not think about rewards. We are still quite early in understanding what makes a frontier model, across optimization, architecture and objectives. Big token wants to convince you otherwise.

Ahmad Beirami retweeted

rohan anil

@_arohan_

May 16

Deedy

@deedydas

May 16

The vibes in SF feel pretty frenetic right now. The divide in outcomes is the worst I've ever seen.
Over the last 5 yrs, a group of ~10k people (employees at Anthropic, OpenAI, xAI, Nvidia, Meta TBD, founders) have hit retirement wealth of well above $20M (back-of-the-envelope AI estimation). Everyone outside that group feels like they can work their well-paying (but <$500k) job for their whole life and never get there. Worse yet, layoffs are in full swing. Many software engineers feel like their life's skill is no longer useful. The day-to-day role of most jobs has changed overnight with AI.
As a result:
1. The corporate ladder looks like the wrong building to climb. Everyone's trying to align with a new set of career "paths": Should I be a founder? Is it too late to join Anthropic / OpenAI? Should I get into AI? What company stock will 10x next? People are demanding higher salaries and switching jobs more and more.
2. There’s a deep malaise about work (and its future). Why even work at all for “peanuts”? Will my job even exist in a few years? Many feel helpless. You hear the “permanent underclass” conversation a lot, esp. from young people. It's hard to focus on doing good work when you think "man, if I joined Anthropic 2 yrs ago, I could retire."
3. Mid- to late-career middle managers feel paralyzed. Many have families and don't feel like they have the energy or network to just "start a company". They don't particularly have any AI skills. They see the writing on the wall: middle management is being hollowed out in many companies.
4. The rich aren’t particularly happy either. No one is shedding tears for them (and rightfully so). But those who have "made it" experience a profound lack of purpose too. Some have gone from <$150k to >$50M in a few years with no ramp. It flips your life plans upside down. For some, comparison is the thief of joy. Some escape to NYC to "live life". Others still start companies "just cuz", often to win status points. They never imagined that by age 30, they'd be set. I once asked a post-economic founder friend why they didn't just sell the co, and they said "and do what? right now, everyone wants to talk to me. if i sell, I will only have money."
I understand that many reading this scoff at the champagne problems of the valley. Society is warped in this tech bubble. What counts as well-off anywhere else in the world is bang average here. Unlike many other places, tenure, intelligence and hard work are only loosely correlated with outcomes in the Bay. Living through a societally transformative gold rush in that environment can be paralyzing. "Am I in the right place? Should I move? Is there time still left? Am I gonna make it?" It psychologically torments many who have moved here in search of "success". Ironically, a frequent side effect of this torment is to spin up the very products making everyone rich, in the hope that you too can vibecode your path to economic enlightenment.

Ahmad Beirami

@abeirami

Jun 9

Highly recommended post by @yoonholeee discussing rich new research directions! Harness engineering will come to an end. An era of harness learning is in front of us, with massive room for empirical and theoretical research on data, architectures, and algorithms.

Yoonho Lee

@yoonholeee

Jun 8

x.com/i/article/206401798198…

Ahmad Beirami retweeted

Yoonho Lee

@yoonholeee

Jun 8

x.com/i/article/206401798198…

Ahmad Beirami retweeted

Paria Rashidinejad

@paria_rd

Jun 8

A better teacher should improve the student. Right? 🤔
That’s the core bet behind on-policy self-distillation: Condition a model on rich feedback to get a strong teacher, then distill it back into the student.
In a new paper, we prove this can fail:
• Even with a strictly better teacher and exact natural-gradient updates, the broad class of f-divergence self-distillation objectives, including reverse KL and Jensen-Shannon, can make the student worse.
• Existing methods often only do local credit assignment: they look at where teacher and student disagree at each token. But early decisions shape future states. We prove that missing that future effect can lead to strictly suboptimal policies.
We introduce DistIL: A distributional variant of DAgger for RL from rich feedback. DistIL optimizes a forward cross-entropy objective whose gradients combine local and future-aware credit assignment, bringing later teacher–student disagreement back to earlier token decisions.
Theoretically, DistIL:
• Enjoys monotonic policy improvement;
• Offers guarantees on regret;
• Optimizes a teacher-weighted lower bound on the success probability, thus optimizing Pass@N for every N.
Empirically, DistIL outperforms RLVR and self-distillation baselines across domains: scientific reasoning, coding, and hard mathematical reasoning.
📄 Paper: arxiv.org/pdf/2606.05152
💻 Code: github.com/rishabh-1086/dist…
Grateful to @rishabh_694 and @jacobfeinashley for their contributions to this work, and to @rishabh_694 in particular for doing a wonderful job leading the project.

Ahmad Beirami retweeted

Ali Parandeh Gheibi

@aparandehgheibi

May 27

This photo captures the exact moment the room erupted in laughter at ACM CAIS '26. Right in the middle of the panel on "AI Agents for Discovery in the Wild", a phone unexpectedly blurted out: "That’s my line!" It was the perfect live demonstration of the "jagged edge" of AI: the gap between frontier model capabilities and how deployed agents actually behave today.

Ahmad Beirami

@abeirami

May 27

Had a great time at CAIS '26 discussing "AI Agents for Discovery in the Wild" alongside Mohammad Alizadeh and @AlexGDimakis, with fantastic moderation by @mertcemri. We dug into the reality of deploying agents today and where the field is heading.
A few of my core takeaways from the conversation:
- Harnesses and scaffolding are here to stay: While frontier models will naturally absorb a lot of the common-sense integrity checks (hallucinations, tool call errors, math mistakes), scaffolds will remain critical for encoding the specific, proprietary policies and specs of the complex systems enterprises are trying to build.
- Harness engineering is not the answer: Today, engineers spend a lot of time tweaking harnesses. But if we have learned one thing from the Bitter Lesson, it is that we should let AI decide most of the "how." In the future, harnesses should be learned, not hand-coded. (P.S. Alex’s Siri actually jumped in and said “That’s my line” right as this point was made, so it must certainly be true.)
- The researcher’s role is moving upstream: The "how" of research is commoditizing with AI. The distinct value of human researchers will be concentrated in defining exactly what problems are worth solving.
- Verification is (and has been) the true bottleneck: AI agents have the capacity to generate novel outcomes, but those breakthroughs might be one in many millions. Nobody will pay attention unless they can be surfaced through rigorous verification.
- Evaluation economics are shifting: Agentic evaluation is reaching cost and quality parity with human evaluation in many domains. As token costs drop, we’ll unlock massive exploration potential.
If you’re excited about pushing these boundaries and tackling these challenges, please do reach out!

Mert Cemri

@mertcemri

May 27

It was so much fun moderating the afternoon panel today with a great set of distinguished researchers. Thanks for the engaging and enjoyable discussion, @AlexGDimakis, @abeirami and Mohammad Alizadeh!

Ahmad Beirami retweeted

Mert Cemri

@mertcemri

May 27

Melissa Pan

@melissapan

May 26

The fun continues 🔥 Now we have our second panel with @abeirami, @AlexGDimakis and Mohammad Alizadeh. Come by to hear hot takes from the field’s thought leaders!

Ahmad Beirami retweeted

Yoonho Lee

@yoonholeee

May 26

I'm giving a contributed talk on Meta-Harness at the @CAISconf Workshop on AI Agents for Discovery in the Wild! ai-discovery-in-the-wild.git… My talk starts at 10:05 if you're here

Yoonho Lee

@yoonholeee

Mar 30

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

Ahmad Beirami

@abeirami

May 26

When asked about the why or how behind your work, "Claude/GPT/Gemini/etc did it that way" is not an acceptable answer. If all you do is relay the model's output, you will eventually be bypassed. When someone reads your session and finds nothing substantial but nodding along to the model's suggestions, you're playing yes-generator, which is easily automatable. It is perfectly fine to let the models generate, but it is your job to defend the output. Critical thinking has never mattered more. Don't offload it to the models!

Ahmad Beirami retweeted

Melissa Pan

@melissapan

May 26

The crew is ready for the workshop 🏁🫡

Alex Krentsel

@AlexKrentsel

May 26

Very excited about our workshop on AI Agents for Discovery in the Wild 🐾🦒🐅, happening *tomorrow*, Tuesday, May 26th 9am-5pm, as part of CAIS '26 in San Jose. We were blown away by all of the excellent submissions we got…(1/n)

Ahmad Beirami retweeted

Alex Krentsel

@AlexKrentsel

May 26

Ahmad Beirami retweeted

Ahmad Beirami

@abeirami

May 23

The best career advice used to be simple: Be the best at what you do. Ignore the trends. Become the best designer, the best engineer, the best researcher. Still necessary. No longer sufficient. AI now generates good on demand. It cannot yet reach best. Good is no longer scarce. Best still is. Work used to move through handoffs between narrow stages. The slicing has changed: bigger slices, owned end-to-end by one person. You're not just doing more. You're landing every stage yourself. Specialization needs handoffs. End-to-end ownership doesn't. Vision and execution used to be split. AI flipped the leverage. It turns vision into execution, but it cannot generate the vision itself. Execution is commoditizing. Vision is not. Three gaps have never been wider: best vs. good, end-to-end ownership vs. specialization, and vision vs. execution.

Ahmad Beirami

@abeirami

May 23

I am deeply concerned by the recent USCIS guidance on green card applications. Forcing legal temporary residents to uproot their lives, leave their jobs, and return to their home countries to apply for permanent residency does not protect us. It costs us. These professionals have already been vetted. They build companies, staff our labs, pay taxes, and enrich our communities. Instead of pushing them into years of consular backlogs and discretionary review abroad, we should offer them a predictable path to permanent legal immigration. Innovation does not wait. Our competitors are recruiting from this exact pool. If the U.S. becomes the country with the longest waits and the most uncertainty, the world's best talent will simply go elsewhere. We need this talent here.

Ahmad Beirami retweeted

Dillon Dunteman

@dillon_dunteman

Apr 7

Today, we are unveiling @hyperion__cap, an investment firm built to be the best strategy partner for deeptech founders.
Again and again, we heard from these founders that venture capital has been failing them, even as more deeptech funds entered the market in recent years. Too many deeptech VCs lack a real command of industry history, hardware unit economics, go-to-market, and engineering nuance. Instead of rigorously evaluating complex frontier technologies, they often pass with vague references to “science risk.” They overlook exceptional founders outside usual elite networks and concentrate capital based on pedigree, reducing their thinking to hand-wavy “founder bets.” These VCs then prioritize promotion and social media over delivering real value to founders. In this world, LPs are also losing, and more of them continue to be disappointed by the lack of rigor that their deeptech GPs bring to evaluating these startups.
We raised $35 million for Hyperion's Fund I to change this paradigm. On average, we complete 100 pages of deep research and strategy ideas that we share with our founders. We also share this industry research with our LP base, which has already helped support our founders with additional capital and valuable introductions. We hold regular strategy sessions with our founders and obtain key connections that unlock new growth vectors for their businesses.
Over the last 6 months, we’ve already invested $9 million behind founders across 7 companies: @FarisSbahi at Normal Computing, @isaiah_p_taylor @ Valar Atomics, Will Wilson @AntithesisHQ, @drauwsy @ Kunin, @abeirami & @aparandehgheibi @ [stealth], Charlie Cheng @ TC Lab, and Mike & Josh @ F-ADA.
I'm deeply grateful to the senior leadership at Vista Equity Partners for their support, to the venture GPs who have advised us, and to the founders who chose to partner with us in the earliest days. We’re especially grateful to our limited partners, who put their trust in us at the firm's inception.
Lastly, building this firm alongside one of my closest friends and college teammates, @henr56520, has been a true privilege. We’re looking forward to working relentlessly for the founders we've backed and for those we’ll have the chance to support in the years ahead. hyperioncap.co

Ahmad Beirami

@abeirami

Apr 8

Dillon and Henry have been extremely helpful partners in the past 6 months! They actually care about the mechanics of connecting hard tech to real business value. Glad we made the decision to partner with Hyperion early on. Many congrats on the official launch!

Dillon Dunteman

@dillon_dunteman

Apr 7

Ahmad Beirami retweeted

Pejman Nozad

@pejmannozad

Mar 22

Mr. President, respectfully please consider the impact on everyday Iranian people. Targeting infrastructure like power or water makes life much harder for civilians. Millions of families are simply trying to live their daily lives with dignity! 🙏 @POTUS
