Building tools for AI devs @nvidia · Berkeley CS · Sharing what I'm learning

Joined January 2015
17 Photos and videos
100% agree on the Context Hub. Developers constantly tweak their approaches to manage context in their prompts with each new model and tool suite release. I would even say that the core problem we face is a "Config Hub" problem - different configurations of models, harnesses, and data augmentation techniques are making managing agents in prod much more brittle and difficult.
2
11
5,659
Rishi Khare retweeted
Excited to share that my ICLR 2026 Oral Talk for GEPA is available on YouTube. I go deeper into why GEPA works better than prior optimization techniques, along with touching on many aspects of GEPA! youtu.be/HbGah-uP1fI
Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷 Apr 24 Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater Poster Session 4, 3:15 PM, Pavilion 3 x.com/LakshyAAAgrawal/status… Let's recap what's happened since we released GEPA last year 🧵
9
47
245
30,223
"we have reached agi..."
1
3
96
Rishi Khare retweeted
Very excited to share that GEPA is now live on @DSPyOSS as dspy.GEPA! This is an early code release. We’re looking forward to community feedback, especially about any practical challenges in switching optimizers.
9
39
323
45,582
Can't wait to try this on my mac... @lmstudio gpt-oss-20b and 120b from @OpenAI just released <1hr ago
4
384
Rishi Khare retweeted
S&P 10 vs S&P 490 🫣
492
1,013
10,150
1,679,108
Rishi Khare retweeted
4 Aug 2025
🤯 must watch
GEPA is a SUPER exciting advancement for @DSPyOSS and a new generation of optimization algorithms re-imagined with LLMs! 🧩🚀 Starting with the title of the paper, the authors find that Reflective Prompt Evolution can outperform Reinforcement Learning!! 🤯 Using LLMs to write and refine prompts (for another LLM to complete a task) is outperforming (!!) highly targeted gradient descent updates using cutting-edge RL algorithms! ⚖️ GEPA makes three key innovations on how exactly we use LLMs to propose prompts for LLMs -- (1) Pareto Optimal Candidate Selection, (2) Reflective Prompt Mutation, and (3) System-Aware Merging for optimizing Compound AI Systems. 🧠🧠 The authors further present how GEPA can be used for training at test-time, one of the most exciting directions AI is evolving in! 🚀 Here is my review of the paper! I hope you find it useful! 🎙️
5
30
191
10,066
Rishi Khare retweeted
Guy has his fingers positioned wrong on his keyboard and types gibberish into ChatGPT. It decodes what he meant to say by assuming his hand was shifted right and re-mapping his fingers 🤯 (from u/mimic751)
136
237
4,698
328,001
Strongly recommend ML practitioners try @lmstudio. Tried it for the first time today and found it remarkable how far private local models have come since the last time I used @ollama with Llama 3.
1
16
3,225
Rishi Khare retweeted
30 Jul 2025
Iterative reflections for LLMs can outperform heavy RL? This paper shows that having the LLM reflects on its own trajectories, rewrite its own prompts, and evolve a diverse pool of candidates beats RL w/ GRPO so far on four reasoning tasks . 10% improv with 35x fewer rollouts!
7
34
223
15,320
Rishi Khare retweeted
2 Aug 2025
Elon Musk has uncovered Vine video archives that were thought to be lost media He's restoring access so users can watch their favorite Vines again
859
3,408
103,624
6,158,557
Rishi Khare retweeted
Infrastructure for Multi-Agent Systems @koomen Multi-agent systems are powerful but hard to build. Think agentic MapReduce with thousands of subagents running in parallel. We're looking for folks who've felt the pain of scaling these systems and want to make operating fleets of agents as easy as deploying a web service.
8
3
49
11,498
Rishi Khare retweeted
31 Jul 2025
Our latest optimizer GEPA writes beautiful prompts, even with a “mini” model. Stay tuned for a lot more these coming days.
Replying to @LakshyAAAgrawal
We implemented GEPA as a new @DSPyOSS optimizer (release soon!). This means that it works for even sophisticated agents or compound systems you've already implemented. GEPA outperforms the MIPROv2 optimizer by as much as 11% across 4 tasks for Qwen3 and GPT-4.1-mini. Of course: Weight updates remain necessary to teach the models completely new tasks and still excel at general-purpose (massively multi-task!) post-training! However, we show that for specialization to downstream systems, reflective prompt optimization can go really far with tiny data sizes and rollout budgets! (2/n)
10
42
391
30,521
Rishi Khare retweeted
31 Jul 2025
this seems really important: it is totally plausible that a model could get IMO gold without *any* reinforcement learning, given a perfectly-crafted prompt we just don't know, and lack tools to efficiently search through prompt space. glad to see at least someone is trying
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
27
35
440
42,086
Rishi Khare retweeted
Methods that may not have even existed when you wrote your DSPy program... Policy gradient RL (GRPO) vs Bayesian search over grounded instruction/fewshot proposals (MIPRO) vs Reflective prompt learning (GEPA) All optimizing identical DSPy programs! And better optimizers to come
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
4
20
179
16,251
Rishi Khare retweeted
New paper: Reflective Prompt Evolution Can Outperform GRPO. It's becoming clear that learning via natural-language reflection (aka prompt optimization) will long be a central learning paradigm for building AI systems. Great work by @LakshyAAAgrawal and team on GEPA and SIMBA.
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
11
91
487
70,566
Rishi Khare retweeted
Brilliant paper. GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. GEPA shows that evolving prompts with natural‑language feedback outperforms reinforcement learning by up to 19% and needs 35× fewer rollouts. It teaches a multi‑step AI system by rewriting its own prompts instead of tweaking model weights. The standard approach, reinforcement learning (RL), particularly Group Relative Policy Optimization (GRPO), is effective but computationally expensive. A full rollout is expensive, which is 1 complete try where the language model tackles a task, gets judged, and sends that score back to the training loop. Algorithms like GRPO need a huge batch of these episodes, often 10 000 – 100 000, because policy‑gradient math works on averages. A reliable gradient estimate appears only after you have sampled lots of different action paths across the problem space. Fewer samples would leave the update too noisy, so the optimiser keeps asking for more runs. 🧬 What GEPA actually is GEPA flips the script by reading its own traces, writing natural‑language notes, and fixing prompts in place, so the learning signal is richer than a single score After each run it asks the model what went wrong, writes plain‑text notes, mutates the prompt, and keeps only candidates on a Pareto frontier so exploration stays broad. The Pareto‑based rule chooses a candidate that looks good but has room to improve. This avoids wasting time on hopeless or already‑perfect options. Quick tests on tiny batches spare rollouts, and winning changes migrate into the live prompt. Across 4 tasks it beats GRPO by up to 19% while needing up to 35X fewer runs, also overtaking MIPROv2. 🧵 Read on 👇
4
13
68
52,772