Brilliant paper.
GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error.
GEPA shows that evolving prompts with natural‑language feedback outperforms reinforcement learning by up to 19% and needs 35× fewer rollouts.
It teaches a multi‑step AI system by rewriting its own prompts instead of tweaking model weights.
The standard approach, reinforcement learning (RL), particularly Group Relative Policy Optimization (GRPO), is effective but computationally expensive.
A full rollout is expensive, which is 1 complete try where the language model tackles a task, gets judged, and sends that score back to the training loop. Algorithms like GRPO need a huge batch of these episodes, often 10 000 – 100 000, because policy‑gradient math works on averages. A reliable gradient estimate appears only after you have sampled lots of different action paths across the problem space. Fewer samples would leave the update too noisy, so the optimiser keeps asking for more runs.
🧬 What GEPA actually is
GEPA flips the script by reading its own traces, writing natural‑language notes, and fixing prompts in place, so the learning signal is richer than a single score
After each run it asks the model what went wrong, writes plain‑text notes, mutates the prompt, and keeps only candidates on a Pareto frontier so exploration stays broad.
The Pareto‑based rule chooses a candidate that looks good but has room to improve. This avoids wasting time on hopeless or already‑perfect options.
Quick tests on tiny batches spare rollouts, and winning changes migrate into the live prompt.
Across 4 tasks it beats GRPO by up to 19% while needing up to 35X fewer runs, also overtaking MIPROv2.
🧵 Read on 👇