Search agents, whether they're powering deep research or multi-step QA over a private corpus, spend most of their time and compute in the research loop: query, search, reason, repeat.
We wanted to make that loop faster and more accurate. So we optimized two things jointly: the retrieval stack itself, and the planner that decides when and how to search.
A trained planner on our fastest retrieval config matches an untrained planner on the most expensive one, at half the latency. Every arrow in this plot points up and to the left. [1/n]