The paper builds a 3-agent search system that beats flat agents on real enterprise tasks.
Enterprises need answers that combine private documents with the Web.
Training 1 big agent that drives every tool at once sounds simple, but it struggles because the action space explodes, it overuses easy tools, it underuses harder Web search, and training wastes data.
HierSearch splits the job.
A local agent searches text chunks and a knowledge graph.
A Web agent queries search engines and reads pages.
A planner decides which agent to call, merges the evidence, then writes the final answer.
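The three roles above can be sketched as a small dispatch loop. This is a minimal illustration, not the paper's implementation: the agent stubs, the `call_each_once` policy, and every name below are hypothetical stand-ins. In HierSearch the planner is a trained model, not a fixed rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evidence:
    source: str  # which agent produced it: "local" or "web"
    text: str


# Hypothetical stubs standing in for the two low-level agents.
def local_agent(query: str) -> list[Evidence]:
    return [Evidence("local", f"chunk/graph hit for: {query}")]


def web_agent(query: str) -> list[Evidence]:
    return [Evidence("web", f"page snippet for: {query}")]


AGENTS: dict[str, Callable[[str], list[Evidence]]] = {
    "local": local_agent,
    "web": web_agent,
}


def call_each_once(question: str, calls: list[str],
                   evidence: list[Evidence]) -> Optional[str]:
    """Toy planner policy (assumption): try every agent once, in order."""
    for name in AGENTS:
        if name not in calls:
            return name
    return None  # nothing left to call, so stop and answer


def plan_and_answer(question, agents, choose, max_steps=4):
    """Planner loop: pick an agent, collect its evidence, then answer."""
    evidence: list[Evidence] = []
    calls: list[str] = []
    for _ in range(max_steps):
        name = choose(question, calls, evidence)
        if name is None:
            break
        calls.append(name)
        evidence.extend(agents[name](question))
    # Placeholder "answer": a real planner would generate text from evidence.
    answer = " | ".join(f"[{e.source}] {e.text}" for e in evidence)
    return answer, calls


answer, calls = plan_and_answer("Q3 revenue vs. market?", AGENTS, call_each_once)
```

The point of the split is that the planner only chooses between two agents, while each agent handles its own tools.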
They train the 2 lower agents first, then train the planner on top. This is hierarchical reinforcement learning.
They also add a small knowledge refiner. It keeps only the evidence that actually moves the next reasoning step forward, then adds a few items that support the final answer across sources. This blocks copied hallucinations.
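A toy version of that two-pass filter might look like the sketch below. The scoring is an assumption made purely for illustration: plain word overlap stands in for "moves the next step forward", and a substring check stands in for "supports the final answer". The function name, thresholds, and sample data are all hypothetical; the paper's refiner is not described at this level of detail here.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set, used as a crude relevance signal."""
    return set(re.findall(r"\w+", text.lower()))


def refine(evidence, next_step, answer, min_overlap=3, extra=2):
    """evidence: list of (source, text) pairs from the low-level agents."""
    step_tokens = tokens(next_step)
    # Pass 1: keep items that share enough words with the next reasoning step.
    kept = [e for e in evidence
            if len(tokens(e[1]) & step_tokens) >= min_overlap]
    # Pass 2: add a few leftover items that mention the answer,
    # preferring sources that pass 1 did not already cover.
    seen_sources = {src for src, _ in kept}
    rest = [e for e in evidence
            if e not in kept and answer.lower() in e[1].lower()]
    rest.sort(key=lambda e: e[0] in seen_sources)  # unseen sources first
    return kept + rest[:extra]


evidence = [
    ("local", "Acme revenue grew 12% in 2023"),
    ("web", "Weather in Paris is mild"),
    ("web", "Analysts confirm 12% growth for Acme"),
]
refined = refine(evidence, "find Acme revenue growth in 2023", "12%")
```

In this example the off-topic weather snippet is dropped, and the Web item is kept only because it corroborates the answer from a second source.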
EM means exact match; F1 balances token precision and recall.
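Both metrics are easy to compute by hand. The sketch below follows the common SQuAD-style convention (lowercase, strip punctuation and English articles, then compare tokens). That normalization is an assumption here; the paper's exact scoring script may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)  # share of predicted tokens that are right
    recall = overlap / len(g)     # share of gold tokens that were found
    return 2 * precision * recall / (precision + recall)
```

For a prediction of "Paris, France" against a gold answer of "Paris", EM is 0 but precision is 1/2 and recall is 1, so F1 is 2/3. That is why F1 is the more forgiving of the two numbers.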
Across 6 benchmarks it wins clearly, with F1 at 62.83 on MuSiQue, 46.37 on OmniEval finance, 66.99 on BioASQ medical, 68.00 on NQ, 67.40 on HotpotQA, and 72.81 on PubMedQA.
Ablations show each piece matters.
Bottom line, a simple 3-agent stack plus a light refiner gives better answers, less noise, and lower Web spend.
----
Paper: arxiv.org/abs/2508.08088
Paper Title: "HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches"