This was a good fit for RL because spreadsheet retrieval is repeated often, is latency-sensitive, and has clean feedback. The model either returns the right value (an amount exact to the cent, a date, an invoice ID, a yes/no answer, or a row reference) or it does not. That let us optimize the retrieval policy directly with deterministic rewards.
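
To make "deterministic reward" concrete, here is a minimal sketch of an exact-match reward over the answer types listed above. It is an illustration, not the actual reward code: the function name `exact_match_reward`, the answer-kind labels, and the normalization rules (integer cents for amounts, ISO dates, case-insensitive IDs) are all assumptions.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical answer kinds, mirroring the types named in the text.
KINDS = {"amount", "date", "invoice_id", "yes_no", "row_ref"}


def _normalize(kind: str, raw: str):
    """Map a raw answer string to a canonical, comparable value."""
    text = raw.strip()
    if kind == "amount":
        # Compare money as integer cents to avoid float rounding.
        value = Decimal(text.replace("$", "").replace(",", ""))
        if not value.is_finite():
            raise ValueError(f"not a finite amount: {raw!r}")
        cents = value * 100
        if cents != cents.to_integral_value():
            raise ValueError(f"sub-cent precision: {raw!r}")
        return int(cents)
    if kind == "date":
        # Assumes ISO 8601 dates such as 2024-03-01.
        return date.fromisoformat(text)
    if kind == "yes_no":
        lowered = text.lower()
        if lowered not in {"yes", "no"}:
            raise ValueError(f"not yes/no: {raw!r}")
        return lowered == "yes"
    # invoice_id and row_ref: assume case-insensitive string identity.
    return text.upper()


def exact_match_reward(kind: str, predicted: str, expected: str) -> float:
    """Return 1.0 if the prediction matches the label exactly, else 0.0."""
    if kind not in KINDS:
        raise ValueError(f"unknown answer kind: {kind!r}")
    try:
        pred = _normalize(kind, predicted)
    except (ValueError, InvalidOperation):
        # An unparseable model answer is simply wrong.
        return 0.0
    return 1.0 if pred == _normalize(kind, expected) else 0.0


if __name__ == "__main__":
    print(exact_match_reward("amount", "$1,234.50", "1234.5"))
    print(exact_match_reward("yes_no", "maybe", "yes"))
```

Because the function has no randomness and no learned judge, the same prediction always earns the same reward, which is the property the paragraph relies on. A malformed label still raises, so bad ground truth surfaces as an error rather than as a silent zero.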