in case you missed it, OBLIQ-Bench is now on arXiv:
arxiv.org/pdf/2605.06235
my hope is that this cuts down on the IR and search-agent papers that I discard immediately as a reader because, in 2026, they’re still evaluating on long-expired MS MARCO, NQ, HotpotQA, BEIR, etc.
We set out to build a better retriever, so we looked for the hardest IR benchmarks.
For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left!
So we built OBLIQ-Bench to study much harder search queries than before.
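For readers who want a concrete picture of the headroom check, here is a minimal sketch of one way to read it: take a retriever's candidate list, reorder it with a strong judge, and compare a ranking metric before and after. Everything here is an assumption for illustration, not the paper's protocol: the names `reciprocal_rank`, `rerank`, and `headroom` are hypothetical, the metric is plain reciprocal rank, and `toy_judge` is a stand-in for a frontier LLM call.

```python
from typing import Callable, Sequence


def reciprocal_rank(ranking: Sequence[str], relevant: set[str]) -> float:
    """Return 1/rank of the first relevant doc, or 0.0 if none is present."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def rerank(
    query: str,
    candidates: Sequence[str],
    judge: Callable[[str, str], float],
) -> list[str]:
    """Reorder candidates by judge score, highest first.

    sorted() is stable, so ties keep the retriever's original order.
    """
    return sorted(candidates, key=lambda doc_id: judge(query, doc_id), reverse=True)


def headroom(
    query: str,
    candidates: Sequence[str],
    relevant: set[str],
    judge: Callable[[str, str], float],
) -> float:
    """Metric gain from reranking: a small value suggests little room left."""
    before = reciprocal_rank(candidates, relevant)
    after = reciprocal_rank(rerank(query, candidates, judge), relevant)
    return after - before


def toy_judge(query: str, doc_id: str) -> float:
    """Hypothetical stand-in for an LLM relevance score (assumption)."""
    return 1.0 if doc_id == "d3" else 0.0


# The retriever put the only relevant doc last; the judge moves it to the top.
gain = headroom("example query", ["d1", "d2", "d3"], {"d3"}, toy_judge)
```

In this toy case the reciprocal rank goes from 1/3 to 1, so `gain` is 2/3. On a benchmark where the first-stage ranking is already near the judge's ordering, the same quantity would sit close to zero, which is the "little room left" situation described above.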