Joined May 2010
28 Photos and videos
Michael Bendersky retweeted
Most agentic search systems get better by thinking longer: more tool calls, more reason-act loops, each step waiting on the last. Quality goes up, but so does latency. Instructed-Retriever-1 takes a different route. Instead of scaling test-time compute sequentially, it scales it in parallel. One retrieval-specialized model fans the work out: it generates multiple query and filter formulations to widen recall, then reranks the merged evidence with a multi-pivot reranker to sharpen precision. Both stages run at once, so searching more broadly no longer means searching more slowly. The result inside Knowledge Assistant: search time drops more than 3x and answer time 2x, with time to first token around two seconds, and no drop in quality (it matches Claude Sonnet 4.5 retrieval quality on KARLBench). For the people using it, that means far less waiting between question and answer, the freedom to ask more follow-ups, and more of the knowledge base actually surfaced. Rolling out to all customers now, with no reconfiguration. Read how we did it: databricks.com/blog/3x-faste…
6
10
75
28,121
Really exciting new collaboration with @NegarEmpr @mrdrozdov and @matei_zaharia on query utility gap between ranking and generation (to appear #SIGIR2026). Check it out!
1/ "Can QPP Choose the Right Query Variant?" has been accepted at #SIGIR2026!🇦🇺 You can easily over-generate multiple query variants at low cost, but running RAG for all of them is expensive! Can we pick the winner query before paying the generation cost? arxiv.org/abs/2604.22661
2
6
929
Michael Bendersky retweeted
Most enterprise questions don't live in one dataset. They span structured systems and unstructured sources like documents, reviews, and reports. In our latest research, we show how Agent Bricks Supervisor Agent handles this by decomposing queries across structured and unstructured tools, then synthesizing results over multiple reasoning steps. The results across STaRK and KARLBench: 20% improvement over SoTA baselines, with the biggest gains on tasks requiring tight integration of structured and unstructured data. All built declaratively — no custom code, just precise instructions and the right tools. databricks.com/blog/agentic-…
5
15
49
10,412
Michael Bendersky retweeted
Applications are officially open for the Grounded Reasoning Cup at Data AI Summit 2026! 🏆 We’re looking for students who want to: - Tackle high‑impact enterprise challenges - Showcase work to top researchers/engineers (with recruiters in the room) - Compete for $100k in model credit prizes Apply here: forms.gle/ijdHDbAJFZpFLi1b7 Competition overview: bit.ly/groundreasoningcup-st…
2
14
64
30,843
We just published OfficeQA Pro - a set of 133 challenging questions from the original OfficeQA benchmark. Even the best frontier agents still struggle on OfficeQA Pro with common issues stemming from errors in parsing, retrieval, and visual reasoning.
1
8
25
2,351
All of these are realistic problems that @databricks customers face in their daily work, and we hope that OfficeQA Pro will contribute to advancing SoTA on grounded reasoning tasks. Technical Report: arxiv.org/pdf/2603.08655 Github: github.com/databricks/office…

1
1
6
256
Congratulations to @kristahopsalong @arnav_thebigman @jazco @ivanzhouyq Erich Elsen @matei_zaharia and everyone at @DbrxMosaicAI who made this work possible! Special thanks you to our partners @USAFacts @superannotate @turingcom and to all Github contributors!
7
225
I thought about posting a thread on KARL, a new Pareto-optimal model for retrieval and grounded reasoning tasks. But @jefrankle did a much better job than I ever could. If you have any interest in information retrieval and/or RL, check it out! Full report: databricks.com/sites/default…

Meet KARL, an RL'd model for document-centric tasks at frontier quality and open source cost/speed. Great for @databricks customers and scientists (77-page tech report!) As usual, this isn't just one model - it's an RL assembly line to churn out models for us and our customers 🧵
1
3
26
2,380
This was an incredibly fun collaboration with @j_nadan_chang @mrdrozdov @ShubhamToshniw6 @owenoertell @alexrtrott @WenSun1 @jefrankle and many others here at Databricks AI Research.
5
210
Michael Bendersky retweeted
Agent memory is a simple and powerful way to do continual learning! With the new MemAlign method from Databricks Research, we can build better LLM judges from examples of human ratings, and they scale with more data. Now in Databricks and @MLflow. databricks.com/blog/memalign…
10
38
235
18,723
Instructed retriever is not just better than RAG, but it is also a much more effective tool in a multi-step agentic setting, where it not only delivers better results, but also does it faster and in fewer steps.
1
1
189
If you are excited about the intersection of reinforcement learning and highly complex economically valuable tasks --I can't think of a better place to spend the summer of 2026!
I'm hiring interns for next summer at @databricks! Specifically on (1) empirical RL at scale on non-verifiable tasks and (2) enabling real people specify the behaviors they want out of AI (e.g., through evals) on highly complex tasks. 🧵
6
242
Huge congratulations to @kristahopsalong and @arnav_thebigman who spearheaded this work, and all our co-authors @jazco , @ivanzhouyq , @cindyxinyiwang , @abaheti95 , @JacobianNeuro , @sam_havens , Erich Elsen, @matei_zaharia and Xing Chen!
1
7
184
Big thanks to the entire @databricks AI Research team, and our partners SuperAnnotate, Turing and USAFacts!
5
151