Michael Bendersky

Tweets

Pinned Tweet

Michael Bendersky @bemikelive

Jan 6

Really excited to share our latest research on the Instructed Retriever - a novel retrieval architecture that reimagines search for the agentic era. databricks.com/blog/instruct… Amazing work by @cindyxinyiwang and @mrdrozdov who co-led this effort!

Instructed Retriever: Unlocking System-Level Reasoning in Search Agents | Databricks Blog

Discover Instructed Retriever, a new retrieval architecture that outperforms RAG by enabling system-level reasoning, instruction-following, and higher-quality enterprise search agents.

databricks.com

2,982

Michael Bendersky retweeted

Databricks AI Research

@DbrxMosaicAI

Jun 4

Most agentic search systems get better by thinking longer: more tool calls, more reason-act loops, each step waiting on the last. Quality goes up, but so does latency.

Instructed-Retriever-1 takes a different route. Instead of scaling test-time compute sequentially, it scales it in parallel. One retrieval-specialized model fans the work out: it generates multiple query and filter formulations to widen recall, then reranks the merged evidence with a multi-pivot reranker to sharpen precision. Both stages run at once, so searching more broadly no longer means searching more slowly.

The result inside Knowledge Assistant: search time drops more than 3x and answer time 2x, with time to first token around two seconds, and no drop in quality (it matches Claude Sonnet 4.5 retrieval quality on KARLBench). For the people using it, that means far less waiting between question and answer, the freedom to ask more follow-ups, and more of the knowledge base actually surfaced.

Rolling out to all customers now, with no reconfiguration. Read how we did it: databricks.com/blog/3x-faste…

28,121
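The parallel fan-out described in the post above can be illustrated with a minimal sketch. This is not the Instructed-Retriever-1 implementation: the corpus, `toy_search`, `fan_out_search`, and `rerank` are all hypothetical names, the query variants are hard-coded rather than generated by a model, and a plain score sort stands in for the multi-pivot reranker, whose design is not described in the post. The sketch also runs the rerank after the fan-out completes, so it shows only the concurrent-retrieval half of the idea.

```python
import asyncio
from typing import Awaitable, Callable

Doc = tuple[str, float]  # (doc_id, retrieval score)

# Hypothetical in-memory corpus, used only for illustration.
CORPUS = {
    "d1": "quarterly revenue report",
    "d2": "revenue forecast for next quarter",
    "d3": "employee onboarding guide",
}


async def toy_search(query: str) -> list[Doc]:
    """Stand-in retriever: scores documents by keyword overlap."""
    await asyncio.sleep(0.01)  # simulated network latency
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            hits.append((doc_id, float(overlap)))
    return hits


async def fan_out_search(
    variants: list[str],
    search: Callable[[str], Awaitable[list[Doc]]],
) -> dict[str, float]:
    """Run every query variant concurrently and merge hits by best score."""
    results = await asyncio.gather(*(search(q) for q in variants))
    merged: dict[str, float] = {}
    for hits in results:
        for doc_id, score in hits:
            merged[doc_id] = max(score, merged.get(doc_id, float("-inf")))
    return merged


def rerank(merged: dict[str, float], top_k: int) -> list[Doc]:
    """Placeholder reranker: sort by score, breaking ties by doc_id."""
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def run_demo() -> list[Doc]:
    # Hard-coded variants; in the post, a model generates these.
    variants = ["revenue report", "quarterly revenue", "revenue forecast"]
    merged = asyncio.run(fan_out_search(variants, toy_search))
    return rerank(merged, top_k=2)


if __name__ == "__main__":
    print(run_demo())
```

Because the variants are awaited together with `asyncio.gather`, total retrieval latency in this toy setup is roughly that of the slowest single search rather than the sum of all of them, which is the latency argument the post makes.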

Michael Bendersky @bemikelive

Apr 28

Really exciting new collaboration with @NegarEmpr @mrdrozdov and @matei_zaharia on the query utility gap between ranking and generation (to appear at #SIGIR2026). Check it out!

Negar Arabzadeh

@NegarEmpr

Apr 28

1/ "Can QPP Choose the Right Query Variant?" has been accepted at #SIGIR2026!🇦🇺 You can easily over-generate multiple query variants at low cost, but running RAG for all of them is expensive! Can we pick the winner query before paying the generation cost? arxiv.org/abs/2604.22661

929

Michael Bendersky retweeted

Databricks AI Research

@DbrxMosaicAI

Apr 14

Most enterprise questions don't live in one dataset. They span structured systems and unstructured sources like documents, reviews, and reports. In our latest research, we show how Agent Bricks Supervisor Agent handles this by decomposing queries across structured and unstructured tools, then synthesizing results over multiple reasoning steps. The results across STaRK and KARLBench: 20% improvement over SoTA baselines, with the biggest gains on tasks requiring tight integration of structured and unstructured data. All built declaratively — no custom code, just precise instructions and the right tools. databricks.com/blog/agentic-…

10,412

Michael Bendersky retweeted

Databricks AI Research

@DbrxMosaicAI

Mar 31

Applications are officially open for the Grounded Reasoning Cup at Data + AI Summit 2026! 🏆 We’re looking for students who want to:
- Tackle high‑impact enterprise challenges
- Showcase work to top researchers/engineers (with recruiters in the room)
- Compete for $100k in model credit prizes

Apply here: forms.gle/ijdHDbAJFZpFLi1b7
Competition overview: bit.ly/groundreasoningcup-st…

30,843

Michael Bendersky @bemikelive

Mar 10

We just published OfficeQA Pro - a set of 133 challenging questions from the original OfficeQA benchmark. Even the best frontier agents still struggle on OfficeQA Pro with common issues stemming from errors in parsing, retrieval, and visual reasoning.

2,351

Michael Bendersky @bemikelive

Mar 10

All of these are realistic problems that @databricks customers face in their daily work, and we hope that OfficeQA Pro will contribute to advancing SoTA on grounded reasoning tasks. Technical Report: arxiv.org/pdf/2603.08655 GitHub: github.com/databricks/office…

256

Michael Bendersky @bemikelive

Mar 10

Congratulations to @kristahopsalong @arnav_thebigman @jazco @ivanzhouyq Erich Elsen @matei_zaharia and everyone at @DbrxMosaicAI who made this work possible! Special thanks to our partners @USAFacts @superannotate @turingcom and to all GitHub contributors!

225

Michael Bendersky @bemikelive

Mar 5

I thought about posting a thread on KARL, a new Pareto-optimal model for retrieval and grounded reasoning tasks. But @jefrankle did a much better job than I ever could. If you have any interest in information retrieval and/or RL, check it out! Full report: databricks.com/sites/default…

Jonathan Frankle

@jefrankle

Mar 5

Meet KARL, an RL'd model for document-centric tasks at frontier quality and open source cost/speed. Great for @databricks customers and scientists (77-page tech report!) As usual, this isn't just one model - it's an RL assembly line to churn out models for us and our customers 🧵

2,380

Michael Bendersky @bemikelive

Mar 5

This was an incredibly fun collaboration with @j_nadan_chang @mrdrozdov @ShubhamToshniw6 @owenoertell @alexrtrott @WenSun1 @jefrankle and many others here at Databricks AI Research.

210

Michael Bendersky retweeted

Matei Zaharia @matei_zaharia

Feb 4

Agent memory is a simple and powerful way to do continual learning! With the new MemAlign method from Databricks Research, we can build better LLM judges from examples of human ratings, and they scale with more data. Now in Databricks and @MLflow. databricks.com/blog/memalign…

MemAlign: Building Better LLM Judges From Human Feedback With Scalable Memory | Databricks Blog

MemAlign aligns LLM judges with human feedback using scalable memory, delivering state-of-the-art quality with 10–100× lower cost and latency.

databricks.com

235

18,723

Michael Bendersky @bemikelive

Jan 6

Instructed retriever is not just better than RAG, but it is also a much more effective tool in a multi-step agentic setting, where it not only delivers better results, but also does it faster and in fewer steps.

189

Michael Bendersky @bemikelive

Jan 6

Instructed retriever is now available for all of our Agent Bricks Knowledge Assistant customers. Consider trying it out for your next retrieval agent project. docs.databricks.com/aws/en/g…

Use Knowledge Assistant to create a high-quality chatbot over your documents | Databricks on AWS

Learn how to create a custom AI chatbot on your documents using Knowledge Assistant.

docs.databricks.com

153

Michael Bendersky @bemikelive

19 Dec 2025

If you are excited about the intersection of reinforcement learning and highly complex, economically valuable tasks, I can't think of a better place to spend the summer of 2026!

Jonathan Frankle

@jefrankle

19 Dec 2025

I'm hiring interns for next summer at @databricks! Specifically on (1) empirical RL at scale on non-verifiable tasks and (2) enabling real people to specify the behaviors they want out of AI (e.g., through evals) on highly complex tasks. 🧵

242

Michael Bendersky @bemikelive

9 Dec 2025

We released OfficeQA today -- a hard benchmark for evaluating agents on grounded reasoning tasks. More details in our blog databricks.com/blog/introduc… and the thread below

Introducing OfficeQA: A benchmark for end-to-end grounded reasoning | Databricks Blog

OfficeQA is Databricks’ new open benchmark for grounded reasoning on real-world enterprise data, built on U.S. Treasury Bulletins to test modern AI agents.

databricks.com

2,447

Michael Bendersky @bemikelive

9 Dec 2025

Huge congratulations to @kristahopsalong and @arnav_thebigman who spearheaded this work, and all our co-authors @jazco, @ivanzhouyq, @cindyxinyiwang, @abaheti95, @JacobianNeuro, @sam_havens, Erich Elsen, @matei_zaharia and Xing Chen!

184

Michael Bendersky @bemikelive

9 Dec 2025

Big thanks to the entire @databricks AI Research team, and our partners SuperAnnotate, Turing and USAFacts!

151