Building retrieval for agents.

Joined March 2024
Pinned Tweet
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100 languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
New: Metadata explorer
Adding metadata to files enables filtering during search. Now, you can browse metadata fields and values across your store.
Agents can inspect file metadata in a store to understand available filters. docs: mixedbread.com/docs/stores/s…
By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain more than you think: you can extract sparse Latent Terms from them. And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.
Having language-adjacent properties means that tools designed for lexical approaches "just work". BM25, always refusing to exit the scene, is strong here: applied over the Latent Terms extracted from nomic-embed-v1.5, it results in a near state-of-the-art sparse retriever.
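To make the idea concrete, here is a minimal sketch of Okapi BM25 scoring over documents that have already been converted to sparse terms. How Latent Terms are extracted from an embedding model is not described in these posts, so the term IDs below (`t7`, `t12`, and so on) are purely hypothetical placeholders, and the `k1` and `b` values are common defaults rather than the settings used by Mixedbread.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of terms) against the query with Okapi BM25.

    `docs` is assumed to already hold sparse terms; producing Latent Terms
    from an embedding model is outside the scope of this sketch.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical term IDs standing in for extracted Latent Terms.
docs = [["t12", "t7", "t7"], ["t3", "t9"], ["t7", "t40", "t3"]]
scores = bm25_scores(["t7"], docs)
```

Because the scorer only needs term lists, any existing lexical tooling can be applied unchanged once the terms are available, which is the point the post makes.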
New: grep for exact matching
grep → keyword / regex matching
search → fine-grained semantic retrieval
Works across uploaded content, including text, PDFs (OCR) and audio/video (transcription).
Give your agents both retrieval primitives to perform at their best.
Feature: Native agentic search on Mixedbread Search with auto-planning, exploration, and multi-hop reasoning across documents.
Built for:
- evidence discovery
- exhaustive search
- cross-document reasoning
→ Topped MADQA @snowflake with 93.4% accuracy across 18,000 PDF pages.
Steer search with more instructions. Docs: mixedbread.com/docs/stores/s…
View and export traces directly from your dashboard:
New: Traces for Mixedbread agentic search
See every search call an agent makes directly in the dashboard, and tune instructions for better retrieval quality.
Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. 11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3. Available today in preview in Mixedbread.
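For readers unfamiliar with the metric, NDCG@10 can be sketched as follows. This is a generic illustration using linear gain, not the evaluation code behind the reported numbers; it also assumes the ideal ranking is obtained by re-sorting the same graded relevance labels, which only holds when every relevant candidate is already in the list.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels for a reranked candidate list, best first.
perfect = ndcg_at_k([3, 2, 1])
swapped = ndcg_at_k([1, 3])
```

A reranker improves NDCG@10 by moving highly relevant candidates toward the top of the list, where the logarithmic discount is smallest.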
You can read more about this in our blog post, where we present more detailed benchmark results and elaborate on the nature of the three benchmarks, and why we're very proud to be topping all three of them. mixedbread.com/blog/closing-…
Mixedbread search's ultimate aim is to power all workflows, no matter their modality or language. Try it for your own knowledge-intensive tasks today: mixedbread.com/
Agents are increasingly performing knowledge work: Deep Research, generating financial reports, reasoning across historical knowledge bases... Many high-quality benchmarks now focus on evaluating such tasks, including BrowseComp-Plus, @databricks's OfficeQA, and @Snowflake's MADQA, released just last week.
So what is the Oracle gap? Optimising agentic systems is complicated. There are many individual components you need to get just right. Retrieval is one of those components, and its impact is best measured by the Oracle gap: the difference in performance of the same system when it is given results from an imperfect retriever versus the perfect, fully relevant results that a so-called Oracle would provide.
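As a rough illustration, the Oracle gap can be computed from per-question outcomes of the same agent run twice, once with the real retriever and once with Oracle results. The function names and the example outcomes are hypothetical, and accuracy is assumed as the metric.

```python
def accuracy(outcomes):
    """Fraction of questions answered correctly (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)


def oracle_gap(with_retriever, with_oracle):
    """Accuracy lost because retrieval is imperfect; 0.0 means Oracle-level."""
    return accuracy(with_oracle) - accuracy(with_retriever)


# Hypothetical outcomes for four questions, same agent in both runs.
gap = oracle_gap(
    with_retriever=[True, False, True, True],
    with_oracle=[True, True, True, True],
)
```

Holding the rest of the system fixed is what lets the gap isolate the retriever's contribution.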
For agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge-intensive benchmarks.
Mixedbread retweeted
I've been eagerly awaiting this release from the @mixedbreadai folks. They're world-leading experts in late interaction retrieval. And today they remind us that late interaction done well makes all your favorite embedding models look like they don't work.