jonah retweeted

Antoine Chaffin

@antoine_chaffin

Jun 11

Whether you are GPU poor or GPU rich, today's release of PyLate has something for you!
GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements.
CPU enjoyers: TACHIOM enables lightning-fast multi-vector indexing and search directly on CPU.
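For readers unfamiliar with the operator named above: MaxSim is the late-interaction scoring function used by ColBERT-style multi-vector models. The sketch below is a plain-Python illustration of what it computes, not PyLate's kernel. The function names and toy vectors are assumptions made for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """For each query token vector, take its best-matching document
    token vector, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-d token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Here `doc_a` contains an exact match for each query token, so it scores higher than `doc_b`. The cost grows with (query tokens × document tokens), which is why fused kernels for this step can matter during training.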

jonah

@drexalt

May 29

- sparse maxsim
- contrastively-trained sae
- strong results on BERT and 8b nemo

this is a great one

Sumit @_reachsumit

May 29

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval
@Veritas2026 et al. replace vector clustering with efficient sparse autoencoders & natural inverted indexing to accelerate multi-vector retrieval.
📝arxiv.org/abs/2605.30120
👨🏽‍💻github.com/Y-Research-SBU/SS…

jonah retweeted

Mixedbread

@mixedbreadai

May 11

Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. 11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3. Available today in preview in Mixedbread.

jonah

@drexalt

May 1

Yes, the same team behind the SoTA sparse index SEISMIC is also behind what appears to be the SoTA multi-vector index. What is in the water in Pisa?

Sumit @_reachsumit

May 1

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
Presents a multivector retrieval system that uses token-aware clustering to allocate centroids based on token frequency & semantic variance.
📝arxiv.org/abs/2604.28142
👨🏽‍💻github.com/TusKANNy/tachiom

jonah

@drexalt

Apr 21

Researchers in Asia have something incredible to wake up to tomorrow, glad I stayed up :D Amazing release. PhD students around the world should rejoice over the open dataset, it is really, really impressive. Great work goats 🫡

Antoine Chaffin

@antoine_chaffin

Apr 21

The new generation of open state-of-the-art single and multi-vector retrieval models is here.
It's time, DenseOn with the LateOn 🎶
@LightOnIO releases models that leap past existing ones, and everything you need to do the same!

jonah

@drexalt

Apr 8

They even released the base bidirectional models 😍 Great release, thanks for all the checkpoints ♥️

Nicolas Boizard @N1colAIs

Apr 8

🚀 New model family release with an OMNIMODAL version! After EuroBERT, I'm excited to introduce BidirLM, a family of 5 frontier bidirectional encoders including an OMNIMODAL encoder at just 2.5B parameters. 🧵👇 huggingface.co/BidirLM

jonah

@drexalt

Apr 3

Replying to @N1colAIs

alphaxiv.org/abs/2604.02045

jonah

@drexalt

Mar 12

Whole grains are good for health and for SoTA retrieval 🍞🍞

Mixedbread

@mixedbreadai

Mar 12

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100 languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

jonah retweeted

Ben Clavié

@bclavie

Feb 5

I'm personally very bearish on MUVERA due to its many, many failure cases, but I have a lot of respect for the @weaviate_io folks, so I gave this another deep read to see if anything could change my mind.

However, this has me a bit puzzled. If I'm reading the graph below right, MUVERA itself produces a ~50% incompressible performance degradation at commonly used indexing parameters, and still a ~20% degradation at near-brute-force search parameters (ef=1024), meaning that the degradation would be purely due to MUVERA itself.

For most retrieval uses, this would make the method completely unusable, as for many workflows this degradation is close to the one we'd experience from replacing semantic search with pure BM25/keyword search.

I feel like I'm missing something here, so I'm very happy to be corrected if I'm misinterpreting the results!

Femke Plantinga

@femke_plantinga

Feb 4

Multi-vector embeddings (ColBERT, ColPali) are budget killers. But MUVERA can cut your memory footprint by 70%.

Multi-vector models offer incredible retrieval but suffer from massive memory overhead and slow indexing. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) compresses these into single, fixed-dimensional vectors.

How it works: MUVERA condenses a sequence of vectors (e.g., 100x96d) into one vector via:
1️⃣ Space Partitioning: Groups vectors into buckets using SimHash or k-means clustering.
2️⃣ Dimensionality Reduction: Applies random linear projection to compress each sub-vector while preserving dot products.
3️⃣ Repetitions: Repeats the process multiple times and concatenates results to improve accuracy.
4️⃣ Final Projection: Optional final compression (not used in Weaviate's implementation).

The impact (LoTTE benchmark):
- Memory: 12GB → <1GB.
- Indexing: 20 mins → 3-6 mins.
- HNSW Graph: 99% smaller.

There’s a trade-off: You trade a slight dip in raw recall for massive efficiency gains. However, by tuning the HNSW `ef` parameter (e.g., `ef=512`), you can recover 80-90% recall while keeping costs low.

When should you use MUVERA?
→ Large-scale production RAG
→ Systems where memory/infrastructure costs are the direct bottleneck
→ Use cases requiring fast indexing

MUVERA in @weaviate_io 1.31 takes just a couple of lines of code. You can tune three parameters (k_sim, d_proj, r_reps) to balance memory usage and retrieval accuracy for your specific use case.

Read the full technical deep-dive here: weaviate.io/blog/muvera?utm_…
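The first three steps listed in the thread can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not Weaviate's implementation: it uses SimHash for partitioning, a random ±1 matrix for the projection, sums vectors per bucket for queries and averages them for documents, and leaves empty buckets as zeros. The helper names (`make_rep`, `simhash_bucket`, `encode`) are hypothetical; `k_sim`, `d_proj`, and `r_reps` follow the parameter names in the thread.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_rep(dim, k_sim, d_proj, rng):
    """Random state for one repetition: k_sim hyperplanes for SimHash
    and a d_proj x dim matrix of +/-1 entries for the projection."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k_sim)]
    proj = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(d_proj)]
    return planes, proj


def simhash_bucket(vec, planes):
    """Step 1: bucket id built from the sign of vec against each hyperplane."""
    bucket = 0
    for plane in planes:
        bucket = (bucket << 1) | (1 if dot(vec, plane) > 0 else 0)
    return bucket


def encode(vectors, reps, is_query):
    """Build one fixed-size vector from a variable-length set of vectors."""
    out = []
    for planes, proj in reps:  # Step 3: one block per repetition
        d_proj = len(proj)
        n_buckets = 2 ** len(planes)
        sums = [[0.0] * d_proj for _ in range(n_buckets)]
        counts = [0] * n_buckets
        for v in vectors:
            b = simhash_bucket(v, planes)
            # Step 2: random linear projection down to d_proj dimensions.
            small = [dot(v, row) / d_proj ** 0.5 for row in proj]
            sums[b] = [s + x for s, x in zip(sums[b], small)]
            counts[b] += 1
        for b in range(n_buckets):
            # Assumption: queries keep the sum, documents use the mean.
            scale = 1.0 if is_query or counts[b] == 0 else 1.0 / counts[b]
            out.extend(s * scale for s in sums[b])
    return out


rng = random.Random(0)
dim, k_sim, d_proj, r_reps = 8, 3, 4, 5
reps = [make_rep(dim, k_sim, d_proj, rng) for _ in range(r_reps)]

doc = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]
query = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)]

doc_fde = encode(doc, reps, is_query=False)
query_fde = encode(query, reps, is_query=True)
approx_score = dot(query_fde, doc_fde)  # single dot product replaces MaxSim
```

The output length is `r_reps * 2**k_sim * d_proj` regardless of how many token vectors went in, which is what lets a standard single-vector index such as HNSW store it. How closely `approx_score` tracks the exact multi-vector score depends on the three parameters.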

jonah

@drexalt

Jan 22

This is a great thread, but there are a few interesting tidbits saved in the full blog post too :) Don't miss out!

Mixedbread

@mixedbreadai

Jan 21

We built the first production-ready multi-vector and multimodal search. Now we are serving over 1 billion documents in under 50ms latency (p50). We are sharing how we built it.

jonah

@drexalt

Jan 22

Link to the blog: mixedbread.com/blog/multimod…

Inside Mixedbread: How We Built Multimodal Late-Interaction at Billion Scale

Technical deep-dive into Mixedbread Search - the first production-ready late-interaction search with native multimodality. Learn how we achieve sub-50ms latency on billion-scale document collections.

mixedbread.com

jonah

@drexalt

21 Nov 2025

alias rg="mgrep"

Aamir

@aaxsh18

20 Nov 2025

we just made Claude Code
- use 53% fewer tokens
- respond 48% faster
- give 3.2x better responses

just by giving it a better grep

jonah retweeted

Antoine Chaffin

@antoine_chaffin

15 Nov 2025

Information retrieval folks united in Seoul over beer, chicken and Jensen @raphaelsrty @drexalt 🥰

jonah

@drexalt

26 Oct 2025

codex is not amazing at zig. unfortunately, i am 100x worse

jonah

@drexalt

22 Oct 2025

higher kl-div temperature improves in-distribution but hurts generalization? tracks but annoying
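For context on the knob being discussed: in score distillation, the temperature divides teacher and student scores before the softmax, and the loss is the KL divergence between the two softened distributions. The post does not describe the actual training setup, so the sketch below is only a generic illustration; the function names and the toy scores are assumptions.

```python
import math


def softmax(scores, temperature):
    """Softmax over scores divided by the temperature."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(teacher_scores, student_scores, temperature):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical relevance scores for three candidate documents.
teacher = [9.0, 3.0, 1.0]
student = [4.0, 3.5, 1.0]

loss_t1 = kl_div(teacher, student, temperature=1.0)
loss_t4 = kl_div(teacher, student, temperature=4.0)
```

A higher temperature flattens both distributions, so the student is pushed to match the teacher's relative ordering over all candidates, not only its top pick.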

jonah

@drexalt

15 Oct 2025

OpenSearch v3.3 added Seismic (approximate inverted index) into their NeuralSearch plugin, exciting to see the approximate LSR indices proliferate! github.com/opensearch-projec…

Release 3.3.0.0 · opensearch-project/neural-search

Version 3.3.0 Release Notes. Compatible with OpenSearch and OpenSearch Dashboards version 3.3.0. Features: [SEISMIC] Support SEISMIC, a new sparse ANN algorithm (#1581, #1578, #1577, #1566, #1565, #1...

github.com

jonah

@drexalt

13 Oct 2025

excited and grateful to join @mixedbreadai as an intern
time to bake 🥖

jonah retweeted

Mixedbread

@mixedbreadai

1 Oct 2025

We love supporting Open Source. All Open Source projects can power their docs, MCP and more with Mixedbread Search for free.

Effect | TypeScript for the AI Era

@EffectTS_

1 Oct 2025

We’ve been running a @mixedbreadai-powered search on the Effect docs for about a month now, and the results speak for themselves:
→ More relevant results
→ Fewer “no results” dead ends
→ Easier discovery of advanced topics

Full write-up by @imax153 ⤵️ effect.website/blog/how-mixe…

jonah

@drexalt

1 Oct 2025

a pretty crazy native sparse attention repo with very clear triton kernels github.com/mdy666/Scalable-F…

GitHub - mdy666/Scalable-Flash-Native-Sparse-Attention

Contribute to mdy666/Scalable-Flash-Native-Sparse-Attention development by creating an account on GitHub.

github.com

jonah

@drexalt

30 Sep 2025

ok spladev3 is like 40 tokens query, 200 tokens doc on msmarco. why wouldn't you train a splade model to like 40-50 tokens query, 400-600 tokens doc (or even more) and let user top-k the doc embeddings?
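The "let user top-k the doc embeddings" idea can be illustrated with a small sketch: store a generously sized sparse document vector, then keep only the k highest-weight terms at index time. This is a toy illustration with made-up terms and weights, not SPLADE's code; `top_k_prune` and `sparse_dot` are hypothetical helper names.

```python
def top_k_prune(sparse_vec, k):
    """Keep the k highest-weight terms of a {term: weight} sparse vector."""
    kept = sorted(sparse_vec.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(kept)


def sparse_dot(query_vec, doc_vec):
    """Score = sum of weight products over terms present in both vectors."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())


# Hypothetical learned-sparse document and query vectors.
doc = {"retrieval": 2.1, "sparse": 1.7, "index": 0.9, "search": 0.4, "the": 0.05}
query = {"sparse": 1.2, "search": 0.8}

pruned = top_k_prune(doc, 3)
full_score = sparse_dot(query, doc)
pruned_score = sparse_dot(query, pruned)
```

Pruning drops the low-weight `search` term here, so the pruned score is lower than the full score. That is the trade-off the user would be choosing: a smaller index in exchange for losing matches on low-weight terms.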
