Joined May 2020
Pinned Tweet
Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust, made for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. Runs 100% locally. You get two SOTA code retrieval models within ColGREP.
Raphaël Sourty retweeted
TACHIOM is now in PyLate: lightning-fast multi-vector indexing and search directly on CPU! arxiv.org/abs/2604.28142 Joint work with @SilvioMartinico, @cosimorulli1, @rventurini_. Thanks, @antoine_chaffin, and the PyLate team, for the support with the integration!
Whether you are GPU poor or GPU rich, today's release of PyLate has something for you! GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements CPU enjoyers: TACHIOM enables lightning fast multi-vector indexing and search directly on CPU
Raphaël Sourty retweeted
Happy to share that our recent work, TACHIOM, got integrated into the PyLate ecosystem! arxiv.org/pdf/2604.28142 (@SilvioMartinico, @fmnardini, @rventurini_ )

PyLate 1.6.0 is available, and improving one release at a time 😁
Computing max similarity (MaxSim, the scoring step of ColBERT and ColPali) on GPUs can be optimized, and this is what @tonywu_71 did. It's available in PyLate and will accelerate both training and inference of multi-vector models: pip install "pylate[lik]". So cool, from @tonywu_71 and @Aurelien_L_
Very excited to release late-interaction-kernels (LIK): fused Triton kernels for MaxSim, the scoring step behind ColBERT, ColPali & LateOn. 🚀 Numerically equivalent to PyTorch at a fraction of the memory, with day-0 support in PyLate & colpali-engine. (1/N 🧵)
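For readers new to late interaction, here is a plain-Python reference sketch of the MaxSim score, assuming the token embeddings are already computed. It only illustrates the arithmetic that the fused LIK Triton kernels accelerate; the `maxsim` function is hypothetical and is not the PyLate or LIK API.

```python
def maxsim(query_tokens, doc_tokens):
    """ColBERT-style MaxSim score between one query and one document.

    query_tokens, doc_tokens: lists of token embeddings (lists of floats).
    For each query token, keep its best dot product over all document
    tokens, then sum those maxima over the query tokens.
    """
    if not doc_tokens:
        raise ValueError("document needs at least one token embedding")
    score = 0.0
    for q in query_tokens:
        # Best-matching document token for this query token.
        score += max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
    return score


# Toy 2-d embeddings (illustrative only, not produced by a real model).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim(query, doc))
```

Here the first query token matches the first document token best (0.9) and the second matches the second (0.8), so the score is about 1.7. A naive batched version materializes the full query-token by document-token similarity matrix before taking the max; fusing those steps into one kernel is a common way to cut memory, though the exact LIK design is not described here.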
Start saving money with ColGREP when querying your favorite AI. Some estimates I made at the time, even more relevant now. I heard the Uber COO might be interested. Happy to see a smarter model, btw.
Anthropic has a coding MOAT
Raphaël Sourty retweeted
If you're testing a new retrieval model or long-context LLM, it's a waste of your time (and ours...) to report 0.2% gains on the many saturated and expired benchmarks. If you're in that position and looking for a way to rescue your great new idea, put it to the test on OBLIQ-Bench.
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.
Raphaël Sourty retweeted
One of the most interesting papers of the last ~2 years in IR only has 8 citations.
Raphaël Sourty retweeted
Today I'm launching a new project called SynthTraces 🔥 It is a minimal codebase to generate synthetic coding-agent session traces using Pi (from @badlogicgames).

I wanted a large number of coding-agent traces, so I built a tiny harness where two models talk to each other:
- an open model (served via HF Inference Providers) plays the coding agent. It gets read bash access to a real open-source codebase (the Hugging Face OSS projects)
- a small local model (llama.cpp) plays the human user, asking simple questions like "how do I run this?" or "how is CI set up?"

The result is more than 2,000 Pi session traces, which can be used to train or fine-tune LLMs and optimize them for Pi 🤯 And ofc everything is published on @huggingface
Raphaël Sourty retweeted
Do you like the open-source models we keep shipping at @LightOnIO? 👀 Now you can actually *build* with them!! We're launching LightOn Console 🎮: three endpoints (Parse, Extract, Search) so you can run our models on your own documents without building the plumbing yourself! 🧵
Raphaël Sourty retweeted
Today, we're introducing LightOn Console.
⚙️ Three endpoints:
- /Parse any documents
- /Extract structured data
- /Search enterprise knowledge with citations
🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.
No infrastructure. No pipeline maintenance. No dedicated retrieval team required.
Make your enterprise knowledge agent-readable now!
Read the launch announcement: lighton.ai/lighton-blogs/int…
Test it now: console.lighton.ai/
Raphaël Sourty retweeted
The late-interaction multivector retrieval ecosystem is exploding right now. To help separate the signal from the noise, we put together an "Awesome Multivector Retrieval" list organizing the top models, engines, libraries, and datasets all in one place 📚 🧵👇
Raphaël Sourty retweeted
Quick update: TACHIOM 0.3.0 is out with mean-centering to help alleviate the anisotropy problem. Also noticed that newer models usually need lower micro/small token thresholds than the defaults calibrated on ColBERTv2.0. More to come soon! ⚔️
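Anisotropic embeddings crowd into a narrow cone, so even unrelated vectors get high cosine similarity; subtracting the corpus mean removes that shared direction. The sketch below shows the general mean-centering idea on plain lists. It is not TACHIOM's implementation, and `mean_center` and `cosine` are hypothetical helpers.

```python
import math


def mean_center(vectors):
    """Subtract the corpus mean from every embedding, then L2-normalize.

    Returns the centered vectors and the mean, so the same mean can be
    subtracted from query embeddings at search time.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = []
    for v in vectors:
        c = [x - m for x, m in zip(v, mean)]
        norm = math.sqrt(sum(x * x for x in c))
        # Leave an all-zero vector as-is instead of dividing by zero.
        centered.append([x / norm for x in c] if norm > 0 else c)
    return centered, mean


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Anisotropic toy embeddings: all share a dominant first component.
vecs = [[1.0, 0.1], [1.0, -0.1], [1.0, 0.2], [1.0, -0.2]]
centered, mean = mean_center(vecs)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(centered[0], centered[1]), 3))
```

In this toy case the first two vectors go from a cosine of about 0.98 to -1 once the shared component is removed. Returning the mean is a design choice so queries can be centered with the same statistics as the index.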
Raphaël Sourty retweeted
It's only BEIR, but there is an almost 10-point gap between v2 and LateOn. We also have good evidence that the model generalizes very well outside of BEIR. GTE-ModernColBERT was an upgrade; LateOn is a whole new generation. And all of them have the exact same usage in PyLate.
20M downloads / month is a new record for colbertv2 but people should probably migrate from this ancient October 2021 model to the LateOn colbert model from @raphaelsrty @antoine_chaffin et al (@LightOnIO)
At 140 million parameters, our LateOn model yields strong results 😉 Unrelated to LateOn, I'm really excited by what's happening with multi-vector models right now:
- New kinds of indexes running on CPU
- New multilingual models
- Anisotropy being solved
- Sparse multi-vector
Raphaël Sourty retweeted
Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming very soon :)
Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained Sparse Autoencoders. 📝 arxiv.org/abs/2605.29384
Raphaël Sourty retweeted
Late-interaction sparse retrieval? 😁 With neuron-level inverted indexing, on top of unsupervised sparse autoencoders. Works much better than directly training sparse retrievers. Lots of cool ideas developed & composed in here. Thanks for the insights @Veritas2026 @yifeiwang77!
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval @Veritas2026 et al. replace vector clustering with efficient sparse autoencoders & natural inverted indexing to accelerate multi-vector retrieval. 📝arxiv.org/abs/2605.30120 👨🏽‍💻github.com/Y-Research-SBU/SS…
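The two tweets above describe neuron-level inverted indexing over sparse-autoencoder activations. A minimal sketch of that general idea follows, assuming each token has already been encoded into a sparse dict {latent_dim: activation} with nonnegative activations (as a ReLU or top-k SAE would produce). The functions and data are hypothetical illustrations, not the paper's algorithm.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: [sparse_token_vec, ...]}, each vec a dict {dim: activation}.

    Returns posting lists {dim: [(doc_id, token_idx, activation), ...]}.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for t, vec in enumerate(tokens):
            for dim, w in vec.items():
                if w > 0:
                    index[dim].append((doc_id, t, w))
    return index


def search(index, query_tokens):
    """Late-interaction scoring that only touches the query's posting lists.

    For each query token: accumulate sparse dot products per (doc, token),
    keep the best token per document (MaxSim), then sum over query tokens.
    A document with no shared active dimension gets 0 for that query token,
    which matches MaxSim when all activations are nonnegative.
    """
    doc_scores = defaultdict(float)
    for q in query_tokens:
        token_scores = defaultdict(float)
        for dim, qw in q.items():
            for doc_id, t, dw in index.get(dim, []):
                token_scores[(doc_id, t)] += qw * dw
        best = {}
        for (doc_id, _t), s in token_scores.items():
            best[doc_id] = max(best.get(doc_id, 0.0), s)
        for doc_id, s in best.items():
            doc_scores[doc_id] += s
    return sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": [{0: 1.0, 3: 0.5}, {1: 0.8}],
    "b": [{3: 1.0}, {2: 0.4}],
}
query = [{3: 1.0}, {1: 1.0}]
print(search(build_inverted_index(docs), query))
```

Only posting lists for dimensions active in the query are scanned, so no clustering step is needed to find candidates; a production system would additionally need pruning and compressed postings.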
I want an Iso-LateOn as well 😁 Very interesting work to scale multi-vector retrieval and fight anisotropy in models so they can produce sparse vectors for SMVE
May 29
Even strong multi-vector models may break down when optimized for low-latency and high-QPS inference in production. But this can be fixed. We're open-sourcing Iso-ModernColBERT, a late interaction model built for efficient inference and scalable retrieval. 🧵 (1/6)
Raphaël Sourty retweeted
ICYMI: @raphaelsrty just added index.freeze() to FastPlaid v1.4.7, which halves your index's size on disk if you know you won't modify it 🥶 Reversible with index.unfreeze() 🔥
Replying to @antoine_chaffin
The halving of the size of FastPlaid indexes for analytical read-only workloads is real! github.com/lightonai/fast-pl…
Raphaël Sourty retweeted
📢 New @heyjasper release! 📢 MONET 🌸: An Apache 2.0, deduped and recaptioned dataset of 105M samples unlocking reproducible text-to-image research. Nano T2I 🖌️: A codebase to train your own T2I model 🤗 @huggingface: huggingface.co/datasets/jasp… 💻: github.com/gojasper/nano-t2i Very excited about this new release, pushing the boundaries of open and reproducible T2I research. Congrats to the team! Benjamin Aubin Gonzalo Quintana @onurxtasar @UlaLaParis @_jeev2 @dh7net @clipdropapp @heyjasperai