Raphaël Sourty

Tweets

Pinned Tweet

Raphaël Sourty

@raphaelsrty

Feb 12

Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust and made for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. It runs 100% locally. You get two SOTA code retrieval models within ColGREP

134

10,570

Raphaël Sourty retweeted

Franco Maria Nardini @fmnardini

Jun 11

TACHIOM is now in PyLate: lightning-fast multi-vector indexing and search directly on CPU! arxiv.org/abs/2604.28142 Joint work with @SilvioMartinico, @cosimorulli1, @rventurini_. Thanks, @antoine_chaffin, and the PyLate team, for the support with the integration!

Efficient Multivector Retrieval with Token-Aware Clustering and...

Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substantial computational and memory costs....

arxiv.org

Antoine Chaffin

@antoine_chaffin

Jun 11

Whether you are GPU poor or GPU rich, today's release of PyLate has something for you!
GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements
CPU enjoyers: TACHIOM enables lightning fast multi-vector indexing and search directly on CPU

545

Raphaël Sourty retweeted

cosimorulli @cosimorulli1

Jun 11

Happy to share that our recent work, TACHIOM, got integrated into the PyLate ecosystem! arxiv.org/pdf/2604.28142 (@SilvioMartinico, @fmnardini, @rventurini_ )

Antoine Chaffin

@antoine_chaffin

Jun 11

3,494

Raphaël Sourty

@raphaelsrty

Jun 11

PyLate 1.6.0 is available, and improving one release at a time 😁

Antoine Chaffin

@antoine_chaffin

Jun 11

822

Raphaël Sourty

@raphaelsrty

Jun 10

Computing max similarity (the scoring step of ColBERT and ColPali) on GPUs can be optimized, and this is what @tonywu_71 did. It's available in PyLate and will accelerate both training and inference of multi-vector models: pip install "pylate[lik]". So cool, from @tonywu_71 and @Aurelien_L_

Tony Wu

@tonywu_71

Jun 10

Very excited to release late-interaction-kernels (LIK): fused Triton kernels for MaxSim, the scoring step behind ColBERT, ColPali & LateOn. 🚀 Numerically equivalent to PyTorch at a fraction of the memory, with day-0 support in PyLate & colpali-engine. (1/N 🧵)

3,293
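For readers unfamiliar with MaxSim, the scoring step mentioned in the two posts above, here is a minimal pure-Python sketch. It is not the LIK Triton kernel and does not use PyLate's API. The function names `dot` and `maxsim` and the toy 2-dimensional token embeddings are hypothetical, chosen only to show the computation: for each query token, take the best dot product against any document token, then sum those maxima.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the max similarity
    to any document token. Returns 0.0 for an empty document."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query token
doc_b = [[0.5, 0.5]]                          # only a partial match for each query token

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

A naive implementation like this materializes every query-token by document-token similarity, which is the memory cost that fused kernels are reported to reduce.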

Raphaël Sourty

@raphaelsrty

Jun 9

Start saving money with ColGREP when querying your favorite AI. Some estimates I made at the time are even more relevant now. I heard Uber's COO might be interested. Happy to see smarter models, btw

Lisan al Gaib

@scaling01

Jun 9

Anthropic has a coding MOAT

1,878

Raphaël Sourty retweeted

Omar Khattab

@lateinteraction

Jun 1

if you're testing a new retrieval model or long-context LLM, it's a waste of your time (and ours...) to report 0.2% gains on the many saturated and expired benchmarks. if you're in that position and looking for a way to rescue your great new idea, put it to the test on OBLIQ-Bench

Diane @dianetc_

May 6

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.

171

25,052

Raphaël Sourty retweeted

Ben Clavié

@bclavie

Jun 5

Ben Clavié

@bclavie

Jun 4

One of the most interesting papers of the last ~2 years in IR only has 8 citations.

7,116

Raphaël Sourty retweeted

Julien Chaumond

@julien_c

Jun 4

Today I'm launching a new project called SynthTraces 🔥 It is a minimal codebase to generate synthetic coding agent session traces using Pi (from @badlogicgames)
I wanted a large number of coding-agent traces, so I built a tiny harness where two models talk to each other:
- an open model (served via HF Inference Providers) plays the coding agent. It gets read bash access to a real open source codebase (the huggingface OSS projects)
- a small local model (llama.cpp) plays the human user, asking simple questions like "how do I run this?" or "how is CI set up?"
The result is more than 2,000 Pi session traces which can be used to train or fine-tune LLMs, and optimize them for Pi 🤯 And ofc everything is published on @huggingface ✅

355

52,751

Raphaël Sourty retweeted

Amélie Chatelain

@AmelieTabatta

Jun 3

Do you like the open-source models we keep shipping at @LightOnIO? 👀 Now you can actually *build* with them!! We're launching LightOn Console 🎮: three endpoints (Parse, Extract, Search) so you can run our models on your own documents without building the plumbing yourself! 🧵

1,724

Raphaël Sourty retweeted

LightOn

@LightOnIO

Jun 2

Today, we're introducing LightOn Console.
⚙️ Three endpoints:
/Parse any documents
/Extract structured data
/Search enterprise knowledge with citations
🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.
No infrastructure. No pipeline maintenance. No dedicated retrieval team required. Make your enterprise knowledge agent-readable now!
Read the launch announcement: lighton.ai/lighton-blogs/int…
Test it now: console.lighton.ai/

2,893

Raphaël Sourty retweeted

Silvio Martinico @SilvioMartinico

Jun 2

The late-interaction multivector retrieval ecosystem is exploding right now. To help separate the signal from the noise, we put together an "Awesome Multivector Retrieval" list organizing the top models, engines, libraries, and datasets all in one place 📚 🧵👇

118

7,026

Raphaël Sourty retweeted

Silvio Martinico @SilvioMartinico

May 31

Quick update: TACHIOM 0.3.0 is out with mean-centering to help alleviate the anisotropy problem. Also noticed that newer models usually need lower micro/small token thresholds than the defaults calibrated on ColBERTv2.0. More to come soon! ⚔️

2,304
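As a rough illustration of the mean-centering idea mentioned in the post above, here is a generic pure-Python sketch. It is not TACHIOM's actual implementation, which the post does not describe. All function names and the toy vectors are hypothetical. The sketch assumes "mean-centering" means subtracting the corpus mean vector from every embedding, and it uses the average pairwise cosine similarity as a simple proxy for anisotropy (embeddings crowding into one direction).

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty collection of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mean_center(vectors: Sequence[Vector]) -> List[List[float]]:
    """Subtract the collection mean from every vector."""
    mu = mean_vector(vectors)
    return [[x - m for x, m in zip(v, mu)] for v in vectors]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def avg_pairwise_cosine(vectors: Sequence[Vector]) -> float:
    """Average cosine over all unordered pairs (anisotropy proxy)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings that all point in roughly the same direction (hypothetical values).
embeddings = [[1.0, 0.1], [1.0, -0.1], [0.9, 0.0]]
centered = mean_center(embeddings)

before = avg_pairwise_cosine(embeddings)
after = avg_pairwise_cosine(centered)
print(before, after)
```

On this toy data the raw vectors are nearly parallel, while the centered ones spread out around the origin, so the average pairwise cosine drops.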

Raphaël Sourty retweeted

Antoine Chaffin

@antoine_chaffin

May 30

It’s only BEIR, but there is almost a 10-point gap between v2 and LateOn. We also have good evidence that the model generalizes very well outside of BEIR. GTE-ModernColBERT was an upgrade; LateOn is a whole new generation. And all of them have the exact same usage in PyLate

Omar Khattab

@lateinteraction

May 30

20M downloads / month is a new record for colbertv2 but people should probably migrate from this ancient October 2021 model to the LateOn colbert model from @raphaelsrty @antoine_chaffin et al (@LightOnIO)

5,496

Raphaël Sourty

@raphaelsrty

May 30

At 140 million parameters, our LateOn model yields strong results 😉 Unrelated to LateOn, I'm really excited by what's happening with multi-vector models right now:
- New kinds of indexes running on CPU
- New multilingual models
- Anisotropy being solved
- Sparse multi-vector

Omar Khattab

@lateinteraction

May 30

4,589

Raphaël Sourty retweeted

Ben Clavié

@bclavie

May 30

Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming very soon :)

Sumit @_reachsumit

May 29

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained Sparse Autoencoders. 📝 arxiv.org/abs/2605.29384

13,247

Raphaël Sourty retweeted

Omar Khattab

@lateinteraction

May 30

Late-interaction sparse retrieval? 😁 With neuron-level inverted indexing, on top of unsupervised sparse autoencoders. Works much better than directly training sparse retrievers. Lots of cool ideas developed & composed in here. Thanks for the insights @Veritas2026 @yifeiwang77!

Sumit @_reachsumit

May 29

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval @Veritas2026 et al. replace vector clustering with efficient sparse autoencoders & natural inverted indexing to accelerate multi-vector retrieval. 📝 arxiv.org/abs/2605.30120 👨🏽‍💻 github.com/Y-Research-SBU/SS…

177

28,695

Raphaël Sourty

@raphaelsrty

May 29

I want an Iso-LateOn as well 😁 Very interesting work to scale multi-vector retrieval and fight anisotropy in models so they can produce sparse vectors for SMVE

topk.io

@topk_io

May 29

Even strong multi-vector models may break down when optimized for low-latency and high-QPS inference in production. But this can be fixed. We're open-sourcing Iso-ModernColBERT, a late interaction model built for efficient inference and scalable retrieval. 🧵 (1/6)

1,341

Raphaël Sourty retweeted

topk.io

@topk_io

May 29

10,412

Raphaël Sourty retweeted

Rohan Jha @Robro612

May 29

ICYMI: @raphaelsrty just added index.freeze() to FastPlaid v1.4.7 which halves your size on disk if you know you won’t modify the index 🥶 Reversible with index.unfreeze() 🔥

Rohan Jha @Robro612

May 29

Replying to @antoine_chaffin

The halving of the size of FastPlaid indexes for analytical read-only workloads is real! github.com/lightonai/fast-pl…

1,275

Raphaël Sourty retweeted

Clément Chadebec

@CChadebec

May 28

📢 New @heyjasper release! 📢
MONET 🌸: An Apache 2.0 deduped and recaptioned dataset of 105M samples unlocking reproducible text-to-image research.
Nano T2I 🖌️: A codebase to train your own T2I model
🤗 @huggingface: huggingface.co/datasets/jasp…
💻: github.com/gojasper/nano-t2i
Very excited about this new release, pushing the boundaries of open and reproducible T2I research. Congrats to the team! Benjamin Aubin Gonzalo Quintana @onurxtasar @UlaLaParis @_jeev2 @dh7net @clipdropapp @heyjasperai

116

45,179