Rohan Jha

Rohan Jha

32 Photos and videos

Tweets

Rohan Jha @Robro612

Jun 11

Blazing fast training kernels and blazing fast (sometimes beating GPU in my exp.) CPU indices? Never been a better time to be a late interactor.

Antoine Chaffin

@antoine_chaffin

Jun 11

Whether you are GPU poor or GPU rich, today's release of PyLate has something for you! GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements CPU enjoyers: TACHIOM enables lightning fast multi-vector indexing and search directly on CPU

1,399

Sumit

Rohan Jha retweeted

Sumit @_reachsumit

Jun 8

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating Introduces a training framework using Pareto-efficient trajectory filtering and Adaptive Reward Gating to make deep research agents more efficient. 📝arxiv.org/abs/2606.07074 👨🏽‍💻github.com/AQ-MedAI/AntAFu-D…

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive...

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training...

arxiv.org

923

Rohan Jha

Rohan Jha @Robro612

Jun 5

Codex constantly saying "bridge" is beginning to annoy me. "Contract" is next.

989

Sumit

Rohan Jha retweeted

Sumit @_reachsumit

Jun 5

ColBERTSaR: Sparsified ColBERT Index via Product Quantization @EYangTW et al. present an embedding quantization method that turns a ColBERT index into a true inverted index, yielding an index 50-70% smaller than 1-bit PLAID. 📝 arxiv.org/abs/2606.05568 👨🏽‍💻 github.com/hltcoe/ColBERTSaR

ColBERTSaR: Sparsified ColBERT Index via Product Quantization

While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and...

arxiv.org

4,691

Rohan Jha

Rohan Jha @Robro612

Jun 4

Symbolic tool use can be a useful affordance for reasoners of all sorts - most recently Deontic (permission and rule interpretation & application) Reasoning! I wonder how much a symbolic semantic search tool would mitigate the losses on the weaker / further improve stronger

Guangyao Dou @GuangyaoDou

Jun 4

🚨 New paper: Deontic Reasoning with Agentic Harnesses Can LLMs reason better over long statutes if they interact with rules on demand? We study Deontic Agentic Reasoning (DAR): models use tools like grep, sed to inspect rules and answer rule-grounded questions.

1,019

Eugene Yang

Rohan Jha retweeted

Eugene Yang @EYangTW

Jun 2

RAGTIME is coming back for the 2nd edition! Come join us to evaluate your search agents! We once again feature multilingual and fully human evaluation This is the ultimate way to know whether your systems are doing better

RAGTIME Track 🔎 TREC 2026 @trec_ragtime

Jun 2

🚨 Every major AI lab is racing to build better "deep research" agents — systems that search, synthesize, and report across the web. But how do we actually *benchmark* them? Introducing 🧵 TREC RAGTIME — the shared task for rigorous RAG evaluation. trec-ragtime.github.io/

793

debashish 🦘

Rohan Jha retweeted

debashish 🦘

@derangineer

Jun 3

did someone say late interaction? it’s getting serious now

4,052

Rohan Jha

Rohan Jha @Robro612

Jun 3

Please consider this my announcement / shill warning. Super excited to be @mixedbreadai 🍞 this summer! Lots of cool stuff to be done in the first-stage / agentic retrieval areas!

gautham

@capemox

Jun 3

Replying to @Robro612 @bclavie @mixedbreadai

woah I just noticed you joined mxbai, congrats!

2,735

Rohan Jha

Rohan Jha @Robro612

Jun 2

The more Zipfian distribution of these terms can't be understated. That means this tiny SAE adapter enables decades of BM25 index optimizations, rather than having to step into the new set of (also great) manifold-hypothesis-driven LSR engines designed for SPLADE.

Mixedbread

@mixedbreadai

Jun 2

By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain than you think: you can extract sparse Latent Terms from them. And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.

0:12

3,371

Ben Clavié

Rohan Jha retweeted

Ben Clavié

@bclavie

May 30

Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming very soon :)

Sumit @_reachsumit

May 29

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract indexable, BM25-ready sparse features from frozen dense retrievers using reconstruction-trained Sparse Autoencoders. 📝 arxiv.org/abs/2605.29384

13,247

Rohan Jha

Rohan Jha @Robro612

May 29

Anisotropy seems to be the quality-giver (at least confounder) but efficiency-killer. @topk_io identified it as a blocker to SMVE and I'm finding it to block TACHIOM too. Would love to know what the ISO FT consists of @matospiso 🧐

topk.io

@topk_io

May 29

Even strong multi-vector models may break down when optimized for low-latency and high-QPS inference in production. But this can be fixed. We're open-sourcing Iso-ModernColBERT, a late interaction model built for efficient inference and scalable retrieval. 🧵 (1/6)

2,066

Rohan Jha

Rohan Jha @Robro612

May 29

Shameless plug that XTR training mitigates anisotropism/degen score distributions! See modernbert_colbert_kd -> ModernBERT-XTR: same exact training except for XTR vs ColBERT training. Discussed here: arxiv.org/abs/2605.00646

A Replicability Study of XTR

The XTR (conteXtual Token Retrieval) algorithm is a modification to ColBERT retrieval that avoids the costly step of fully gathering and reranking the candidates' embeddings by imputing their...

arxiv.org

154

Sumit

Rohan Jha retweeted

Sumit @_reachsumit

May 29

Latent Terms: Dense Retrievers Contain Trivially Extractable...

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready...

arxiv.org

22,829

Rohan Jha

Rohan Jha @Robro612

May 29

ICYMI: @raphaelsrty just added index.freeze() to FastPlaid v1.4.7 which halves your size on disk if you know you won’t modify the index 🥶 Reversible with index.unfreeze() 🔥

Rohan Jha @Robro612

May 29

Replying to @antoine_chaffin

The halving of the size of FastPlaid indexes for analytical read-only workloads is real! github.com/lightonai/fast-pl…

1,275

Rohan Jha

Rohan Jha @Robro612

May 29

PLAID's residual gather for full MaxSim is (theoretically) bandwidth limited, not IOPs limited. A single doc's residuals occupy >1 page. So you can't expect to save much time with a clever ordering of document embedding bundles according to token-centroid co-occurrence.

1,143

Rohan Jha

Rohan Jha @Robro612

May 29

Related to this is the general challenge in optimizing vector indices on GPU (in fact single-vec is even fewer FLOPs/bit than MaxSim), we sit well below the roofline, so the arithmetic we do for the final MaxSim hardly contributes to the wall time.

205

Rohan Jha

Rohan Jha @Robro612

May 29

This is why 1) XTR/WARP win big by skipping this load entirely 2) CPU-only versions of PLAID/TACHIOM indices can perform so well, and 3) VecFlow-Chamfer wins by saturating and parallelizing memory reads over its new 900 GB/s GH interconnect, not even needing to compress

198

Rohan Jha

Rohan Jha @Robro612

May 28

linear search over sublinear centroids -> sublinear search over (small) linear centroids 📈

Omar Khattab

@lateinteraction

May 28

this is such an impressive result: search over ~600,000,000 colbert vectors in 10 milliseconds, with a *single* CPU core. and since this algorithm has sub-linear latency, there’s no excuse for anyone up to tens of billions of tokens

391

Silvio Martinico

Rohan Jha retweeted

Silvio Martinico @SilvioMartinico

May 27

Late-interaction retrieval is incredibly powerful, but scaling it is computationally challenging. k-means is a huge bottleneck. Our new architecture, TACHIOM, is fully open-source and tackles this problem: up to 247x faster clustering and 9.8x faster retrieval. ⚡ 🧵👇

120

8,340

Daniel Khashabi 🕊️

Rohan Jha retweeted

Daniel Khashabi 🕊️

@DanielKhashabi

May 26

@ben_vandurme and I are recruiting multiple postdoc fellows at JHU. We're looking for candidates w/ strong record in language models, reasoning, coding agents, and/or AI for science. Interested candidates should send their CV and a brief summary of their research interests to danielk@jhu.edu / vandurme@jhu.edu. Please reshare for visibility. 🙏

8,924