Sumit

Tweets

Pinned Tweet

Sumit @_reachsumit

8 Oct 2025

In the final post of the Adaptive RAG series, we explore how to treat selective retrieval as a core, learned skill, moving from passive observation to active, intelligent decision-making. blog.reachsumit.com/posts/20…

Teaching Models to Decide When to Retrieve: Adaptive RAG, Part 4

This final post of the Adaptive RAG series explores methods that treat adaptive retrieval as a learned skill and explicitly teach models when to retrieve. We examine three paradigms in increasing...


Sumit @_reachsumit

Jun 12

Reasoning with Memory: Adaptive Information Management for Retrieval-Augmented Generation
Amazon presents a RAG framework with an explicit working memory managed by a trainable extractor that filters and consolidates information across reasoning steps.
📝 amazon.science/publications/…

Multi-hop reasoning remains a fundamental challenge for Retrieval-Augmented Generation (RAG) systems. Recent approaches, from adaptive retrieval to agentic pipelines, struggle to maintain coherent...


Sumit @_reachsumit

Jun 12

LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling
Meituan introduces a challenging search agent benchmark built from a knowledge graph of over 7 million Wikipedia entities, systematically maximizing search space.
📝 arxiv.org/abs/2606.12837

Search agent benchmarks exemplified by BrowseComp have rapidly saturated over the past year, with the strongest models surpassing 90% accuracy. Since these benchmarks are predominantly...


Sumit @_reachsumit

Jun 12

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks
Introduces an open-ended benchmark for evaluating search agents on everyday search tasks, using cascade rubrics.
📝 arxiv.org/abs/2606.12871 👨🏽‍💻 github.com/AGI-Eval-Official…

Search Agents (SAs) typically leverage large language models (LLMs) to support complex information-seeking tasks by autonomously exploring web sources and synthesizing information into...


Sumit @_reachsumit

Jun 12

Iterating Toward Better Search: A Two-Agent Simulation Framework for Evaluating Agentic Search Architectures in E-Commerce
eBay presents a modular two-agent simulation framework that pairs a configurable buyer agent with interchangeable responders.
📝 arxiv.org/abs/2606.12924

We present a modular two-agent simulation framework for evaluating conversational shopping assistant architectures. An independent buyer agent, configured with personas, missions, and patience...


Sumit @_reachsumit

Jun 12

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
Tencent introduces an evolving benchmark of 400 English and 400 Chinese contamination-free questions, designed to prevent parametric memorization.
📝 arxiv.org/abs/2606.13120 🤗 huggingface.co/datasets/Krys…

Search Agents, large language models augmented with search tools, have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static...


Sumit @_reachsumit

Jun 12

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation
@ChaseF1 et al. identify the optimal granularity at which to vary question characteristics when building synthetic RAG eval benchmarks.
📝 arxiv.org/abs/2606.12789 👨🏽‍💻 github.com/fensorechase/rag-…

Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and...


Sumit @_reachsumit

Jun 12

OneRetrieval: Unifying Multi-Branch E-commerce Retrieval with an Editable Generative Model
Kuaishou unifies multi-branch e-commerce retrieval into one generative model that keeps the inverted index's real-time editability.
📝 arxiv.org/abs/2606.13533 👨🏽‍💻 github.com/xuxinzhang/oneret…

Industrial e-commerce search serves hundreds of millions of items through a multi-branch retrieval stage fused by hand-tuned merging without joint optimization. Generative retrieval (GR) raises...


Sumit @_reachsumit

Jun 12

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval
@tongyao_zhu et al. show that interpolating monolingual query embeddings often beats the best single-language query.
📝 arxiv.org/abs/2606.13537 👨🏽‍💻 github.com/tongyao-zhu/query…

While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on...


Sumit @_reachsumit

Jun 12

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Meta presents a post-training framework that teaches LLMs to reason by analogy, retrieving examples by reasoning utility rather than semantic similarity.
📝 arxiv.org/abs/2606.13680

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is...


Sumit @_reachsumit

Jun 11

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
Introduces an inference-time framework that organizes deep search as branch-and-return search over tree-structured states.
📝 arxiv.org/abs/2606.11662

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several...


Sumit @_reachsumit

Jun 11

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
Introduces a framework for synthesizing shortcut-resistant training data that forces genuine multi-step search.
📝 arxiv.org/abs/2606.12087 👨🏽‍💻 github.com/RUCAIBox/FORT-Sea…

Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase...


Sumit @_reachsumit

Jun 11

The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
Identifies a "structural attention tax" where knowledge graph triples capture 2–3× more attention than equivalent natural-language text.
📝 arxiv.org/abs/2606.11198

Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content, distinct from its semantic relevance, can independently...


Sumit @_reachsumit

Jun 11

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring
Introduces a token-efficient reranking framework that decouples document representations and uses attention-based scoring, retaining only ~10% of document tokens.
📝 arxiv.org/abs/2606.11700

Large language model (LLM) rerankers have become an important component of modern retrieval and retrieval-augmented generation pipelines, but their high computational cost limits their...


Sumit @_reachsumit

Jun 11

What Limits Does Quantization Place on Dense Top-k Retrieval? A Theoretical Study
Presents a theoretical study showing that the corpus-independent O(k) embedding bound breaks under quantization.
📝 arxiv.org/abs/2606.11780

We establish conditions for embedding a corpus of $N$ documents as $d$-dimensional vectors such that every $k$-subset $S \subseteq [N]$ is realizable as a result of top-$k$ retrieval by some query...


Sumit @_reachsumit

Jun 11

LLM-Based User Personas for Recommendations at Scale
Google generates real-time, natural-language user interest personas during serving, balancing interest summarization with novel-topic exploration via knowledge distillation (KD) and asynchronous inference.
📝 arxiv.org/abs/2606.12198

Large Language Models (LLMs) offer unprecedented potential for enhancing recommendation systems through their world knowledge and reasoning capabilities. However, existing approaches often rely on...


Sumit @_reachsumit

Jun 11

Doc-to-Atom: Learning to Compile and Compose Memory Atoms
Introduces a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms.
📝 arxiv.org/abs/2606.12400

Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the quadratic cost of attention makes inference both memory-intensive and slow....


Sumit @_reachsumit

Jun 10

SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers
Huawei presents a tool to validate and profile semantic-ID tokenizer mappings before generator training.
📝 arxiv.org/abs/2606.10375 👨🏽‍💻 github.com/jdding/sidinspect…

Semantic-ID (SID) tokenizers are increasingly reused as standalone artifacts in generative recommendation: an exported item-to-code mapping becomes the address space that a later sequence...


Sumit @_reachsumit

Jun 10

STORM: Stepwise Token Optimization with Reward-Guided Beam Search
Trains LLM query rewriters with reward-guided beam search, turning retrieval rewards into a token-level signal so BM25-based lexical retrieval rivals dense retrievers at much lower cost.
📝 arxiv.org/abs/2606.10621

Modern retrieval increasingly relies on dense and learned-sparse neural models that are effective but require encoding the entire corpus into a specialized index, rebuilt whenever the model...


Sumit @_reachsumit

Jun 10

Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training
Proposes recycling zero-variance rollout groups back into the RL training pool, letting a 1.7B search agent match or surpass 7B systems.
📝 arxiv.org/abs/2606.10709

The use of GRPO-style algorithms has become the standard strategy for training LLM search agents under outcome-only rewards. With these algorithms, a query contributes to parameter updates only...


Sumit @_reachsumit

Jun 10

miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity
Speeds up multimodal LLM reranking via vision-first prompting, early exit, and visual token pruning, cutting reranking runtime by up to 99%.
📝 arxiv.org/abs/2606.10759

Multimodal large language models (MLLMs) have recently shown strong potential as point-wise rerankers by directly modeling query-document relevance through next-token prediction. However,...
