Pinned Tweet
8 Oct 2025
In the final post of the Adaptive RAG series, we explore how to treat selective retrieval as a core, learned skill, moving from passive observation to active, intelligent decision-making. blog.reachsumit.com/posts/20…
Reasoning with Memory: Adaptive Information Management for Retrieval-Augmented Generation Amazon presents a RAG framework with an explicit working memory managed by a trainable extractor that filters and consolidates information across reasoning steps. 📝 amazon.science/publications/…
LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling Meituan introduces a challenging search agent benchmark built from a knowledge graph of over 7 million Wikipedia entities, systematically maximizing search space. 📝 arxiv.org/abs/2606.12837
DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks Introduces an open-ended benchmark for evaluating search agents on everyday search tasks, using cascade rubrics. 📝 arxiv.org/abs/2606.12871 👨🏽‍💻 github.com/AGI-Eval-Official…
Iterating Toward Better Search: A Two-Agent Simulation Framework for Evaluating Agentic Search Architectures in E-Commerce eBay presents a modular two-agent simulation framework that pairs a configurable buyer agent with interchangeable responders. 📝 arxiv.org/abs/2606.12924
EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge Tencent introduces an evolving benchmark of 400 English and 400 Chinese contamination-free questions, designed to prevent parametric memorization. 📝 arxiv.org/abs/2606.13120 🤗 huggingface.co/datasets/Krys…
How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation @ChaseF1 et al. find optimal granularity to vary question characteristics when building synthetic RAG eval benchmarks. 📝 arxiv.org/abs/2606.12789 👨🏽‍💻 github.com/fensorechase/rag-…
OneRetrieval: Unifying Multi-Branch E-commerce Retrieval with an Editable Generative Model Kuaishou unifies multi-branch e-commerce retrieval into one generative model that keeps the inverted index's real-time editability. 📝 arxiv.org/abs/2606.13533 👨🏽‍💻 github.com/xuxinzhang/oneret…
When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval @tongyao_zhu et al. show that interpolating monolingual query embeddings often beats the best single-language query. 📝 arxiv.org/abs/2606.13537 👨🏽‍💻 github.com/tongyao-zhu/query…
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning Meta presents a post-training framework that teaches LLMs to reason by analogy, retrieving examples by reasoning utility rather than semantic similarity. 📝 arxiv.org/abs/2606.13680
TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search Introduces an inference-time framework that organizes deep search as branch-and-return search over tree-structured states. 📝 arxiv.org/abs/2606.11662
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Introduces a framework for synthesizing shortcut-resistant training data that forces genuine multi-step search. 📝 arxiv.org/abs/2606.12087 👨🏽‍💻 github.com/RUCAIBox/FORT-Sea…
The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content Identifies a "structural attention tax" where knowledge graph triples capture 2–3× more attention than equivalent natural-language text. 📝 arxiv.org/abs/2606.11198
CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring Introduces a token-efficient reranking framework that decouples document representations and uses attention-based scoring, retaining only ~10% of document tokens. 📝 arxiv.org/abs/2606.11700
What Limits Does Quantization Place on Dense Top-k Retrieval? A Theoretical Study Presents a theoretical study showing that the corpus-independent O(k) embedding bound breaks under quantization. 📝 arxiv.org/abs/2606.11780
LLM-Based User Personas for Recommendations at Scale Google generates real-time, natural-language user interest personas during serving, balancing interest summarization with novel-topic exploration via KD and asynchronous inference. 📝 arxiv.org/abs/2606.12198
Doc-to-Atom: Learning to Compile and Compose Memory Atoms Introduces a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms. 📝 arxiv.org/abs/2606.12400
SIDInspector: A Mapping-First Diagnostic Resource for Semantic-ID Tokenizers Huawei presents a tool to validate and profile semantic-ID tokenizer mappings before generator training. 📝 arxiv.org/abs/2606.10375 👨🏽‍💻 github.com/jdding/sidinspect…
STORM: Stepwise Token Optimization with Reward-Guided Beam Search Trains LLM query rewriters with reward-guided beam search, turning retrieval rewards into a token-level signal so BM25-based lexical retrieval rivals dense retrievers at much lower cost. 📝 arxiv.org/abs/2606.10621
Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training Proposes recycling zero-variance rollout groups back into the RL training pool, letting a 1.7B search agent match or surpass 7B systems. 📝 arxiv.org/abs/2606.10709
miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity Speeds up multimodal LLM reranking via vision-first prompting, early exit, and visual token pruning, cutting reranking runtime by up to 99%. 📝 arxiv.org/abs/2606.10759