Hugging Face Daily Papers — 2026-06-11
40 papers worth scanning today, spanning agentic RL, multimodal reasoning, efficient architectures, security, world models, and scientific discovery.
1. Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Highlight: Router is the cornerstone component to the Mixture-of-Experts models. Serving as expert proxies, the rows of the router matrix compute their simila.
arXiv:
arxiv.org/abs/2606.12397
2. Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
Highlight: Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret t.
arXiv:
arxiv.org/abs/2606.11926
3. Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
Highlight: Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving t.
arXiv:
arxiv.org/abs/2606.12191
4. Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Highlight: General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-.
arXiv:
arxiv.org/abs/2606.12344
5. Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions
Highlight: Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric.
arXiv:
arxiv.org/abs/2606.09076
6. TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders
Highlight: Tabular encoders are usually evaluated inside task-specific end-to-end pipelines, so models from different training paradigms are difficult to comp.
arXiv:
arxiv.org/abs/2606.09323
7. Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
Highlight: Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existin.
arXiv:
arxiv.org/abs/2606.11683
8. DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch
Highlight: As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebase.
arXiv:
arxiv.org/abs/2606.10728
9. World Pilot: Steering Vision-Language-Action Models with World-Action Priors
Highlight: Vision-Language-Action (VLA) models inherit semantic grounding from large-scale pretraining and perform competently across in-distribution manipula.
arXiv:
arxiv.org/abs/2606.12403
10. On Subquadratic Architectures: From Applications to Principles
Highlight: Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures off.
arXiv:
arxiv.org/abs/2606.12364
11. ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
Highlight: Combinatorics is central to Olympiad-level mathematical problem solving, requiring deep discrete reasoning, creative constructions, and rigorous st.
arXiv:
arxiv.org/abs/2606.10479
12. Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code
Highlight: Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwh.
arXiv:
arxiv.org/abs/2606.11817
13. InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
Highlight: Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts.
arXiv:
arxiv.org/abs/2606.12195
14. Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
Highlight: Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL trai.
arXiv:
arxiv.org/abs/2606.12370
15. Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
Highlight: Vision-language models (VLMs) project images into hundreds to thousands of visual tokens, making decoder inference expensive in both attention comp.
arXiv:
arxiv.org/abs/2606.12412
16. TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
Highlight: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models.
arXiv:
arxiv.org/abs/2606.11119
17. ICA Lens: Interpreting Language Models Without Training Another Dictionary
Highlight: Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoder.
arXiv:
arxiv.org/abs/2606.11722
18. EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
Highlight: Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL,.
arXiv:
arxiv.org/abs/2606.03108
19. Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
Highlight: We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embod.
arXiv:
arxiv.org/abs/2606.11324
20. Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization
Highlight: Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Langu.
arXiv:
arxiv.org/abs/2606.12373
21. World Model Self-Distillation: Training World Models to Solve General Tasks
Highlight: Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed tex.
arXiv:
arxiv.org/abs/2606.12072
22. Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency
Highlight: Pipeline parallelism is essential for training large neural networks, but existing schedules trade off throughput, memory, and optimization consist.
arXiv:
arxiv.org/abs/2606.07881
23. i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
Highlight: Diffusion models have consistently driven progress in text-to-image generation. However, it is challenging to attribute recent progress to specific.
arXiv:
arxiv.org/abs/2606.11289
24. POISE: Position-Aware Undetectable Skill Injection on LLM Agents
Highlight: Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A.
arXiv:
arxiv.org/abs/2606.07943
25. ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
Highlight: Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual.
arXiv:
arxiv.org/abs/2605.11212
26. Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training
Highlight: There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces.
arXiv:
arxiv.org/abs/2606.11854
27. Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation
Highlight: Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive fea.
arXiv:
arxiv.org/abs/2606.11990
28. DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models
Highlight: Many modern vision-language models (VLMs) build on autoregressive decoding of discrete tokens. While text-based output interfaces enable scalable p.
arXiv:
arxiv.org/abs/2606.05758
29. Large Language Models Are Overconfident in Their Own Responses
Highlight: Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However.
arXiv:
arxiv.org/abs/2606.03437
30. Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models
Highlight: Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have eme.
arXiv:
arxiv.org/abs/2606.12203
31. Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay
Highlight: Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource lang.
arXiv:
arxiv.org/abs/2606.11786
32. APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations
Highlight: Generic time-series foundation models transfer poorly to wireless network telemetry whose signals are bursty, zero-inflated, and coupled across pro.
arXiv:
arxiv.org/abs/2606.11553
33. Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Highlight: Modern LLM training pipelines increasingly rely on other models to generate data, filter corpora, judge outputs, and guide development decisions. T.
arXiv:
arxiv.org/abs/2606.12385
34. Building Social World Models with Large Language Models
Highlight: Understanding and predicting how social beliefs evolve in response to events -- from policy changes to scientific breakthroughs -- remains a fundam.
arXiv:
arxiv.org/abs/2606.11482
35. Towards Diverse Scientific Hypothesis Search with Large Language Models
Highlight: Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scient.
arXiv:
arxiv.org/abs/2606.10587
36. τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems
Highlight: As recommender systems transition toward agentic, multi-turn conversational interfaces, evaluation paradigms have struggled to keep pace. Current b.
arXiv:
arxiv.org/abs/2606.10156
37. FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching
Highlight: Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain.
arXiv:
arxiv.org/abs/2601.05212
38. SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference
Highlight: Sparse attention reduces compute and memory bandwidth for long-context LLM inference. However, two key challenges remain: (1) KV cache capacity sti.
arXiv:
arxiv.org/abs/2606.04511
39. Can Generalist Agents Automate Data Curation?
Highlight: Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, imple.
arXiv:
arxiv.org/abs/2606.04261
40. Distilling LLM Feedback for Lean Theorem Proving
Highlight: Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly wit.
arXiv:
arxiv.org/abs/2605.30861
Trend summary: ML/LLM training 11, NLP & language agents 9, Vision/multimodal 8, AI reasoning/evaluation 5, Robotics/embodied AI 2, AI security 2, Software agents 1, Social world models 1, Recommender agents 1.