Hugging Face Daily Papers — 2026-06-13
44 papers today. Full list with arXiv links:
1. EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Highlight: Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments.
arXiv:
arxiv.org/abs/2606.13681
2. MiniMax Sparse Attention
Highlight: Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memor.
arXiv:
arxiv.org/abs/2606.13392
3. WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
Highlight: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, an.
arXiv:
arxiv.org/abs/2606.09426
4. SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Highlight: Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision.
arXiv:
arxiv.org/abs/2606.13673
5. InterleaveThinker: Reinforcing Agentic Interleaved Generation
Highlight: Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. Ho.
arXiv:
arxiv.org/abs/2606.13679
6. MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Highlight: We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first tra.
arXiv:
arxiv.org/abs/2606.13473
7. Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
Highlight: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet their performance degrades significantly.
arXiv:
arxiv.org/abs/2606.08063
8. FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
Highlight: Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through sear.
arXiv:
arxiv.org/abs/2606.12087
9. LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Highlight: Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside.
arXiv:
arxiv.org/abs/2606.13578
10. HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
Highlight: Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation spac.
arXiv:
arxiv.org/abs/2606.13289
11. N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
Highlight: The success of Large Language Models in mathematical reasoning relies heavily on the generation of diverse and valid solution paths during the roll.
arXiv:
arxiv.org/abs/2606.10768
12. EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
Highlight: LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they.
arXiv:
arxiv.org/abs/2606.13662
13. Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
Highlight: Latent chain-of-thought compresses reasoning by replacing visible reasoning traces with continuous hidden-state recurrence, but existing formulatio.
arXiv:
arxiv.org/abs/2606.13106
14. VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Highlight: We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular vide.
arXiv:
arxiv.org/abs/2606.13364
15. VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
Highlight: Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to vali.
arXiv:
arxiv.org/abs/2606.12243
16. Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback
Highlight: Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failure.
arXiv:
arxiv.org/abs/2606.06113
17. From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion
Highlight: Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details.
arXiv:
arxiv.org/abs/2606.12303
18. MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
Highlight: We present MoVerse, a real-time video world model that creates an interactively navigable scene from a single narrow-field-of-view image. This sett.
arXiv:
arxiv.org/abs/2606.13376
19. TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
Highlight: Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central chal.
arXiv:
arxiv.org/abs/2606.11662
20. HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
Highlight: Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and.
arXiv:
arxiv.org/abs/2606.12882
21. Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models
Highlight: Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly.
arXiv:
arxiv.org/abs/2606.11409
22. High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Highlight: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this.
arXiv:
arxiv.org/abs/2606.12575
23. Visual Para-Thinker: A Single-Policy Multi-Agent Framework for Visual Reasoning
Highlight: Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, making single-chain reasoning prone to early.
arXiv:
arxiv.org/abs/2606.09290
24. SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling
Highlight: On-policy distillation (OPD) trains a student on its own trajectories with dense per-token supervision from a stronger teacher, and often outperfor.
arXiv:
arxiv.org/abs/2606.09304
25. Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
Highlight: Anticipating LLM behavioral tendencies from low-cost psychometric probes is critical for safe deployment, but only if self-reports (SR) reliably pr.
arXiv:
arxiv.org/abs/2606.12730
26. EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
Highlight: Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing be.
arXiv:
arxiv.org/abs/2606.13120
27. MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
Highlight: Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By alig.
arXiv:
arxiv.org/abs/2606.08788
28. See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
Highlight: Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising.
arXiv:
arxiv.org/abs/2606.13594
29. Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Highlight: Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated functio.
arXiv:
arxiv.org/abs/2606.12674
30. MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
Highlight: Robotic simulators are a cornerstone of modern research in aerial robotics, serving both as a vehicle for the development of new control algorithms.
arXiv:
arxiv.org/abs/2606.08039
31. Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
Highlight: Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in o.
arXiv:
arxiv.org/abs/2606.13174
32. ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Highlight: Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in s.
arXiv:
arxiv.org/abs/2606.13572
33. WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
Highlight: The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and te.
arXiv:
arxiv.org/abs/2606.13672
34. Surflo: Consistent 3D Surface Flow Model with Global State
Highlight: Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstru.
arXiv:
arxiv.org/abs/2606.13644
35. WebChallenger: A Reliable and Efficient Generalist Web Agent
Highlight: Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose infer.
arXiv:
arxiv.org/abs/2606.10423
36. Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering
Highlight: We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single.
arXiv:
arxiv.org/abs/2606.10896
37. IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
Highlight: Built on pretrained vision foundation models (VFMs), representation autoencoders (RAEs) have recently emerged as a promising approach for construct.
arXiv:
arxiv.org/abs/2606.11096
38. Revisiting Articulated Parts Perception in Robot Manipulation
Highlight: We are surrounded by various objects with movable, articulated parts, e.g., box, handle, door. An accurate and generalizable perception of articula.
arXiv:
arxiv.org/abs/2606.08103
39. The Cold-Start Safety Gap in LLM Agents
Highlight: Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a ses.
arXiv:
arxiv.org/abs/2606.07867
40. ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
Highlight: Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approache.
arXiv:
arxiv.org/abs/2606.12451
41. A Stationary (and Therefore Compatible) Representation is All You Need
Highlight: Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes upd.
arXiv:
arxiv.org/abs/2606.12488
42. PianoKontext: Expressive Performance Rendering from Deadpan Context
Highlight: Expressive performance rendering (EPR) aims to generate realistic performances constrained on sequences of notes. However, flow matching audio edit.
arXiv:
arxiv.org/abs/2606.12282
43. Leveraging Morphology for Historical Script Metrological Analysis
Highlight: Advances in handwritten text recognition have enabled large-scale transcription of historical documents, but still provide limited access to interp.
arXiv:
arxiv.org/abs/2606.09446
44. On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
Highlight: Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-int.
arXiv:
arxiv.org/abs/2606.00467
Trend summary:
- Agents / Computer-use / Spatial reasoning: 18
- Multimodal / Vision / Video: 10
- Reasoning / Math / RL: 7
- Other ML methods: 5
- LLMs / Efficient modeling: 3
- Audio / Speech: 1
Papers with code links found: 32/44