🔬 HuggingFace Daily Papers — June 8, 2026
46 papers featured today. Here's every single one with highlights. A thread on the most exciting trends in AI research right now. 🧵
━━━━━━━━━━━━━━━━━━━
1️⃣ **Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings**
💡 Shows that the unembedding matrix of LLMs can serve as a powerful feature lens for text embeddings — a simple yet effective approach to improve embedding quality without extra training.
📄
arxiv.org/abs/2606.07502
2️⃣ **SoCRATES: Reliable Automated Evaluation of Proactive LLM Mediation**
💡 First comprehensive evaluation framework for LLM mediators that accounts for real-time trajectories, shifting emotions, and socio-cognitive variations across disputants.
📄
arxiv.org/abs/2606.05563
3️⃣ **GENEB: Why Genomic Models Are Hard to Compare**
💡 Exposes the fragmented benchmarks, incompatible protocols, and task-specific reporting that make genomic model comparison unreliable — proposes a unified evaluation approach.
📄
arxiv.org/abs/2606.04525
4️⃣ **MMAE: A Massive Multitask Audio Editing Benchmark**
💡 First comprehensive evaluation testbed for general-purpose instruction-based audio editing, covering diverse editing tasks and scenarios.
📄
arxiv.org/abs/2606.07229
5️⃣ **AnchorWorld: Embodied Egocentric World Simulation**
💡 From Kling Team — an interactive world modeling framework with view-based evolution customization for egocentric perspectives.
📄
arxiv.org/abs/2606.07326
6️⃣ **Direct 3D-Aware Object Insertion via Decomposed Visual Proxies**
💡 Moves beyond diffusion-based object insertion by using decomposed visual proxies for true 3D-aware compositing.
📄
arxiv.org/abs/2606.06601
7️⃣ **Robots Need More than VLA and World Models**
💡 A thought-provoking position paper arguing that scaling VLA models alone won't achieve generalist robot intelligence — structural innovation is needed.
📄
arxiv.org/abs/2606.06556
8️⃣ **When Tools Fail: Benchmarking Dynamic Replanning in LLM Agents**
💡 ToolMaze — first benchmark for evaluating LLM agents' ability to handle real-world tool failures, dynamic replanning, and anomaly recovery.
📄
arxiv.org/abs/2606.05806
9️⃣ **OpenSkill: Open-World Self-Evolution for LLM Agents**
💡 Enables LLM agents to self-evolve in open worlds without requiring curated skills, successful trajectories, or verifier signals.
📄
arxiv.org/abs/2606.06741
🔟 **SubtleMemory: Fine-Grained Relational Memory in Long-Horizon AI Agents**
💡 A benchmark testing AI agents' ability to handle nuanced memory relationships — complementary, contradictory, and context-dependent — in long-term interactions.
📄
arxiv.org/abs/2606.05761
1️⃣1️⃣ **UniSHARP: Universal Sharp Monocular View Synthesis**
💡 Extends SHARP for universal monocular rendering across perspective, fisheye, and omnidirectional cameras via unified omnidirectional latent space alignment.
📄
arxiv.org/abs/2606.07514
1️⃣2️⃣ **UnpredictaBench: Evaluating Distributional Randomness in LLMs**
💡 Tests LLMs' ability to sample from target distributions — no model exceeds 40% on KS@100, showing massive headroom in distributional simulation.
📄
arxiv.org/abs/2606.06622
1️⃣3️⃣ **LIMMT: Less is More for Motion Tracking**
💡 First data-centric study for physics-based humanoid motion tracking — training with under 3% of AMASS outperforms full dataset training.
📄
arxiv.org/abs/2606.06953
1️⃣4️⃣ **Watch, Remember, Reason: Human-View Video Understanding with MLLMs**
💡 A unified framework organizing video MLLM capabilities into watching (perception), remembering (memory), and reasoning, with a comprehensive survey of the field.
📄
arxiv.org/abs/2606.07433
1️⃣5️⃣ **dots.tts Technical Report**
💡 Xiaohongshu's HiLab releases a 2B-param continuous autoregressive TTS model achieving SOTA on Seed-TTS-Eval. Open-sourced under Apache 2.0, it supports streaming at 85ms latency.
📄
arxiv.org/abs/2606.07080
1️⃣6️⃣ **LLM Explainability with Counterfactual Chains and Causal Graphs**
💡 Uses causal graphs to model LLM inference itself, providing transparent visualization of how models organize high-level concepts through MCMC-inspired counterfactual augmentation.
📄
arxiv.org/abs/2606.05972
1️⃣7️⃣ **Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators**
💡 Astra framework couples RL-trained VLM policy with a world simulator — agents acquire imagined visual evidence during reasoning for spatial tasks.
📄
arxiv.org/abs/2606.06476
1️⃣8️⃣ **Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them**
💡 Surprising finding: 2-step I2V generation has better physical consistency than 50-step! PhaseLock preserves motion priors via Latent Delta Guidance with negligible overhead.
📄
arxiv.org/abs/2606.06361
1️⃣9️⃣ **PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams**
💡 A three-stage framework (Profile → Recommend → Adapt) for longitudinal scientific paper recommendation with interest drift modeling.
📄
arxiv.org/abs/2606.07454
2️⃣0️⃣ **Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills**
💡 Closed-loop self-evolution for SWE agents — distills solving traces into structured skills to generate targeted training tasks. Reaches 50.40% on SWE-bench Verified.
📄
arxiv.org/abs/2606.07412
2️⃣1️⃣ **SIA: Self Improving AI with Harness & Weight Updates**
💡 An AI system that can figure out how to improve itself through harness modifications and weight updates — removing humans as the bottleneck.
📄
arxiv.org/abs/2605.27276
2️⃣2️⃣ **SPACENUM: Revisiting Spatial Numerical Understanding in VLMs**
💡 Probes VLMs' ability to produce numerical outputs (action magnitudes, spatial coordinates) needed in embodied environments.
📄
arxiv.org/abs/2605.23898
2️⃣3️⃣ **Stream3D-VLM: Online 3D Spatial Understanding**
💡 From Tencent Hunyuan — first online 3D spatial understanding VLM with incremental geometry priors for streaming 3D scene understanding.
📄
arxiv.org/abs/2606.06891
2️⃣4️⃣ **Almieyar-Oryx-BloomBench: Bilingual Multimodal VLM Benchmark**
💡 A cognitively informed bilingual benchmark that rigorously diagnoses VLM reasoning abilities toward human-like multimodal understanding.
📄
arxiv.org/abs/2606.05531
2️⃣5️⃣ **HarnessForge: Joint Harness and Policy Evolution**
💡 Meta-adaptive framework that jointly evolves agent harness structure and policy for heterogeneous task regimes.
📄
arxiv.org/abs/2606.01779
2️⃣6️⃣ **Reinforcement Learning from Rich Feedback with Distributional DAgger**
💡 Replaces single binary rewards with distribution-level feedback signals for richer reasoning model training.
📄
arxiv.org/abs/2606.05152
2️⃣7️⃣ **When Gradients Collide: Multi-Objective Prompt Optimization Failure Modes**
💡 Analyzes gradient conflict failure modes when optimizing LLM judge prompts across multiple evaluation criteria simultaneously.
📄
arxiv.org/abs/2605.26046
2️⃣8️⃣ **Entropy as a Structural Prior for Musical Diversity**
💡 Log-barrier entropy prior on DiT belief space drives diversity in supervised diffusion training for music generation.
📄
arxiv.org/abs/2606.07207
2️⃣9️⃣ **A Cookbook of 3D Vision**
💡 From Brown University: a comprehensive survey covering 3D vision data representations, learning paradigms, and modeling strategies.
📄
arxiv.org/abs/2606.04291
3️⃣0️⃣ **LayerRoute: Adaptive Layer Skipping for Agentic LMs**
💡 Input-conditioned adaptive layer skipping via LoRA — tool calls and reasoning steps get different computation depths.
📄
arxiv.org/abs/2606.01838
3️⃣1️⃣ **Towards Retrieving Interaction Spaces for Agentic Search (RISE)**
💡 Shifts retrieval from non-agentic IR to interactive corpus exploration for search agents.
📄
arxiv.org/abs/2606.06880
3️⃣2️⃣ **Streaming Video Generation with StreamForce**
💡 Streaming video generation framework enabling physically grounded control through continuous force inputs.
📄
arxiv.org/abs/2606.07508
3️⃣3️⃣ **ECI_sem: Semantic Residual Effective Contrastive Information**
💡 Pre-finetuning metric for evaluating hard negative quality in dense retrieval.
📄
arxiv.org/abs/2603.20990
3️⃣4️⃣ **Data-Efficient AR-to-Diffusion Language Models via On-Policy Distillation**
💡 Efficiently converts autoregressive LMs to diffusion LMs via on-policy distillation instead of training from scratch.
📄
arxiv.org/abs/2606.06712
3️⃣5️⃣ **Compress-Distill: Reasoning Trace Compression**
💡 Post-hoc compression of long chain-of-thought traces before knowledge distillation for efficient student training.
📄
arxiv.org/abs/2606.05988
3️⃣6️⃣ **Imaginative Perception Tokens for Spatial Reasoning**
💡 Introduces imaginative perception tokens that allow VLMs to "see" what they imagine — boosting spatial reasoning.
📄
arxiv.org/abs/2606.03988
3️⃣7️⃣ **Measuring Model Robustness via Fisher Information**
💡 Principled, attack-independent robustness evaluation via spectral bounds on Fisher information.
📄
arxiv.org/abs/2606.04767
3️⃣8️⃣ **Parametric Social Identity Injection in Public Opinion Simulation**
💡 From Tsinghua — injects parametric social identity into LLM-based public opinion simulation for diversity.
📄
arxiv.org/abs/2603.16142
3️⃣9️⃣ **Towards Human-Like Interactive Speech Recognition**
💡 From SJTU — agentic correction and semantic evaluation for interactive ASR systems.
📄
arxiv.org/abs/2605.29430
4️⃣0️⃣ **CORE: Contrastive Reflection for Rapid Reasoning Improvement**
💡 From Stanford — contrastive reflection enables rapid improvements in reasoning without expensive RL.
📄
arxiv.org/abs/2605.28742
4️⃣1️⃣ **Empirical Study on AI-usage in GitHub Repositories**
💡 Examines how developers actually use AI tools in code — looking at real code comments rather than evaluating LLM outputs in isolation.
📄
arxiv.org/abs/2606.06843
4️⃣2️⃣ **Chord-Symbol Time-Series Adaptation for Genre Identity**
💡 Explores capabilities and boundaries of chord-symbol modeling across multiple music genres.
📄
arxiv.org/abs/2606.07334
4️⃣3️⃣ **WorldBench: Visually Diverse Multimodal Reasoning Benchmark**
💡 Challenges VLMs with visual diversity rather than just task diversity for real-world reliability.
📄
arxiv.org/abs/2606.06538
4️⃣4️⃣ **Critic-R: Improving Agentic Search with Introspective Feedback**
💡 Natural language introspective feedback improves instruction-tuned retrievers for agentic search.
📄
arxiv.org/abs/2606.00590
4️⃣5️⃣ **The Distillation Game: Adaptive Attacks & Efficient Defenses**
💡 From Stanford — game-theoretic analysis of distillation attacks and defenses for model providers.
📄
arxiv.org/abs/2605.22737
4️⃣6️⃣ **Augmenting Attention with Exponentially Decaying Memory**
💡 Exponentially decaying memory improves query-aware KV sparsity methods (Quest, MoBA, SnapKV) consistently.
📄
arxiv.org/abs/2605.28640
━━━━━━━━━━━━━━━━━━━
📊 Trend Summary:
🤖 LLM Agent / Self-Evolution (8 papers) — The biggest theme. Agents that evolve, handle tool failures, manage long-term memory, and optimize their own harnesses.
📏 Benchmark / Evaluation (8 papers) — New benchmarks for genomics, audio editing, distributional reasoning, VLM cognition, and more.
🎥 Video / 3D Vision (6 papers) — World simulators, streaming 3D understanding, universal view synthesis.
🧠 Reasoning / Explainability (5 papers) — Causal graph explainability, imaginative perception tokens, contrastive reflection.
🗣️ TTS / Audio (3 papers) — dots.tts from Xiaohongshu achieving open-source SOTA.
🤖 Robotics / Motion (3 papers) — Data-centric motion tracking, questioning VLA scaling assumptions.
🎨 Diffusion / Generation (3 papers) — Physics-aware video generation, force-controlled streaming.
The field is clearly converging on agentic AI — not just better models, but smarter systems that can adapt, recover from failures, and evolve autonomously. 🔥
#AI #MachineLearning #HuggingFace #DailyPapers #LLM #Agents #CVPR #NeurIPS