Hugging Face Daily Papers (2026-06-15) — 46 papers
A dense day for agents, multimodal reasoning, robotics, reliability, and post-training. Full list:
1. OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Highlight: Improves video understanding or generation by adding stronger temporal, retrieval, or control signals.
arXiv:
arxiv.org/abs/2606.13432
2. APPO: Agentic Procedural Policy Optimization
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.12384
3. Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.06036
4. From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.14502
5. Orchestra-o1: Omnimodal Agent Orchestration
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.13707
6. HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.14249
7. Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
Highlight: Improves video understanding or generation by adding stronger temporal, retrieval, or control signals.
arXiv:
arxiv.org/abs/2606.13141
8. OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains
Highlight: Builds a structured dataset to push multimodal reasoning beyond shallow caption-based supervision.
arXiv:
arxiv.org/abs/2606.14702
9. From AGI to ASI
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.12683
10. Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
Highlight: Refines post-training, optimization, compression, or policy-learning mechanics.
arXiv:
arxiv.org/abs/2605.30789
11. Measuring Epistemic Resilience of LLMs Under Misleading Medical Context
Highlight: Focuses on reliability, hallucination detection, or safety under high-stakes model use.
arXiv:
arxiv.org/abs/2606.12291
12. RedAct: Redacting Agent Capability Traces for Procedural Skill Protection
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.10813
13. Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.06574
14. RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space
Highlight: Strengthens multimodal representation or generation with a more unified modeling pipeline.
arXiv:
arxiv.org/abs/2606.14700
15. LLM Agents Can See Code Repositories
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.14061
16. iMaC: Translating Actions into Motion and Contact Images for Embodied World Models
Highlight: Connects perception, action, and world modeling for more robust embodied decision-making.
arXiv:
arxiv.org/abs/2606.09813
17. Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack
Highlight: Connects perception, action, and world modeling for more robust embodied decision-making.
arXiv:
arxiv.org/abs/2606.14409
18. No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.13044
19. Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation
Highlight: Targets stronger reasoning and verification through new training, evaluation, or formalization methods.
arXiv:
arxiv.org/abs/2606.12594
20. VISTA: View-Consistent Self-Verified Training for GUI Grounding
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.14579
21. $μ_0$: A Scalable 3D Interaction-Trace World Model
Highlight: Connects perception, action, and world modeling for more robust embodied decision-making.
arXiv:
arxiv.org/abs/2606.13769
22. The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.10747
23. MBench: A Comprehensive Benchmark on Memory Capability for Video World Models
Highlight: Introduces a benchmark/evaluation setting that makes the target capability easier to measure.
arXiv:
arxiv.org/abs/2606.00793
24. Avatar V: Scaling Video-Reference Avatar Video Generation
Highlight: Improves video understanding or generation by adding stronger temporal, retrieval, or control signals.
arXiv:
arxiv.org/abs/2606.13872
25. The Hidden Power of Scaling Factor in LoRA Optimization
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.12883
26. RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling
Highlight: Improves video understanding or generation by adding stronger temporal, retrieval, or control signals.
arXiv:
arxiv.org/abs/2606.06309
27. ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning
Highlight: Introduces a benchmark/evaluation setting that makes the target capability easier to measure.
arXiv:
arxiv.org/abs/2606.14697
28. When is Your LLM Steerable?
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.11599
29. LoSoNA: A Benchmark for Local Social Norm Adaptation in Group Conversations
Highlight: Introduces a benchmark/evaluation setting that makes the target capability easier to measure.
arXiv:
arxiv.org/abs/2606.14600
30. AFFORDANCE20Q: Evaluating Affordance Reasoning from Physical Properties
Highlight: Targets stronger reasoning and verification through new training, evaluation, or formalization methods.
arXiv:
arxiv.org/abs/2606.14240
31. Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation
Highlight: Refines post-training, optimization, compression, or policy-learning mechanics.
arXiv:
arxiv.org/abs/2606.13657
32. Benchmarking AI Agents for Addressing Scientific Challenges Across Scales
Highlight: Introduces a benchmark/evaluation setting that makes the target capability easier to measure.
arXiv:
arxiv.org/abs/2606.12736
33. P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning
Highlight: Introduces a benchmark/evaluation setting that makes the target capability easier to measure.
arXiv:
arxiv.org/abs/2606.11152
34. An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models
Highlight: Targets stronger reasoning and verification through new training, evaluation, or formalization methods.
arXiv:
arxiv.org/abs/2606.01462
35. World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.13652
36. APT: Action Expert Pretraining Improves Instruction Generalization of Vision-Language-Action Policies
Highlight: Connects perception, action, and world modeling for more robust embodied decision-making.
arXiv:
arxiv.org/abs/2606.12366
37. AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models
Highlight: Advances agent design, orchestration, memory, tooling, or evaluation for more capable autonomous systems.
arXiv:
arxiv.org/abs/2606.08952
38. Two-Fidelity Best-Action Identification for Stochastic Minimax Tree
Highlight: Addresses best-action identification in stochastic minimax trees using two levels of evaluation fidelity.
arXiv:
arxiv.org/abs/2606.01708
39. Squeeze-Release: Iterative Pruning with Exact Structural Minimization
Highlight: Refines post-training, optimization, compression, or policy-learning mechanics.
arXiv:
arxiv.org/abs/2606.14346
40. AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
Highlight: Targets stronger reasoning and verification through new training, evaluation, or formalization methods.
arXiv:
arxiv.org/abs/2606.14694
41. Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics
Highlight: Focuses on reliability, hallucination detection, or safety under high-stakes model use.
arXiv:
arxiv.org/abs/2606.12476
42. WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis
Highlight: Explores generative or structural modeling in 3D, geometry, or medical imaging domains.
arXiv:
arxiv.org/abs/2606.08670
43. ActiveMimic: Egocentric Video Pretraining with Active Perception
Highlight: Improves video understanding or generation by adding stronger temporal, retrieval, or control signals.
arXiv:
arxiv.org/abs/2606.06194
44. Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference
Highlight: Adds a focused method or analysis for improving model capability, reliability, or efficiency.
arXiv:
arxiv.org/abs/2606.05308
45. FVSpec: Real-World Property-Based Tests as Lean Challenges
Highlight: Targets stronger reasoning and verification through new training, evaluation, or formalization methods.
arXiv:
arxiv.org/abs/2606.01008
46. CARVE: Certified Affordable Repair of Vetoed Maneuvers via Envelopes for Interactive Driving
Highlight: Connects perception, action, and world modeling for more robust embodied decision-making.
arXiv:
arxiv.org/abs/2606.02641
Trend summary:
- Multimodal / vision / generation: 15
- Agents & autonomy: 10
- Reasoning, evaluation & reliability: 10
- Other AI/ML systems: 4
- Robotics & embodied AI: 4
- Training, RL & optimization: 3
#AI #MachineLearning #LLM #HuggingFace #arXiv