Praveen Kumar Verma

@Alacritic_Super

🚀 Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content

Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptions such as noise, blur, compression artifacts, and adverse weather. Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black-box feature alignment lacks interpretability, while white-box text reasoning cannot restore the lost pixel-level visual details.

This raises a crucial question: 🧐 Can MLLMs recover corrupted visual content by themselves? If the answer is yes, we can move beyond merely compensating for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.

💡 Paper: arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-…

mambe @mambe_h

Replying to @Dagovj @NicHerBar @fe40630

The thing is, what I'm saying really is generalizable. You only need to know what the requirements are to become a "paco" (Chilean slang for a police officer); then you realize that those who hold the legitimate use of force are nothing more than animals with no judgment.

(NEW ACC) • Magnolia 🌸 @Sceneric0

Replying to @The_Ghero69

Tbf this is my own experience and that's obv not going to be generalizable but maybe social media?

Lincoln Margison - Game Development

@LincolnMargison

Replying to @morphysw @UnrealEngine

motionrig is planned as a plugin for exactly that, one node in the animgraph (or optionally controlrig if you want to control more specifics) which handles it all, with a bunch of profile/settings for configuring it for different styles/characters. My current limitation on it is I don't really know much about doing the UI implementation side. I want something kinda like the modular rigging system, except for motion. So you'd do setups like connect [bone->bone] distance to drive [other bone rotation], chaining things together like that in some visual way where you can see it simulating. Adding as much depth/control as you want. Then it all just gets plugged in as a data profile to one node. And also still working through some generalizable flexibility for proc locomotion. It's quite easy to make it work for one character, but to generalise it where it works for *any* character is harder. I don't want people to have to be doing stuff like setting specific primary/secondary axis things or defining things like foot lengths etc. Just want it all to work instantly but then be customizable for stylistic reasons if you choose to.

Wendy

@LianwenJ

10h

Hugging Face Daily Papers, 2026-06-13
44 papers today. Full list with arXiv links:

1. EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
   Highlight: Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments.
   arXiv: arxiv.org/abs/2606.13681
2. MiniMax Sparse Attention
   Highlight: Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memor.
   arXiv: arxiv.org/abs/2606.13392
3. WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
   Highlight: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, an.
   arXiv: arxiv.org/abs/2606.09426
4. SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
   Highlight: Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision.
   arXiv: arxiv.org/abs/2606.13673
5. InterleaveThinker: Reinforcing Agentic Interleaved Generation
   Highlight: Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. Ho.
   arXiv: arxiv.org/abs/2606.13679
6. MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
   Highlight: We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first tra.
   arXiv: arxiv.org/abs/2606.13473
7. Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
   Highlight: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet their performance degrades significantly.
   arXiv: arxiv.org/abs/2606.08063
8. FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
   Highlight: Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through sear.
   arXiv: arxiv.org/abs/2606.12087
9. LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
   Highlight: Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside.
   arXiv: arxiv.org/abs/2606.13578
10. HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
    Highlight: Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation spac.
    arXiv: arxiv.org/abs/2606.13289
11. N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
    Highlight: The success of Large Language Models in mathematical reasoning relies heavily on the generation of diverse and valid solution paths during the roll.
    arXiv: arxiv.org/abs/2606.10768
12. EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
    Highlight: LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they.
    arXiv: arxiv.org/abs/2606.13662
13. Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
    Highlight: Latent chain-of-thought compresses reasoning by replacing visible reasoning traces with continuous hidden-state recurrence, but existing formulatio.
    arXiv: arxiv.org/abs/2606.13106
14. VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
    Highlight: We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular vide.
    arXiv: arxiv.org/abs/2606.13364
15. VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
    Highlight: Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to vali.
    arXiv: arxiv.org/abs/2606.12243
16. Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback
    Highlight: Despite generating increasingly photorealistic images, text-to-image (T2I) models still exhibit localized, subtle, and structurally complex failure.
    arXiv: arxiv.org/abs/2606.06113
17. From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion
    Highlight: Multimodal image fusion aims to integrate complementary information from different modalities into a fused image that preserves rich local details.
    arXiv: arxiv.org/abs/2606.12303
18. MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
    Highlight: We present MoVerse, a real-time video world model that creates an interactively navigable scene from a single narrow-field-of-view image. This sett.
    arXiv: arxiv.org/abs/2606.13376
19. TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
    Highlight: Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central chal.
    arXiv: arxiv.org/abs/2606.11662
20. HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
    Highlight: Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and.
    arXiv: arxiv.org/abs/2606.12882
21. Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models
    Highlight: Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly.
    arXiv: arxiv.org/abs/2606.11409
22. High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
    Highlight: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this.
    arXiv: arxiv.org/abs/2606.12575
23. Visual Para-Thinker: A Single-Policy Multi-Agent Framework for Visual Reasoning
    Highlight: Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, making single-chain reasoning prone to early.
    arXiv: arxiv.org/abs/2606.09290
24. SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling
    Highlight: On-policy distillation (OPD) trains a student on its own trajectories with dense per-token supervision from a stronger teacher, and often outperfor.
    arXiv: arxiv.org/abs/2606.09304
25. Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
    Highlight: Anticipating LLM behavioral tendencies from low-cost psychometric probes is critical for safe deployment, but only if self-reports (SR) reliably pr.
    arXiv: arxiv.org/abs/2606.12730
26. EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
    Highlight: Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing be.
    arXiv: arxiv.org/abs/2606.13120
27. MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
    Highlight: Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By alig.
    arXiv: arxiv.org/abs/2606.08788
28. See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
    Highlight: Multi-agent systems communicate mostly through text, paying a lossy and expensive decode and re-encode cost. KV-cache communication is a promising.
    arXiv: arxiv.org/abs/2606.13594
29. Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
    Highlight: Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated functio.
    arXiv: arxiv.org/abs/2606.12674
30. MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning
    Highlight: Robotic simulators are a cornerstone of modern research in aerial robotics, serving both as a vehicle for the development of new control algorithms.
    arXiv: arxiv.org/abs/2606.08039
31. Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
    Highlight: Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in o.
    arXiv: arxiv.org/abs/2606.13174
32. ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
    Highlight: Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in s.
    arXiv: arxiv.org/abs/2606.13572
33. WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation
    Highlight: The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching -- policy evaluation, policy improvement, and te.
    arXiv: arxiv.org/abs/2606.13672
34. Surflo: Consistent 3D Surface Flow Model with Global State
    Highlight: Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstru.
    arXiv: arxiv.org/abs/2606.13644
35. WebChallenger: A Reliable and Efficient Generalist Web Agent
    Highlight: Autonomous web navigation remains challenging for LLM agents, and the strongest generalist systems rely on proprietary reasoning models whose infer.
    arXiv: arxiv.org/abs/2606.10423
36. Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering
    Highlight: We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single.
    arXiv: arxiv.org/abs/2606.10896
37. IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
    Highlight: Built on pretrained vision foundation models (VFMs), representation autoencoders (RAEs) have recently emerged as a promising approach for construct.
    arXiv: arxiv.org/abs/2606.11096
38. Revisiting Articulated Parts Perception in Robot Manipulation
    Highlight: We are surrounded by various objects with movable, articulated parts, e.g., box, handle, door. An accurate and generalizable perception of articula.
    arXiv: arxiv.org/abs/2606.08103
39. The Cold-Start Safety Gap in LLM Agents
    Highlight: Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a ses.
    arXiv: arxiv.org/abs/2606.07867
40. ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs
    Highlight: Large language models deployed as agents over large tool catalogs face a critical tool-retrieval bottleneck. As embedding-based retrieval approache.
    arXiv: arxiv.org/abs/2606.12451
41. A Stationary (and Therefore Compatible) Representation is All You Need
    Highlight: Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes upd.
    arXiv: arxiv.org/abs/2606.12488
42. PianoKontext: Expressive Performance Rendering from Deadpan Context
    Highlight: Expressive performance rendering (EPR) aims to generate realistic performances constrained on sequences of notes. However, flow matching audio edit.
    arXiv: arxiv.org/abs/2606.12282
43. Leveraging Morphology for Historical Script Metrological Analysis
    Highlight: Advances in handwritten text recognition have enabled large-scale transcription of historical documents, but still provide limited access to interp.
    arXiv: arxiv.org/abs/2606.09446
44. On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance
    Highlight: Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-int.
    arXiv: arxiv.org/abs/2606.00467

Trend summary:
- Agents / Computer-use / Spatial reasoning: 18
- Multimodal / Vision / Video: 10
- Reasoning / Math / RL: 7
- Other ML methods: 5
- LLMs / Efficient modeling: 3
- Audio / Speech: 1

Papers with code links found: 32/44

💮🦋 Chula 🦋 💮 @Julichuli2025

11h

Replying to @emilioaherrera @naldezjunior

It's not generalizable; it depends on the person, not on their sex.

Kansanparantaja @Kansanparantaja

14h

Replying to @Kansanparantaja @Tellit007

if it is some clean generalizable effect size. And no, FOURIER and ODYSSEY did not “replicate Cohen 2006.” They were short-term drug-regimen trials in selected high risk patients. Cohen 2006 was a rare lifelong PCSK9 loss-of-function natural experiment. Same broad direction does

Alex Liaptchev

Alex Liaptchev @liaptchev

16h

This rule is actually pretty generalizable.

Mindy MF Robinson 🦄

@iheartmindy

16h

Replying to @WeldrSkeltr

So look for tight buttholes on watermelons?

David ¿El Carroza? @Elcarroza

17h

Replying to @Almogaver688

It's not generalizable. My social circle is mostly Peruvian, and I can attest that, apart from a few notable exceptions, the great majority respect the Catalan culture and language. By that I don't mean that they are the majority; I'm only commenting on what I'm experiencing in my circle

hombre de poca fe @timescavenger

17h

Replying to @pitiklinov

Isn't this thesis largely generalizable to any collective institution and to society as a whole?

Hanlon's Laser @aphofer

18h

Replying to @SethDillon @ConceptualJames

Most measured wealth inequality is multiple expansion, so your experience is generalizable.

Hanlon's Laser @aphofer

19h

This is just an AI estimate, but I think this factor is underappreciated by nearly everyone who talks about wealth inequality. Higher P/E multiples probably explain more than half of the growth in wealth inequality.

xiduoyu

@Joeyyyyuwww

19h

x.com/i/article/206579311286…

Annex Baja

@annexbaja

19h

Replying to @wbsummer

Hopefully there is a generalizable reaction away from "bad things are underrated, obvious solutions are irreparably flawed" attitudes in favor of the "fix everything easily switch". It is a powerful idea, a Genealogy of Morals meme. We need to stop the problem, not subsidize it.

Manel Moles @manel_moles

20h

Replying to @lauritasvoyhey

You say I've generalized... If I had, then I had. No problem. But that's not the case. The only 'generalization' is an obvious one: that all of us teachers have a past and, at some point, went through secondary school. And there, each of us has a story. That's what I'm asking in my post. What's your story? I've explained mine. And my story isn't generalizable. It's particular. If you reply to me, I may well answer. And here, what you claim strikes me as an absurdity that has little to do with the topic of the post. And if you don't want me to answer, just hit the little block button and that's that.

goodalexander

@goodalexander

21h

Replying to @pumatheuma

more elaborate proof of reserves system (includes 'proof of leverage', and 'proof of counterparty risk') that is generalizable and easy to use with an agent

Andrew Zywiec, M.D.

@AndrewZywiecMD

22h

Replying to @misfitpatriot_

I don't know why these conversations are so difficult for everyone. You are naming individual dictators and serial killers, literal 1 in a million cases, and comparing them to the everyday random violence and generalizable concerns of a subset of American culture. It simply doesn't work. No intelligent person is ever stating "all" these people or "all" those people. They are stating objective problems within defined parameters that clearly exist and have logical explanations. A fine example is Sowell, or Kirk. Their explanations for issues and concerns in modern black culture are well founded and if addressed, rather than ignored, obfuscated, and denied, would greatly benefit society, and the black population specifically. Virtue signal less, and objectively analyze more. It's getting tiring listening to people like you make things worse. Holding certain people to lesser standards and no accountability generates exactly that.

Robotics Papers @OWW

Jun 13

VICX: Generalizable Robot Manipulation via Video Generation and In-Context Operator Network
Song Chen, Linyan Xiang, Ying Zhou, Liu Yang
arxiv.org/abs/2606.12028 [cs.RO]

Generalizable robot manipulation requires not only task-level reasoning over unseen scenes, but also reliable grounding of visual plans into embodiment-specific execution. To bridge this gap, we propose VICX (Video generation and In-Context eXecution), a decoupled closed-loop manipulation framework. In VICX, a frozen video generation model produces vision-language-conditioned high-level visual plans, while a Video-to-Trajectory In-Context Operator Network (V2T-ICON) serves as the task-agnostic interface that grounds these plans into executable robot-state trajectories. To improve execution generalization, V2T-ICON operates on segmentation-extracted arm-only frame observations and uses retrieved image-state pairs as in-context prompts, allowing a robust and generalizable visual-to-state mapping at inference time without parameter updates. Experiments on Meta-World show that VICX supports cross-task generalization, closed-loop self-correction, and cross-embodiment transfer, demonstrating

Anagha Agile Systems

@aasaitech

Jun 13

x.com/i/article/206429272792…

Zhiwen Yang @Yaziwel

Jun 13

U-TTT: Towards Generalizable PET Image Denoising via Test-Time Training
arXiv: arxiv.org/pdf/2606.11032
GitHub: github.com/Yaziwel/U-TTT

Nimayi Dixit

@nimayi3

Jun 13

A useful (and perhaps complete) philosophy for life will give you generalizable principles for the following domains:
1) What to do when life is giving you open space
2) What to do when life is pressing on you with immediate demands
3) What not to do
4) What to do with circumstances unfolding outside of our control
And it will give you principles for the type of attitude to bring to each of these domains as well.