New Multimedia papers from arxiv.org: multimedia. Thank you to arXiv for use of its open access interoperability.

Joined March 2010
LangRetrieval: Language-Guided Self-Evolving Satellite-to-Radar Retrieval via CSI-Driven Reward Chunlei Shi, Junming Hou, Yi-Lin Wei, Jiong Wang, Yecheng Zhang, Yichao Dong, Wenqi Ren, … arxiv.org/abs/2606.09486 [𝚌𝚜.𝙼𝙼] 💬Submitted to IEEE Transactions on Image Processing
Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding Shiyu Li, Zhiyuan Hu, Yifan Wang, Peiming Li, Zheng Wei, Yang Tang arxiv.org/abs/2606.09331 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙻𝙶]
LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models Rui Wang, Yan Zhao, Li Song, Zhengxue Cheng arxiv.org/abs/2606.05861 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙰𝙸] 💬Submitted to IEEE BMSB 2026
UNIVID: Unified Vision-Language Model for Video Moderation Kejuan Yang, Yizhuo Zhang, Mingyuan Du, Yue Zhang, Dixin Zheng, Kaili Zhao, Yang Xiao, Hanzhong Liang, Kenan Xiao arxiv.org/abs/2606.05748 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝙻] 💬Accepted to ACL 2026 Industry Track
Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis Bin Wen, Tien-Ping Tan arxiv.org/abs/2606.05713 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝚂𝙳 𝚎𝚎𝚜𝚜.𝙰𝚂]
GS-NFS: Bandwidth-adaptive Streaming of Dynamic Gaussian Splats and Point Clouds Rajrup Ghosh, Haodong Wang, Haoran Hong, Eduardo Pavez, Amartya Chaudhuri, Weiwu Pang, Harsha V. Madhyastha, Antonio Ortega, … arxiv.org/abs/2606.05650 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙶𝚁 𝚌𝚜.𝙽𝙸]
Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, … arxiv.org/abs/2606.04527 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙶𝚁]
DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities Sajad Ebrahimi, Nima Jamali, Bardia Shirsalimian, Kelly McConvey, Wentao Zhang, … arxiv.org/abs/2606.04205 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝙻 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙻𝙶 𝚌𝚜.𝚂𝙳]
OmniHalluc-L: Counterfactual Benchmarking and Modality-Perturbation Reliability Calibration for Long-Form Omni Hallucination Zixuan Dong, Jiafu Tang, Zhide Lei, Zhe Cao, Zijie Zhang, Yanghai Wang, Shihao Li, Xiaodong Wang, Baoyun Peng, … arxiv.org/abs/2606.03614 [𝚌𝚜.𝙼𝙼]
Inference-Time Scaling for Joint Audio-Video Generation Jaemin Jung, Kyeongha Rho, Inkyu Shin, Joon Son Chung arxiv.org/abs/2606.03183 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝚂𝙳 𝚎𝚎𝚜𝚜.𝙰𝚂] 💬Accepted by Transactions on Machine Learning Research (TMLR)
TimeLogic Challenge @ CVPR 2026: Strong MLLMs Meet Evidence-Seeking Agents for Temporal-Logic Video Question Answering Zhaoyang Xu, Xusheng He, Wei Liu, Zhenyang Li, Jianlong Wu arxiv.org/abs/2606.01631 [𝚌𝚜.𝙼𝙼]
A Pilot Study on Curator-Guided Multilingual Art Description for Blind and Low-Vision Audiences with Small Vision-Language Models Iosif Tsangko, Andreas Triantafyllopoulos, George Margetis, … arxiv.org/abs/2605.31080 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝙻 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙷𝙲]
Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis Guangyuan Dong, Ziwei Hong, Shenghao Liu, Chenyu Wu, Yuanyuan Fang, Zihao Li, Xudong Zhang, Bingchen Liu, Yuchen Zhang, Haitao Ding, … arxiv.org/abs/2605.30994 [𝚌𝚜.𝙼𝙼]
Unveiling the Visual Counting Bottleneck in Vision-Language Models Xingzhou Pang, Yifan Hou, Junling Wang, Mrinmaya Sachan arxiv.org/abs/2605.30170 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝙲𝚅 𝚌𝚜.𝙻𝙶] 💬ICML 2026
State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition Zhaoyan Pan, Xiangdong Li, Wenke Wu, Mengting Ma, Ye Lou, Ji Zhou, Jiatong Pan, Wei Zhang arxiv.org/abs/2605.29590 [𝚌𝚜.𝙼𝙼]
Can We Hear from Events? Generating Speech from Event Camera Jingping Fang, Lin Chen, Chenyang Xu, Tong Zhao, Weidong Cai, Xiaoming Chen arxiv.org/abs/2605.26672 [𝚌𝚜.𝙼𝙼 𝚌𝚜.𝚂𝙳]
Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks Hamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer arxiv.org/abs/2605.26313 [𝚌𝚜.𝙼𝙼]