🚀 Open-Source AI Models: May Recap
According to Zhihu contributor @logcong0120, May may have felt quieter than previous months, but there was still plenty happening across China's open-source AI ecosystem.
Here's your lightning-fast recap👇
📅 May 1
• @Alibaba_Qwen releases Qwen-Scope, using sparse constraints to extract more interpretable and disentangled latent features.
• @MistralAI open-sources Mistral Medium 3.5 (128B).
📅 May 6
• @Google releases Gemma 4 MTP Drafter, leveraging speculative decoding for up to 3× faster inference.
• @ZyphraAI launches ZAYA1-8B, the first MoE model trained on AMD Instinct MI300 hardware.
📅 May 7 – OpenSearch-VL releases a VLM family (8B, 32B, 30B-A3B) alongside a full SFT and RL training pipeline.
📅 May 8
• Qwen introduces WebWorld, a browser simulator for training agents without relying on the real internet, together with WebWorldData trajectories.
• ModelBest releases SciCore-Mol, enhancing LLMs through pluggable external cognition modules.
• HiDream-O1-Image (8B) debuts, unifying image, text, and task conditions into a shared token space for image generation.
📅 May 11 – @Xiaomi releases OneVL, an open autonomous-driving framework with model checkpoints.
📅 May 12 – ModelBest launches MiniCPM-V4.6, a 1.3B multimodal model optimized for mobile deployment, plus its Thinking version.
📅 May 13 – @JinaAI_ introduces jina-embeddings-v5-omni, a multimodal embedding model supporting text, image, audio, and video.
📅 May 14 – @AntLingAGI releases Ring-2.6-1T, enabling stable training of trillion-parameter models through asynchronous RL.
📅 May 15 – SenseTime opens SenseNova-U1-8B-MoT-Infographic, optimized for high-density infographic generation.
📅 May 16 – @intern_lm unveils Intern-S2-Preview, a 35B scientific multimodal model surpassing its predecessor.
📅 May 18 – ByteDance releases Lance, a native multimodal model for understanding, generating, and editing images and videos.
📅 May 20 – @Cohere launches Command A, a multimodal model with 218B total parameters and 25B active parameters.
📅 May 21
• Meituan's LongCat-Video-Avatar-1.5 improves digital-human video generation with more accurate lip-syncing.
• @TencentHunyuan releases Hy-MT2, a multilingual translation model family spanning 1.8B to 30B-A3B.
• NetEase launches Confucius4, a multimodal model specialized for mathematical reasoning based on Qwen3.5.
📅 May 23-24 – ModelBest releases BitCPM-CANN, a family of native Ascend-trained 1.58-bit ternary models, and launches MiniCPM5-1B, which targets edge devices and local deployment.
📅 May 25 – Kuaishou releases Keye-VL-2.0, supporting near-lossless reasoning over 256K-token video contexts.
📅 May 26 – OpenMOSS upgrades MOSS-TTS-v1.5 with more stable voice cloning and pause control, and releases MOSS-SoundEffect-V2.0, generating environmental and action sounds directly from text.
📅 May 28 – @Baidu_Inc updates PaddleOCR-VL-1.6, reaching 96.33% on OmniDocBench v1.6.
📅 May 29 – @StepFun_ai releases Step3.7-Flash, a 198B multimodal model with configurable reasoning levels.
🙋 Also, open source wasn't the whole story this month. Several closed-source models were equally worth watching:
• Doubao-Seed-2.0-lite surprisingly outperformed Pro
• Qwen3.7-Max delivered a significant jump over 3.6
• GLM-5.1-HighSpeed pushed inference speed to 400 tok/s
• Google released Gemini 3.5 Flash and Gemini Omni
• Claude Opus 4.8 doubled down on honesty, reducing the chance of agents "pretending" to finish long-horizon tasks
👀 As for the author, he's still waiting for Qwen3.7-32B and DeepSeek-VL. How about you?
📖 Full article:
zhuanlan.zhihu.com/p/2044378…
#OpenSource #LLM #Qwen #DeepSeek #MiniCPM #StepFun #Agent #AI