Introducing the compact, dense versions of Qwen3-VL, now available in 4B and 8B sizes, each with Instruct and Thinking variants.
✅ Lower VRAM usage
✅ Full Qwen3-VL capabilities retained
✅ Strong performance across the board
Despite their size, they often outperform models like Gemini 2.5 Flash Lite and GPT-5 Nano on benchmarks spanning STEM, VQA, OCR, video understanding, agent tasks, and more. In many cases, they even rival our flagship Qwen2.5-VL-72B from just six months ago!
FP8 versions are also available for efficient deployment.
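To put the VRAM point in rough numbers, here is a back-of-the-envelope sketch of weight-only memory at BF16 versus FP8. The parameter counts are nominal values read off the model names (an assumption, not official figures), and the estimate ignores the KV cache, activations, and framework overhead, so real usage will be higher.

```python
# Weight-only memory estimate. Parameter counts are NOMINAL (taken from the
# model names), not official figures; KV cache and activations are excluded.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
NOMINAL_PARAMS = {"4B": 4e9, "8B": 8e9, "72B": 72e9}


def weight_gib(params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for a given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        bf16 = weight_gib(params, "bf16")
        fp8 = weight_gib(params, "fp8")
        print(f"{name}: ~{bf16:.1f} GiB (BF16), ~{fp8:.1f} GiB (FP8)")
```

Under these assumptions, FP8 halves the weight footprint relative to BF16, which is where most of the deployment savings would come from.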
Hugging Face:
huggingface.co/collections/Q…
ModelScope:
modelscope.cn/collections/Qw…
Qwen3-VL-8B-Instruct API:
modelstudio.console.alibabac…
Qwen3-VL-8B-Thinking API:
modelstudio.console.alibabac…
Cookbooks:
github.com/QwenLM/Qwen3-VL/b…