🎉 Congrats to @nvidia on the release of Nemotron 3 Super, with day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs.
120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on the Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget — dial accuracy vs token cost per task
- 1M token context window
Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes.
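For a rough sense of what the three precisions mean for a 120B-parameter model, here is a back-of-envelope sketch of weight memory. It assumes nominal sizes of 2, 1, and 0.5 bytes per parameter for BF16, FP8, and NVFP4, and ignores quantization scale factors, KV cache, and activations, so real footprints will differ. The names `TOTAL_PARAMS`, `ACTIVE_PARAMS`, and `weight_gb` are illustrative, not part of vLLM or any NVIDIA tooling.

```python
# Approximate weight memory for a 120B-parameter model at each precision.
# Assumption: nominal bytes per parameter only; NVFP4 block scale factors,
# KV cache, and activation memory are NOT included.
TOTAL_PARAMS = 120e9   # total parameters (from the announcement)
ACTIVE_PARAMS = 12e9   # parameters active per token (from the announcement)

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}


def weight_gb(params: float, fmt: str) -> float:
    """Return approximate weight size in GB (10^9 bytes) for a given format."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_gb(TOTAL_PARAMS, fmt):.0f} GB of weights")

# Fraction of the model's parameters used for each token in the MoE.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.0%} of parameters")
```

Under these assumptions the weights come to roughly 240 GB, 120 GB, and 60 GB respectively, with about 10% of parameters active per token.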
Blog: vllm.ai/blog/nemotron-3-supe…
🤝 Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!
Introducing NVIDIA Nemotron 3 Super 🎉
Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model
Native 1M-token context
Built for compute-efficient, high-accuracy multi-agent applications
Plus, fully open weights, datasets and recipes for easy customization and deployment. 🧵