Congrats to the
@googlegemma team on the Gemma 4 12B launch 🎉 Day-0 support on vLLM is ready to go.
It's an encoder-free unified multimodal model — text, image, audio, and video all project straight into the LLM's embedding space, no separate vision or audio towers. 256K context, built-in thinking, native tool calling.
Reasoning tool parsers (`gemma4`), vision, and audio all served through the OpenAI-compatible API.
🔗 Recipe:
recipes.vllm.ai/Google/gemma…
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇