Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance approaching that of our larger Gemma models with a much smaller memory footprint: it runs locally with just 16GB of VRAM. It’s open and available to everyone under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
Image: Promotional graphic on a black background featuring the large blue text "Gemma 4 12B" above smaller white text that reads "Unified Transformer." A glowing blue ribbon containing multi-modal icons (representing images, text, and audio) flows from the left into a central point, branching out into a complex, luminous blue neural network map on the right.