We trained a visual LLM to reason using GRPO, and open-sourced the code. After RL training, a tiny 3B model beats all the big players (GPT, Claude, etc., 0-shot) on this cryptogram task. Live demo and links:
groundlight.ai/blog/visual-r…
Vision foundation models have kinda stagnated recently, but now that we've shown how to incorporate reasoning, I think we'll be able to make progress again.