When to Think vs. When to Look
New @CVPR 2026 paper available: "Uncertainty-Guided Lookback for Vision–Language Models", a deep dive into reasoning in VLMs, with @ChenliangXu and many collaborators.
By analyzing token-level perplexity, we discovered a clear pattern: successful reasoning traces repeatedly "re-anchor" to the visual input, while failed ones drift into ungrounded textual speculation.
To address this, we’re introducing Uncertainty-Guided Lookback. It’s a training-free decoding strategy that:
🔥 Detects when a model’s reasoning chain is drifting into a visually uncertain regime.
🔥 Triggers short, adaptive "lookback" prompts to refocus the model on the image.
🔥 Improves accuracy by up to 6.5 points in specialist domains while reducing token usage by 35-45%.
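To make the detect-then-trigger loop concrete, here is a toy sketch. It is not the paper's implementation: the sliding window, the perplexity threshold, the cooldown, and the prompt wording are all assumptions chosen for illustration. It also runs offline over a fixed list of per-token log-probabilities, whereas a real decoder would insert the prompt and keep generating.

```python
import math
from typing import List

# Hypothetical prompt wording; the post does not give the actual lookback text.
LOOKBACK_PROMPT = "Let me look at the image again."


def window_perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a token window: exp of the negative mean log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def lookback_positions(
    token_logprobs: List[float],
    window: int = 4,         # assumed window size
    threshold: float = 5.0,  # assumed perplexity threshold
    cooldown: int = 4,       # assumed minimum gap between triggers
) -> List[int]:
    """Return token indices after which a lookback prompt would be inserted."""
    positions: List[int] = []
    last_trigger = -cooldown
    for i in range(window - 1, len(token_logprobs)):
        ppl = window_perplexity(token_logprobs[i - window + 1 : i + 1])
        # Trigger only when the recent window is uncertain and we have not
        # just triggered, so one uncertain stretch yields a single lookback.
        if ppl > threshold and i - last_trigger >= cooldown:
            positions.append(i)
            last_trigger = i
    return positions


# Confident tokens, then an uncertain stretch, then confident tokens again.
trace = [-0.1] * 6 + [-3.0] * 4 + [-0.1] * 6
print(lookback_positions(trace))  # [8]
```

The cooldown is one simple way to keep the lookbacks short and infrequent; an adaptive schedule would be another option.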
It’s a reminder that in the rush toward massive compute and longer context, the most effective path forward is often the one that remains most grounded in reality.
Project Page: proj-visual-thinking.jing.vi…
Paper (arXiv): arxiv.org/pdf/2511.15613