Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens.
We study a related but more structural question: what kind of reasoning should we adapt?
Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a world model to evaluate the consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure.
In our new paper SR²AM (lower figure), we add a learned configurator (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely.
Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
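
For readers who want a concrete picture of the three levels, here is a toy sketch in Python. It is an illustration under stated assumptions, not the implementation from either paper: the 1-D environment, the function names (`world_model`, `reactive_policy`, `simulative_policy`, `configurator`), and the scoring rule are all hypothetical, and the configurator here is a hand-written rule standing in for a learned one.

```python
from itertools import product

# Toy 1-D world (all values are illustrative assumptions).
ACTIONS = (-1, 1, 2)   # step back, step forward, jump
GOAL = 6
TRAPS = {3}            # landing on a trap sends the agent back to 0


def world_model(state, action):
    """Hypothetical world model: predicts the state that follows an action."""
    nxt = state + action
    return 0 if nxt in TRAPS else nxt


def score(state):
    """Higher is better: negative distance to the goal."""
    return -abs(GOAL - state)


def reactive_policy(state):
    """System I: move toward the goal without looking ahead."""
    return 1 if state < GOAL else -1


def rollout(state, plan):
    """Simulate a sequence of actions in the world model and score the end state."""
    for action in plan:
        state = world_model(state, action)
    return score(state)


def simulative_policy(state, horizon):
    """System II: simulate every plan of length `horizon`, return the best first action."""
    best_plan = max(product(ACTIONS, repeat=horizon),
                    key=lambda plan: rollout(state, plan))
    return best_plan[0]


def configurator(state):
    """System III stand-in: choose a simulation horizon; 0 means skip planning.

    Hand-written rule (an assumption): plan two steps ahead only when a trap
    lies within the next two cells.
    """
    danger = any(state < trap <= state + 2 for trap in TRAPS)
    return 2 if danger else 0


def run(choose_horizon, start=0, max_steps=10):
    """Run one episode; return visited states and the number of planned steps."""
    state, trace, planned = start, [], 0
    for _ in range(max_steps):
        if state == GOAL:
            break
        horizon = choose_horizon(state)
        if horizon == 0:
            action = reactive_policy(state)
        else:
            action = simulative_policy(state, horizon)
            planned += 1
        state = world_model(state, action)
        trace.append(state)
    return trace, planned


adaptive_trace, adaptive_planned = run(configurator)
reactive_trace, _ = run(lambda state: 0)   # System I only

print(adaptive_trace, adaptive_planned)    # [1, 2, 4, 5, 6] 2
print(GOAL in reactive_trace)              # False
```

In this toy setting the reactive agent keeps walking into the trap and never reaches the goal, while the configured agent reaches it in five steps and simulates on only two of them.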