From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM).
A few months ago, the HRM made big waves in the AI research community by showing strong performance on the ARC challenge despite its small size of 27M parameters. (That's about 22x smaller than the smallest Qwen3 model, which has 0.6B parameters.)
Now, the new "Less is More: Recursive Reasoning with Tiny Networks" paper proposes the Tiny Recursive Model (TRM), a simpler and even smaller model (7M parameters, 4× smaller than HRM) that performs even better on the ARC challenge.
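To put those size comparisons side by side, here is a quick back-of-the-envelope check. The parameter counts are the rounded figures quoted above; the function name is just something I made up for illustration.

```python
# Rounded parameter counts quoted in the post, in millions.
QWEN3_SMALLEST_M = 600  # Qwen3 0.6B
HRM_M = 27
TRM_M = 7


def size_ratio(larger_m: float, smaller_m: float) -> float:
    """Return how many times larger the first model is than the second."""
    return larger_m / smaller_m


qwen_vs_hrm = size_ratio(QWEN3_SMALLEST_M, HRM_M)
hrm_vs_trm = size_ratio(HRM_M, TRM_M)
qwen_vs_trm = size_ratio(QWEN3_SMALLEST_M, TRM_M)

print(f"Qwen3 0.6B vs. HRM: {qwen_vs_hrm:.1f}x")  # 22.2x
print(f"HRM vs. TRM:        {hrm_vs_trm:.1f}x")   # 3.9x
print(f"Qwen3 0.6B vs. TRM: {qwen_vs_trm:.1f}x")  # 85.7x
```

So the "22x" and "4×" figures are rounded, and TRM ends up roughly 86x smaller than the smallest Qwen3 model.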
🔹 What does recursion mean here?
TRM refines its answer in two steps:
1. It updates a latent (reasoning) state from the current question and answer.
2. Then it updates the answer based on that latent state.
Training runs for up to 16 refinement steps per batch. Each step does several no-grad loops to improve the answer, followed by one gradient loop that learns from the full reasoning process.
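The loop structure described above can be sketched in a few lines of plain Python. This is only a structural toy, not the paper's code: the two update functions are hand-written stand-ins for the learned network, and `n_passes` and `n_latent` are hypothetical defaults I picked for illustration. Early halting is left out.

```python
from typing import List, Tuple

Vector = List[float]


def update_latent(x: Vector, y: Vector, z: Vector) -> Vector:
    """Step 1 (stand-in): update the latent state from question x and answer y."""
    return [0.5 * zi + 0.5 * (xi - yi) for xi, yi, zi in zip(x, y, z)]


def update_answer(y: Vector, z: Vector) -> Vector:
    """Step 2 (stand-in): update the answer based on the latent state."""
    return [yi + 0.5 * zi for yi, zi in zip(y, z)]


def recursion_pass(x: Vector, y: Vector, z: Vector, n_latent: int) -> Tuple[Vector, Vector]:
    """One loop: a few latent updates, then a single answer update."""
    for _ in range(n_latent):
        z = update_latent(x, y, z)
    y = update_answer(y, z)
    return y, z


def refine(x: Vector, max_steps: int = 16, n_passes: int = 3,
           n_latent: int = 2) -> Tuple[Vector, List[str]]:
    """Run max_steps refinement steps; log which loops would track gradients."""
    y = [0.0] * len(x)  # initial answer
    z = [0.0] * len(x)  # initial latent state
    log: List[str] = []
    for _ in range(max_steps):
        # All but the last loop only improve (y, z); in a real framework
        # they would run without gradient tracking (e.g. torch.no_grad()).
        for _ in range(n_passes - 1):
            y, z = recursion_pass(x, y, z, n_latent)
            log.append("no_grad")
        # The last loop is the one a real implementation would backpropagate through.
        y, z = recursion_pass(x, y, z, n_latent)
        log.append("grad")
    return y, log


question = [3.0, -1.0, 4.0]
answer, log = refine(question)
print(log.count("no_grad"), log.count("grad"))
```

In this toy, the "task" is just to copy the question into the answer, so you can watch `answer` converge toward `question`. The point is the nesting: refinement steps on the outside, several loops per step, and only the last loop of each step marked for gradients.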
By the way, the question and the answer are grids of discrete tokens, not text (e.g., 9×9 Sudoku and up to 30×30 ARC and Maze).
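To make the "grids of discrete tokens" point concrete, here is a minimal sketch of how a grid could be flattened into a fixed-length token sequence and restored again. The row-major ordering and the helper names are my assumptions; the paper's exact preprocessing may differ.

```python
from typing import List

Grid = List[List[int]]


def flatten_grid(grid: Grid) -> List[int]:
    """Flatten a 2D grid of integer tokens into one sequence, row by row."""
    return [token for row in grid for token in row]


def unflatten_grid(tokens: List[int], width: int) -> Grid:
    """Rebuild the 2D grid from a row-major token sequence."""
    if len(tokens) % width != 0:
        raise ValueError("token count must be a multiple of the grid width")
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


# A valid, fully solved 9x9 Sudoku built from a standard shifting pattern.
sudoku = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

sudoku_tokens = flatten_grid(sudoku)
max_arc_tokens = 30 * 30  # largest grid size mentioned above

print(len(sudoku_tokens))  # 81
print(max_arc_tokens)      # 900
```

Either way, the sequences are short and bounded: 81 tokens for Sudoku and at most 900 for a 30×30 grid.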
🔹 And how does it differ from HRM?
In short, HRM recurses multiple times through two small neural nets with 4 transformer blocks each (one updating at high frequency, one at low frequency). TRM is much smaller (about 4x) and uses only a single network with 2 transformer blocks.
TRM backpropagates through the full recursion once per step, whereas HRM only backpropagates through the final few steps. And TRM also removes HRM's extra forward pass for halting and instead uses a simple binary cross-entropy loss to learn when to stop iterating.
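For intuition on the halting part, here is a minimal sketch of a binary cross-entropy halting loss. I'm assuming the model emits one halting logit per example and that the target is whether the current answer is already correct; the function names and the 0.5 threshold are hypothetical.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def halting_bce(halt_logit: float, answer_is_correct: bool) -> float:
    """Binary cross-entropy between the halting probability and the 0/1 target."""
    eps = 1e-12  # keeps log() finite at the extremes
    p = min(max(sigmoid(halt_logit), eps), 1.0 - eps)
    target = 1.0 if answer_is_correct else 0.0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def should_halt(halt_logit: float, threshold: float = 0.5) -> bool:
    """Stop refining once the predicted halting probability passes the threshold."""
    return sigmoid(halt_logit) > threshold


# A confident "halt" is cheap when the answer is right and expensive when it is wrong.
print(halting_bce(4.0, answer_is_correct=True))
print(halting_bce(4.0, answer_is_correct=False))
```

The nice thing about this setup is that the halting signal comes out of the same forward pass as the answer, so no extra pass is needed just to decide whether to continue.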
🔹 Surprising tidbits
1. The author found that adding layers decreased generalization due to overfitting. And going from 4 to 2 layers improved the model from 79.5% to 87.4% on Sudoku.
2. Replacing the self-attention layer with an MLP layer also improved accuracy (74.7% -> 87.4% on Sudoku); however, note that this only makes sense here since we have a fixed-length, small context to work with.
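Why does the MLP swap depend on a fixed-length context? One way to see it is a token-mixing MLP, where the weights act across sequence positions. The sketch below is my own illustration in pure Python, not the paper's layer: the class name, sizes, and initialization are all assumptions.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


class TokenMixingMLP:
    """Mixes information across positions with weights sized by seq_len."""

    def __init__(self, seq_len: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.seq_len = seq_len
        # Both weight shapes depend on seq_len, unlike self-attention weights.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(seq_len)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(seq_len)]

    def __call__(self, x: Matrix) -> Matrix:
        """x has shape (seq_len, dim); the output has the same shape."""
        if len(x) != self.seq_len:
            raise ValueError(f"expected {self.seq_len} tokens, got {len(x)}")
        h = relu(matmul(self.w1, x))  # (hidden, dim)
        return matmul(self.w2, h)     # (seq_len, dim)


mixer = TokenMixingMLP(seq_len=81, hidden=16)  # 81 = flattened 9x9 Sudoku
x = [[float((i + j) % 3) for j in range(4)] for i in range(81)]
out = mixer(x)
print(len(out), len(out[0]))  # 81 4
```

Because the weight matrices are sized by the sequence length, the layer simply cannot accept a different number of tokens. That is fine for an 81-cell Sudoku but not for variable-length text.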
🔹 Bigger picture
My personal caveat: comparing this method (or HRMs) to LLMs feels a bit unfair since HRMs/TRM are specialized models trained for specific tasks (here: ARC, Sudoku, and Maze pathfinding) while LLMs are generalists. It's like comparing a pocket calculator to a laptop. Both serve a purpose, just in different contexts.
That said, HRMs and the recursive model proposed here are fascinating proof-of-concepts that show what's possible with relatively small and efficient architectures. I'm still curious what the real-world use case will look like. Maybe they could serve as reasoning or planning modules within a larger tool-calling system.
In practice, we often start by throwing LLMs at a problem, which makes sense for quick prototyping and establishing a baseline. But I can see a point where someone sits down afterward and trains a focused model like this to solve the same task more efficiently.