Impressive survey on agentic reasoning for LLMs.
(bookmark this one)
135 pages!
Why does it matter?
LLMs reason well in closed-world settings, but they struggle in open-ended, dynamic environments where information evolves.
The missing piece is action: static reasoning without interaction cannot adapt, learn, or improve from feedback.
This new survey systematizes the paradigm of Agentic Reasoning, where LLMs are reframed as autonomous agents that plan, act, and learn through continual interaction with their environment.
It provides a unified roadmap that bridges thoughts and actions, offering actionable guidance for building agentic systems across environmental dynamics and optimization settings.
The framework organizes agentic reasoning along three complementary dimensions:
1. Foundational Agentic Reasoning: Core single-agent capabilities including planning, tool use, and search. Agents decompose goals, invoke external tools, and verify results through executable actions. This is the bedrock.
2. Self-Evolving Agentic Reasoning: How agents improve through feedback, memory, and adaptation. Rather than following fixed reasoning paths, agents develop mechanisms for reflection, critique, and memory-driven learning. Reflexion, RL-for-memory, and continual adaptation link reasoning with learning.
3. Collective Multi-Agent Reasoning: Scaling intelligence from isolated solvers to collaborative ecosystems. Multiple agents coordinate through role assignment, communication protocols, and shared memory. This covers debate, disagreement resolution, and maintaining consistency through multi-turn interactions.
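To make the first two dimensions concrete, here is a minimal sketch of an agent that executes a plan step by step, invokes tools, and records a reflection in memory when a step fails. It is not taken from the survey. The names `Agent`, `TOOLS`, and `PREV` are hypothetical, the plan is hard-coded where a real system would ask an LLM to produce it, and the "tools" are plain Python functions.

```python
from dataclasses import dataclass, field

PREV = "PREV"  # hypothetical placeholder: "use the previous step's result"

# Hypothetical tool registry; plain functions stand in for external tools.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


@dataclass
class Agent:
    tools: dict
    memory: list = field(default_factory=list)  # reflections kept across steps

    def act(self, tool_name, args, previous):
        """Invoke one tool; on failure, reflect into memory and keep going."""
        tool = self.tools.get(tool_name)
        if tool is None:
            # Feedback from the environment becomes a stored reflection.
            self.memory.append(f"unknown tool '{tool_name}', step skipped")
            return previous
        resolved = [previous if a == PREV else a for a in args]
        return tool(*resolved)

    def run(self, plan):
        """Execute the plan in order, threading each result into the next step."""
        result = None
        for tool_name, args in plan:
            result = self.act(tool_name, args, result)
        return result


# Assumed plan for "(2 + 3) * 4", with one deliberately unavailable tool.
plan = [("add", (2, 3)), ("sqrt", (PREV,)), ("mul", (PREV, 4))]
agent = Agent(tools=TOOLS)
answer = agent.run(plan)
print(answer)        # 20
print(agent.memory)  # ["unknown tool 'sqrt', step skipped"]
```

A self-evolving agent would feed `agent.memory` back into the next planning call so the same mistake is not repeated. That step is left out here because it needs a real model.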
Across all three dimensions, the survey distinguishes two optimization modes: in-context reasoning (scaling inference-time compute through orchestration and search without parameter updates) and post-training reasoning (internalizing strategies via RL and fine-tuning).
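One simple way to picture the in-context mode is best-of-N selection: sample several candidate answers, score each with a verifier, and keep the best, with no weight updates. The sketch below is only an illustration of that idea, not a method from the survey. The `samples` list is a hypothetical stand-in for N model outputs, and `verifier` is a toy checker.

```python
from typing import Callable, Iterable


def best_of_n(candidates: Iterable[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate (first one wins on ties)."""
    return max(candidates, key=score)


# Hypothetical stand-in for N sampled answers to "What is 17 * 6?"
samples = ["100", "102", "one hundred two", "112"]


def verifier(answer: str) -> float:
    """Toy verifier: recompute the product and compare."""
    try:
        return 1.0 if int(answer) == 17 * 6 else 0.0
    except ValueError:
        return 0.0  # answers that are not integers score zero


chosen = best_of_n(samples, verifier)
print(chosen)  # 102
```

Spending more compute here means drawing more samples or using a stronger verifier. Post-training would instead update the model's weights so that good answers come out on the first try, which this sketch does not attempt.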
The survey covers applications spanning math exploration, scientific discovery, embodied robotics, healthcare, and autonomous web research. It also reviews the benchmark landscape for evaluating agentic capabilities.
I have been looking closely at this area of research, and here are some of the open challenges that remain: personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance frameworks for real-world deployment.
Paper:
arxiv.org/abs/2601.12538