TheAgentic

Tweets

TheAgentic

@theagentic

Jan 8

Most teams are still shipping like it’s 2015: Build feature. Deploy feature. Hope it works. Except now the feature makes its own decisions.

You can still move fast and break things. You just have to know exactly what breaks, why it breaks, and how to fix it when it does.

The teams doing well in AI know:
• where their agent gets confused
• which prompts cause drift
• how to reproduce every failure

Roadmaps don’t save AI systems. Runbooks do.

235

TheAgentic

@theagentic

18 Dec 2025

There are two fundamentally different ways an AI can answer a question:
1. predict what sounds right
2. follow explicit rules

319

TheAgentic

@theagentic

18 Dec 2025

This matters in production. Healthcare, finance, law, and engineering don’t just need answers that sound right. They need systems that can prove when they’re right and fail loudly when they’re not.
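One way to picture "fail loudly" is a check that refuses to return an answer it cannot verify. The sketch below is illustrative only: the `Invoice` type, the `RuleViolation` exception, and the rule "total equals the sum of line items" are all hypothetical stand-ins for whatever explicit rules a real domain would define.

```python
from dataclasses import dataclass
from decimal import Decimal


class RuleViolation(Exception):
    """Raised when a proposed answer fails an explicit check (hypothetical name)."""


@dataclass(frozen=True)
class Invoice:
    line_items: tuple[Decimal, ...]
    claimed_total: Decimal  # e.g. a value extracted or generated by a model


def verified_total(invoice: Invoice) -> Decimal:
    """Return the claimed total only if it equals the sum of the line items."""
    expected = sum(invoice.line_items, Decimal("0"))
    if invoice.claimed_total != expected:
        # Fail loudly: no silent fallback, no "close enough".
        raise RuleViolation(
            f"claimed total {invoice.claimed_total} != computed total {expected}"
        )
    return invoice.claimed_total
```

The point of the design is that the caller either gets a value that passed the rule or gets an exception; there is no path that returns an unchecked answer.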

220

TheAgentic

@theagentic

18 Dec 2025

Bottom line: LLMs are becoming incredible interfaces. Symbolic systems provide guarantees. The next step isn’t choosing between them. It’s deciding where symbolic logic lives in the architecture.

168

TheAgentic

@theagentic

14 Dec 2025

Sneak preview of our TAU-Bench results… more on this soon 🫡

181

TheAgentic retweeted

elvis

@omarsar0

14 Dec 2025

NEW research from Google on effective agent scaling. More tool calls don't always mean better agents.

The default approach to scaling tool-augmented agents today remains throwing more resources at the problem such as more search queries, API calls, and more budget. But agents lack budget awareness and quickly hit a performance ceiling.

This new research introduces BATS (Budget Aware Test-time Scaling), a framework that makes agents explicitly aware of their resource constraints and dynamically adapts planning and verification strategies based on remaining budget.

Standard agents don't know how much budget they have left. Without explicit signals, they perform shallow searches and fail to utilize additional resources even when available. Simply granting more tool calls doesn't help because agents terminate early, believing they've found sufficient answers or concluding they're stuck.

Budget Tracker is a lightweight plug-in that surfaces real-time budget states inside the agent's reasoning loop. At each step, the agent sees exactly how many tool calls remain and adapts accordingly.

Results: Budget Tracker achieves comparable accuracy to ReAct with 10x less budget (10 vs 100 tool calls), using 40.4% fewer search calls, 21.4% fewer browse calls, and reducing overall cost by 31.3%.

BATS goes further by making budget awareness shape the entire orchestration. A planning module adjusts exploration breadth and verification depth based on remaining resources. A self-verification module decides whether to dig deeper on a promising lead or pivot to alternative paths.

On BrowseComp, BATS with Gemini-2.5-Pro achieves 24.6% accuracy versus 12.6% for ReAct under identical 100-tool budgets. On BrowseComp-ZH, 46.0% versus 31.5%. On HLE-Search, 27.0% versus 20.5%. All without any task-specific training.

Budget-aware design produces more favorable scaling curves and pushes the cost-performance Pareto frontier, achieving higher performance while using fewer resources.

It's all about wise-spending.

Paper: arxiv.org/abs/2511.17006

Learn to build effective AI Agents in our academy: dair-ai.thinkific.com/
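To make the idea of surfacing budget state inside the loop concrete, here is a minimal sketch. It is not the paper's implementation: the `BudgetTracker` class, the `choose_mode` policy, and its 50% and 10% thresholds are assumptions made up for illustration, and the tool call itself is replaced by a stand-in.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Tracks remaining tool calls (hypothetical helper, not the paper's code)."""
    total: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.total - self.used

    def spend(self, n: int = 1) -> None:
        if n > self.remaining:
            raise RuntimeError("tool-call budget exhausted")
        self.used += n

    def status_line(self) -> str:
        # Text that would be injected into the agent's context at every step.
        return f"[budget] {self.remaining} of {self.total} tool calls remaining"


def choose_mode(tracker: BudgetTracker) -> str:
    """Illustrative policy: explore while budget is plentiful, verify when tight."""
    fraction_left = tracker.remaining / tracker.total
    if fraction_left > 0.5:
        return "explore"
    if fraction_left > 0.1:
        return "verify"
    return "answer"


tracker = BudgetTracker(total=10)
trace = []
while tracker.remaining > 0:
    mode = choose_mode(tracker)
    trace.append((tracker.status_line(), mode))
    if mode == "answer":
        break
    tracker.spend()  # stand-in for one search or browse call
```

In a real agent, `status_line()` would be appended to the prompt before each step so the model can see the remaining budget, and the mode decision would come from the model's own planning rather than fixed thresholds.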

432

58,134

TheAgentic retweeted

Rohan Paul

@rohanpaul_ai

10 Dec 2025

A solid 65-page paper from Stanford, Princeton, Harvard, University of Washington, and many other top universities. It says that almost all advanced AI agent systems can be understood as using just 4 basic ways to adapt, either by updating the agent itself or by updating its tools. It also positions itself as the first full taxonomy for agentic AI adaptation.

Agentic AI means a large model that can call tools, use memory, and act over multiple steps. Adaptation here means changing either the agent or its tools using a kind of feedback signal.

In A1, the agent is updated from tool results, like whether code ran correctly or a query found the answer.
In A2, the agent is updated from evaluations of its outputs, for example human ratings or automatic checks of answers and plans.
In T1, retrievers that fetch documents or domain models for specific fields are trained separately while a frozen agent just orchestrates them.
In T2, the agent stays fixed but its tools are tuned from agent signals, like which search results or memory updates improve success.

The survey maps many recent systems into these 4 patterns and explains trade-offs between training cost, flexibility, generalization, and modular upgrades.

228

1,183

71,341

TheAgentic retweeted

OpenAI Newsroom

@OpenAINewsroom

9 Dec 2025

OpenAI is co-founding the Agentic AI Foundation (AAIF) under the Linux Foundation alongside Anthropic and Block to support open, interoperable standards for agentic AI. We’re also donating AGENTS.md to help establish open standards that enable safe, reliable agents across tools, repositories, and ecosystems. openai.com/index/agentic-ai-…

OpenAI co-founds the Agentic AI Foundation under the Linux Foundation

OpenAI co-founds the Agentic AI Foundation under the Linux Foundation and donates AGENTS.md to support open, interoperable standards for safe agentic AI.

openai.com

216

542

4,066

517,193

TheAgentic retweeted

Robert Youssef

@rryssf

9 Dec 2025

The paper makes one thing painfully clear: Workflows ≠ Agents. A workflow follows a pre-written script. An agent writes the script as it goes, adapting to feedback and changing plans when the world shifts. This single distinction is why 90% of “AI agent demos” online fall apart in real interfaces.

1,857

TheAgentic

@theagentic

10 Dec 2025

2025 showed that clever reasoning strategies can elevate smaller LLMs’ performance. For example, a 7B model augmented with Monte Carlo Tree Search (rStar-Math) achieved ~53% on the AIME math exam – placing in the top 20% of human high school contestants. Likewise, a logic puzzle solver using a structured reward (Logic-RL) more than doubled a small model’s accuracy on logic tests.

105

TheAgentic

@theagentic

10 Dec 2025

These results hint that algorithmic techniques (search, iterative refinement, process supervision) can compensate for model size by instilling more stepwise, symbolic-like problem solving in neural nets.

TheAgentic

@theagentic

10 Dec 2025

microsoft.com/en-us/research…

New methods boost reasoning in large language models

New techniques are reimagining how LLMs reason. By combining symbolic logic, mathematical rigor, and adaptive planning, these methods enable models to tackle complex, real-world problems across a...

microsoft.com