Stronger models are always good for AI agents.
AI labs have been leapfrogging each other in benchmarking and capability for years now. Sometimes Google is ahead, sometimes OpenAI is ahead, sometimes Claude. Today it's DeepSeek. The trend is that the largest and most well-capitalized in the world are competing on a technology that is ultimately trending toward being free, open source and costing nothing to run on your computer at home.
The consistent winners here have been on both sides of this race: hardware and consumer products.
NVIDIA always wins. Every model is optimized to run on their hardware. Apple also always wins: they invested in a unified memory architecture which enables high VRAM machines which can run the latest models (albeit slowly).
Products continue to benefit from the latest models. Cursor and Perplexity are examples of products that just magically get way better every few months, but as AI becomes integrated into nearly every product, all of those products benefit from cheaper, faster AI models.
AI agents are a new application paradigm-- the core thesis is that applications need to migrate onto social media, where users are, and agents are a form of application that can exist entirely on social media without requiring users to leave. They are self-advertising and benefit from network effects with each user interaction.
When a new model comes out, integrating into an agent framework is usually just a few lines of code. Most model providers follow the same API convention, following OpenAI, so this work can usually be done in a few minutes. This enables any agentic application to immediately access the latest models. Every time a state of the art model drops, agents get that much smarter.
Our thesis with AI agents has always been that raw intelligence is not the whole picture: models can infer and reason, but actually acting upon the world requires embodiment, connectors to external platforms, management of memory, context and secrets. None of this is or can be easily shoved inside of a model. Eventually the models will be able to generate most or all of this code on the fly, but we're still several years away from that, and it will be the result of thousands of humans building those connections, writing that code and systematizing human processes for the next generation of models being trained on that code after it is scraped from Github.
AGI is a loop. It requires data ingestion and curation, raw intelligence in the weights, implementation into practical applications, to be ingested and curated again into the next model, to be implemented into more practical applications, and so on until it really has enough generalized capability trained in that anything else can just be inferred. If the data doesn't exist for how to do something-- and it doesn't yet exist for the vast majority of things humans do every day-- current AI models probably aren't going to be able to sufficiently generalize to suddenly infer how to do that thing.
That's why agents matter. Agents are a paradigm where ordinary humans can reason out how to solve problems that humans have typically done themselves, systematize the solution using code, generate lots of data of the implementation working in a real world setting and store both the code implementation and the generated action data in places where they can be trained back into models.
None of us are creating AGI by ourselves. We're all part of a bigger system, and we all have our part to play. New models make all of our agents better and more capable. Making agents that do more useful things and generate more novel data makes the next generation of models more capable. Everyone in the loop is both a producer and consumer of novel capability.
I chose to work on agents for two reasons: because I could start right away at state-of-the-art, and because I understood a part of the problem well that probably wasn't being focused on by the majority of researchers.
Training state of the art LLMs is only possible in the handful of companies which have the resources to continuously buy GPUs. Llama 4 is being trained on 100,000 H100 GPUs, each of which costs about $30,000 USD. Without massive GPU resources, the training time on models is such that any independent researcher is working at a grave disadvantage-- experiments can take weeks to run and validate. Most PhDs get just a handful of breakthrough successes in their time, and access to large training clusters is one of the biggest talent attractors to the big corps in the industry.
Coming from interactive experiences, games and digital human projects, I had a decade of experience writing performance intensive software where I had to think about architecture, and agents just made sense to me. Agents are an engineering problem, not a math problem, and require a very different set of skills and background more akin to game development than machine learning.
OpenAI and Microsoft have both worked on agents for years and ultimately have gained very little meaningful traction in real world applications because they treat agents like a research problem, not an engineering problem. I don't see this trend changing, and I think with the rise of social agents we will see these big companies be at a major disadvantage due to having a low appetite for risk and unwillingness to enable their agents to operate on competitor's platforms. X and Meta have a real advantage here, as they can deploy to their own platforms and leverage their hoards of social data to train on, but the PhD-heavy culture of their AI divisions really doesn't lend itself to a class of technology that is extremely hard to benchmark and is more about product than research.
Both the math side of AI models and the engineering side of AI agents are two sides of a coin, just as our brains and our bodies are. Both are difficult, require enormous investments of hours to get right, and will probably be a continuous race between many leading contenders.
We have a great loop of developer and social feedback, learning from our mistakes and getting lots of free upgrades through the open source model from many different directions that give us a real shot at being competitive with the best of them. We all accelerate each other.
This week was a huge W for all of us. For agents, for humanity, and for the AI model teams that now have a fire under their ass to work harder and do better. I'm not worried one bit about our position in all of this. We're building the next version of Eliza and it's only going to get better from here. Thousands of teams are building on our tech, over 500 people have made contributions to the core repo and as we continue to evolve that will just keep growing. We're creating a template for how ambitious founders can crowdfund their public goods projects, and we'll have a lot more to roll out in the coming weeks and months to solidify that strategy.
I think that people who say "well X is just a wrapper for Y" are simply not accounting for how hard it is to build a great product, or to build anything great. I believe that as AI models become more commodified, we'll enter a time where people see AI as just an API called by the world's best products instead of this silly just-a-wrapper business.
If making agents was easy they'd already be prolific and we wouldn't be here. None of this is easy. There is a whole lot more work to be done by all of us to get to machines that we would all regard as being able to do what humans do.