Ran CI/CD at Apple. High in the Custerdome, building The Vibe Machine - A Semantic Harness.

Joined April 2020
Ignore at your own peril
At yesterday's Verification Summit [0], @evelovesolive mentioned that she spends 24/7 thinking about a critical shift: we should be designing programming languages for agents, not for humans. As a professional language designer, this reality hit me years ago. It was crystal clear that my job would disappear long before developer jobs did. That is why in early 2023, I pivoted. I stopped designing for human ergonomics and started designing a language (called Universalis, thanks for the shoutout @satnam6502!) optimized for AI to generate efficiently [1, 2], for theorem provers to reason and validate rigorously [4, 5], and for humans to comprehend easily [3]. For three years, I was the crazy one. Yesterday, the crazy went mainstream ;-)
Tomorrow's models will fix this crap, I tell myself. Whistling past the graveyard.
Want to lose your AI religion? Call customer support.
I'm working in SwiftUI but have my agents create mockups in HTML. I've tried other approaches, but they don't produce better mockups for me.
We all appreciate Italian design, right? I call this tasteful approach the Italian school of AI.
When the creator of Redis starts thinking about KV cache, pay attention.
antirez is Salvatore Sanfilippo, the Sicilian programmer best known for creating Redis. But “creator of Redis” is almost too small a label. Before Redis, he was already an old-school systems hacker. He built hping, worked in network security, and invented the idle scan technique. This was the packet-level, C-programming, Unix-hacker world.
Then Redis happened. The origin was not glamorous. He was building LLOOGG, a real-time web analytics service, and needed something faster and simpler than the tools he had. So he created Redis. That is very antirez. Start with a real bottleneck. Avoid unnecessary abstraction. Expose the right primitive. Make it fast enough that people rethink the category. Redis did not win because it looked like a traditional database. It won because it gave developers direct access to useful data structures: strings, lists, hashes, sets, sorted sets, streams, pub/sub. It made memory programmable.
That is why his return to local AI is so interesting. With ds4, or DwarfStar 4, antirez is not just building “another local inference engine.” He is asking a very Redis-like question: What is the real primitive here? For LLMs, one answer is obvious: KV cache. Most people treat KV cache as an implementation detail. It lives in RAM or HBM, grows with context, and quietly becomes the bottleneck. antirez looks at DeepSeek V4 Flash, compressed KV cache, modern MacBook SSDs, and says: maybe KV cache should not only live in RAM. His phrase is perfect: “The KV cache is actually a first-class disk citizen.” That one sentence is the whole story.
If Redis made in-memory data structures feel like application infrastructure, ds4 is exploring whether local LLM state can become durable infrastructure too. Prefill once. Persist the cache. Resume later. Let long-running agents reuse expensive context instead of rebuilding everything from scratch.
This matters because coding agents are not normal chatbots. They carry huge system prompts, tool definitions, repo context, prior steps, and long task histories. If every request has to resend and recompute the entire conversation, local inference will always feel fragile and wasteful. ds4 attacks that directly. It is a deliberately narrow engine for DeepSeek V4 Flash, focused on Metal and CUDA, high-end personal machines, special quantization, long context, HTTP API, GGUF files crafted for the engine, official-logit validation, and agent integration.
There is also a funny and very current detail: he openly says ds4 was built with strong assistance from GPT 5.5, with humans leading ideas, testing, and debugging. That is very 2026. A legendary C programmer using an AI coding partner to build a local AI engine, so other coding agents can run locally with persistent KV state. It sounds recursive because it is.
And he still has the same builder energy. After ds4 took off, he wrote that the first week felt like early Redis again, with 14-hour workdays, chaos, and excitement. That is the part I like most: a true old-school builder.
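To make "Prefill once. Persist the cache. Resume later." concrete, here is a minimal Python sketch of the idea. It is not ds4's actual design: the `DiskKVStore` class, the one-file-per-prefix layout, and `fake_prefill` are all hypothetical stand-ins, and real KV state is tensors rather than tuples. The only point is that a second request sharing a prefix recomputes just the new suffix.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def prefix_key(tokens):
    """Stable hash of a token prefix, used as the on-disk file name."""
    data = ",".join(str(t) for t in tokens).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class DiskKVStore:
    """Toy store: one file per cached token prefix (hypothetical layout)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, tokens, kv_state):
        path = self.root / (prefix_key(tokens) + ".kv")
        with open(path, "wb") as f:
            pickle.dump(kv_state, f)

    def load_longest_prefix(self, tokens):
        """Return (n_cached_tokens, kv_state) for the longest saved prefix."""
        for n in range(len(tokens), 0, -1):
            path = self.root / (prefix_key(tokens[:n]) + ".kv")
            if path.exists():
                with open(path, "rb") as f:
                    return n, pickle.load(f)
        return 0, []


def fake_prefill(kv_state, new_tokens):
    """Stand-in for the expensive forward pass: one KV entry per token."""
    return kv_state + [("k%d" % t, "v%d" % t) for t in new_tokens]


def run_request(store, tokens):
    """Reuse cached state, prefill only the uncached suffix, persist the result."""
    n_cached, kv = store.load_longest_prefix(tokens)
    kv = fake_prefill(kv, tokens[n_cached:])
    store.save(tokens, kv)
    return len(tokens) - n_cached, kv  # number of tokens actually recomputed


store = DiskKVStore(tempfile.mkdtemp())
system_prompt = [1, 2, 3, 4, 5]
first_cost, _ = run_request(store, system_prompt)             # cold start
second_cost, kv = run_request(store, system_prompt + [6, 7])  # resumes from disk
```

In this toy run the first request prefills all five tokens, and the second only prefills the two appended ones because the five-token prefix is loaded from disk. The linear scan for the longest prefix is chosen for clarity, not speed.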
The leaderboard should have always been features shipped. Technical leaders are gonna need to be accountable for these numbers. The cost optimization techniques are gonna be super interesting.
I can now probably say this: Two months ago, inside Anthropic someone suggested building a token leaderboard. A heated internal debate followed and the decision was made to *never* ever do it… because several people inside Anthropic simply thought ahead about the consequences.
Hope I live long enough to witness Enhanced Chess.
“Virtue is a mean between two vices, one of excess and one of deficiency.” - Aristotle, Nicomachean Ethics
It is actually fascinating to watch how the @enhanced_games are being attacked from both ends of the spectrum.
On one side, we have the traditional sports establishment - the IOC, legacy governing bodies, and others - or, with respect to performance-enhancing drugs (PEDs), what I would call the “promoters of deficiency.” They continuously branded the Enhanced Games as “dangerous” and even claimed that “people will die.” In Germany, we have a saying: “Don’t ask the frogs if you want to drain the swamp. Of course they croak.”
And of course, the inaugural Games decisively debunked that narrative. Nobody died. Obviously. Quite the opposite: The clinical data indicates that many of the Enhanced athletes are in the healthiest physical and mental condition of their lives. More importantly, the data shows that a significant part of the performance-enhancing effect comes not from creating “superhumans,” but from repairing and reversing the damage caused by elite sport itself. In short: PEDs, when done right and under medical supervision, work - and are very good for you.
Now, interestingly, the Enhanced Games are also getting attacked from the other side - from people who want no regulation at all, expected a circus of chemically enhanced “freaks” instantly shattering every world record at the very first event, and who wanted to see “excess.” Those people fundamentally misunderstand the concept as well. The Enhanced Games advocate for the responsible, medically supervised use of performance-enhancing therapies. These substances are not magic bullets. They do not transform an average athlete into a world-record holder overnight. What they do is far more profound: They help people become the best versions of themselves.
If you are already one of the best swimmers in the world, like Kristian Gkolomeev, the best version of yourself may indeed become a world-record holder. If you are an athlete in your 30s, PEDs can bring you back to the best version of yourself from your 20s - and beyond. In the case of Megan Romano, the best version of herself at 35 (!) outperformed her 22-year-old self. Or take Emily Barclay: a few weeks of enhanced training elevated her from collegiate-level competition to Olympic-medal-caliber performance. Or take Cody Miller, who at the age of 34 won both the 50-meter and 100-meter breaststroke events while beating personal bests he had set years ago.
In total, at the inaugural Enhanced Games 21 personal bests were broken by 13 athletes, most of them over the age of 30. Several athletes set two personal bests in a single night. This level of frequency and consistency is simply unheard of in elite sport. That’s the power of PEDs.
And that is precisely the promise of ENHA’s telehealth business for all customers. Enhanced Group is building a platform designed to help EVERYONE improve performance, recovery, longevity, confidence, health, and quality of life - responsibly, medically, and scientifically. And we will continue building while simultaneously educating both extreme sides of the debate.
In fact, we love our critics. Because every controversy drives attention. Every attack increases awareness. And in today’s world, attention and brand recognition are among the most valuable assets a company can possess. Just look at the numbers $ENHA released today:
• 4,000 earned media articles published across outlets with a combined 16.7 billion unique visitors monthly (UVM) in 2026
• Press coverage reached a crescendo during Games weekend, with 59% of total UVM (9.85 billion) occurring in the four days surrounding the Games
• Broadcast news coverage generated additional reach to 932 million people worldwide
And those figures do NOT yet include the massive social media reach. Those numbers will be disclosed next week. Stay tuned.
I believe all of that attention will monetize into future sports revenues. We are getting inbound interest left and right from new sponsors, partners, and athletes. And I believe it will monetize into telehealth revenues as well. To put the value of the free attention generated by the “crazy fringes” into perspective: $HIMS spent almost $700 million on marketing in 2024 alone. Over time, I believe we can achieve comparable growth trajectories with only a fraction of that spending - because the Enhanced Games generate extraordinary amounts of organic global attention, as today’s news demonstrates.
A heartfelt thank you to all the crazies. Keep croaking.
Rings true to me: Friendship is *the* measure of true wealth.
In Plato and Xenophon we’re told by their Socrates that the only measure of *true* wealth is friendship. From there, Aristotle teaches us how friendship is the precondition for philosophy and politics as such. In friendship the entirety of philosophy and politics are encompassed.
“The quality of your knowledge base (second brain?) is becoming a status symbol among builders.” Wait until they get actual semantic capabilities (ontologies).
I just got back from SF and I FEEL INSPIRED. I spent 5 days with frontier AI model teams, AI startup founders, and 3 billionaires. My takeaways:
1. I had lunch with 3 billionaires. All of them are buying SaaS companies and rebuilding them agent-first. They were deeply inspired by Bending Spoons and Ryan Cohen's eBay deal. Buy the company, cut the headcount, rebuild the tech, add agents, add features, make a more valuable experience, raise prices.
2. The frontier model companies are hungry for usage data from the field. They can see API calls and token counts. They can't see the actual workflows. If you're deep in a niche using these models in ways the model companies haven't seen, that understanding is incredibly valuable. Usage intelligence is the new alpha.
3. Consumer AI is massively underbuilt. Every billboard in SF is either B2B inference infrastructure or vertical agent companies. The entire city is optimized for enterprise. Meanwhile you have companies like Cal AI doing $50M ARR in 18 months as a consumer app. I met with a few cool teams doing consumer AI (@paulscherer / @ekuyda).
4. MCP came up in literally every conversation. The companies exposing their product as MCP endpoints are getting pulled into deals they never pitched for. The ones that aren't are becoming invisible to agents. This is the new SEO. If agents can't find you, you don't exist. Building products for agents is the new zeitgeist in general.
5. Not uncommon for hot seed rounds to be at $25-50 million valuations. I saw a Series A at $450 million.
6. If I had a dollar every time someone mentioned "forward-deployed engineer" this trip I could have funded a seed round. It's the hottest role in SF right now. The person who sits between the agent and the customer, making sure everything actually works.
7. The mood around open source shifted. A year ago it felt like open source was chasing the frontier models. Now founders are telling me Gemma and DeepSeek are good enough for 80% of what they need at a fraction of the cost. The "which model do you use" conversation is being replaced by "which model for which task." Model loyalty kinda feels dead.
8. Voice agents came up more than I expected. Multiple founders told me voice is the interface for the next billion users. The billion people who will never type a prompt will absolutely talk to one.
9. The Obsidian community in SF is weirdly intense. Multiple founders showed me their vaults unprompted. Like showing someone your home gym. It's a flex now. The quality of your knowledge base (second brain?) is becoming a status symbol among builders.
10. Maybe it was just the people I met but the age of the founders is shifting. I met more founders over 40 this trip than any trip before and more founders under age 21 than ever before. Founders getting older and younger at the same time.
11. I spoke to a lot of fast-growing startups, VCs and frontier models who are hiring content creators right now.
12. The restaurant scene in SF is actually better than it's been in years. Founders are going out more. Alcohol is out, not surprisingly.
13. SF doesn't feel like the only place anymore. We all have access to the same frontier models. We all read the same X feed. A founder in NYC or Lagos is calling the same APIs as a founder in SoMa. In the past it felt like SF was always light-years ahead; it doesn't feel that way anymore. It's okay not to live in SF and have BIG DREAMS.
14. The coworking spaces in SF are half empty but the coffee shops are packed. People want to be around people. I had a few startup ideas here....
15. Walking around the Mission I noticed something: the street-level businesses, the taquerias, the barbershops, the laundromats, none of them use any AI at all.
16. I heard the phrase "agent debt" for the first time. Like technical debt but for agents. When you hack together an agent workflow fast and never clean it up, the system prompts conflict, the memory gets polluted, the tools overlap. 6 months later the agent is doing weird things and nobody knows why lol.
17. Met a few people who carry two phones now. One for personal. One that's basically an agent terminal running Telegram or iMessage connections to their agent fleet.
It's always amazing to get that dose of inspiration in SF. I FEEL INSPIRED. But I'm so happy to be back home, locked in and building. We're 12-18 months into a shift that will take 15 years to play out. The urgency in every conversation was real. What an incredible time to be building.
So much performance left to squeeze out of these ecosystems.
SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible. The potential speed improvement vs JAX for large training runs is over an order of magnitude.
So much performance to squeeze out of these architectures.
Behind the MiMo API Price Reduction:
The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is that our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced.
Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers.
Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss.
If more architectures that save compute and KV cache emerge, along with better inference infrastructure to drive down API costs, this will form an excellent virtuous cycle in the industry. More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain (including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers), serving as a strategic fulcrum for a systemic revaluation of AI hardware. In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes.
For more technical details, we will release a detailed blog post later.
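The two headline numbers in that post can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch under my own assumptions: that cost per cached token scales inversely with how many tokens the same hardware can hold, and that "1:7 Full:SWA" means one full-attention layer for every seven sliding-window layers. The function names are made up for illustration.

```python
from fractions import Fraction


def caching_cost_reduction(capacity_multiplier):
    """Fractional drop in cost per cached token when the same hardware
    holds capacity_multiplier times as many tokens (assumed inverse scaling)."""
    return 1 - Fraction(1, capacity_multiplier)


def full_attention_layers(total_layers, full, swa):
    """Share of layers that use full attention under a full:swa ratio."""
    return Fraction(total_layers * full, full + swa)


reduction = caching_cost_reduction(5)           # 5x cached-token capacity
full_layers = full_attention_layers(70, 1, 7)   # 70 layers at 1:7 Full:SWA

print(f"caching cost reduction: {float(reduction):.0%}")
print(f"full-attention layers out of 70: {float(full_layers)}")
```

Under these assumptions, 5x capacity gives the post's 80% figure exactly, and 70 layers at 1:7 leaves 8.75 full-attention layers, which is in the neighborhood of the "roughly 10-layer" comparison once the cheaper sliding-window layers are counted too.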
They never mention the “supplements” these ladies are on. The continued deception is so puzzling.
That’s a testament to how good this sport is for progress and longevity.
Robert (Kenpo) Evans retweeted

I love this place…
I’ve completed the opening lines of the Chudyssey. I’ll provide a detailed breakdown of my choices below.
Robert (Kenpo) Evans retweeted
Agents are tools but they're more like fire than a wrench. Have you ever really looked at a furnace? The job of 90% of that mass is to contain the fuel/fire, and to stop it when something goes wrong. Agents need so much containment.
Enjoy a little AgentTV.
DeepSeek V4 PRO 2-bit quants, over baroque music whose copyright has expired, with the noise of the vinyl record in the background, implements a small C compiler while running on a Mac Studio M3 Ultra with 512 GB of RAM. youtu.be/H5cvtoSxdxI?is=1FNy…
A quiet dread: this AI product I'm building today will be trivial in 18 months, and eclipsed by something I can't currently imagine. It's impossible to deny the exponential. Then I remind myself: Sure, but it's really fun right now so just keep building.