RunLLM is now Herald. There's a story behind it, and it starts with a moment every engineer knows. It's 3am, an alert fired, and you're looking at something you've never seen before, a novel incident. Your runbooks don't cover it. Your tools tell you something is wrong, but none can say why or how, or what to do about it. For decades, that's been the deal. Something breaks, you fix it fast. Companies are pretty good at it. But it comes with real costs — alert fatigue, engineer burnout, and the moments when customers tell you something is down before you know it yourself. So we asked a different question: what if you didn't have to wait for something to break? What if your systems could tell you what's about to go wrong, before alerts fire, before customers notice? To herald something is to signal that it's about to happen. And that's the shift we're bringing to observability and reliability: from t₊₁ to t₋₁, where t is the moment something breaks. To deliver on this promise, we're offering Herald CLI — a full-featured, completely free agent that runs securely on your laptop and gets up and running in minutes. Try it on your own stack to see how Herald moves you from always being behind problems to getting ahead of them. 👉 Sign up for Herald CLI early access here: tinyurl.com/2fyp4wnj 👉 Read more about the Herald brand from our CEO @vsreekanti: tinyurl.com/mvk7mnsh
1
5
109
Lots of exciting news to share today! 1. @RunLLM is now @Herald_Dev. The new name reflects the fact that our AI SRE is the only product on the market that operates autonomously — teaching itself about your product & infra, detecting early warning signs of incidents, and investigating without runbooks. Read more: herald.dev/blog/heralding-th… 2. Herald was named to the InfraRed 100, an annual list recognizing the most promising private companies defining the future of cloud infrastructure. Thanks to Redpoint for the recognition! 3. We're releasing the beta of the Herald CLI — an agent that runs securely on your laptop and gets up and running in minutes. Sign up for early access here: herald.dev/cli
The Redpoint InfraRed 100 is now live. These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on. Congratulations to this year's honorees! Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 redpoint.com/reports/the-inf…
6
13
41
24,336
If customers had been willing to write us $250K checks on day one, we would have built the wrong product. With RunLLM, we set out to build the same AI SRE agent everyone else was building: an RCA agent triggered by alerts, driven by customer-maintained runbooks. It was the obvious answer. Humans use runbooks, so the agent should too. Except alert thresholds are noisy. Nobody actually maintains their runbooks. And the agent inherits every gap. We didn't figure that out because we were smarter than anyone else. We figured it out because the market gave us time. Enterprise SRE buyers don't move fast. They have committees. They want weeks to evaluate. They ask hard questions about what happens when something breaks at 3am. That slowness is what put us on the right track. In a fast market, competitive pressure forces you to ship the obvious solution and iterate from there. You don't get time to ask whether you're solving the right problem; you just have to start solving something. In a slow market, you're forced to keep asking. And for hard problems, the obvious solution is rarely the right one. The interesting question in AI SRE isn't "how do we automate the runbook." It's "how do we detect early warning signs, validate them, and find root cause before any threshold alert fires?" We didn't get to that question by moving fast. We got to it because the market wouldn't let us. I see a lot of founders right now benchmarking themselves against Cursor's growth curve and feeling like something is wrong. For most infrastructure problems worth solving, that curve was never going to apply. And the slowness you're frustrated by is probably the thing that's going to make your product impossible to copy in three years. Friction is information. Don't optimize it away too early: open.substack.com/pub/fronti…

2
4
579
Reliability expert Heinrich Hartmann thinks most AI SRE tools are solving the wrong problem. The Senior Principal SRE and SREcon EMEA Chair argues the real on-call problem isn't diagnosis. It's that the engineer who gets paged at 3 a.m. has never touched the service that's down. They don't know how it works, what broke last time, or what fixed it. AI can close that gap. Not by replacing the on-call engineer, but by making sure they have the right context before the pager fires. New post on the RunLLM blog: tinyurl.com/4e5z3s67
1
2
158
26 Dec 2025
Teaching AI agents to speak DuckDB: a lesson in context engineering 🦆 Early versions of our MCP server only had a `query` tool. Agents struggled with DuckDB's unique SQL dialect—GROUP BY ALL, SELECT * EXCLUDE, trailing commas. Most LLMs don't know these out of the box. We tried the MCP spec's `prompts` concept. Thoroughly ignored by clients. ❌ We added a `get_query_guide` tool. Agents rarely called it. ❌ What actually worked: the `instructions` field on MCP initialization. We return a comprehensive guide (markdown file) that gives agents crucial context upfront. Supported clients surface it directly to the agent. ✅ For complex questions, we added `ask_docs_question`—a RAG agent powered by RunLLM with access to both MotherDuck and DuckDB docs. An agent calling an agent. Context engineering matters! Read more in our tech blog (link in comments)
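A minimal sketch of the approach that worked in the post above: returning a guide in the `instructions` field of the MCP `initialize` response. It uses only the Python standard library and builds the JSON-RPC message by hand. The server name, version, protocol version string, and the abbreviated guide text are assumptions for illustration, not MotherDuck's actual server or guide.

```python
import json

# Hypothetical, abbreviated guide. The post describes the real one as a
# comprehensive markdown file; this stand-in covers only the dialect
# features the post names.
QUERY_GUIDE = """\
# DuckDB SQL notes
- GROUP BY ALL groups by every non-aggregated column in the SELECT list.
- SELECT * EXCLUDE (col) returns all columns except the listed ones.
- Trailing commas are allowed in SELECT lists.
"""


def build_initialize_result(request_id, guide):
    """Build a JSON-RPC response to an MCP `initialize` request.

    The optional `instructions` string is part of the MCP initialize
    result; clients that support it can pass it to the agent as context.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            # Assumed protocol revision; use whatever was negotiated.
            "protocolVersion": "2025-06-18",
            "capabilities": {"tools": {}},
            # Hypothetical server identity.
            "serverInfo": {"name": "example-duckdb-server", "version": "0.1.0"},
            "instructions": guide,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_initialize_result(1, QUERY_GUIDE), indent=2))
```

Because the guide arrives during initialization, the agent does not have to decide to call a separate tool such as `get_query_guide` before it writes its first query.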
2
2
31
2,523
25 Nov 2025
Replying to @ID_AA_Carmack
Discoverability is a big concern indeed. The three main directions we're pursuing are: using more submodules to document via nesting, improving Google indexing to improve search, and enabling RunLLM on the docs website (bottom right), which will use an LLM to help you find what you want.
2
602
22 Oct 2025
What happens when your customers aren’t people, but other AI? The next generation of successful software won’t just use AI. It will serve it. In Ep 48 of LLMs on the Run, @profjoeyg (Joey) looks at what happens when AI becomes both the provider and the customer — and how that changes everything from system design to support operations. ✅ Support that handles thousands of AI-generated tickets per minute ✅ Infrastructure built for irregular, high-velocity interactions ✅ UX evolving into AIX — experiences built for AI users, not humans “The dynamic between products and their customers is changing. As AIs start interacting with AIs, we have to rethink how we build, optimize, and support technology in an AI-centric world.” — Prof Joey Gonzalez #LLMsOnTheRun #RunLLM #JosephGonzalez #AISRE #AgenticAI #AIUX #AIX #AIEngineering #AIInfrastructure
1
2
149
18 Oct 2025
What happens when people become the bottleneck holding AI back? In Ep 47 of LLMs on the Run, @profjoeyg (Joey) explores a future where AI agents like Cursor write code, debug issues, and even file support tickets when they hit a wall — and where human-run systems can’t keep up. “As AI advances, many of the roles humans fill today could become the bottleneck. We need AI that can support the engineers — and the AI engineers — of the future.” — Prof Joey Gonzalez #LLMsOnTheRun #RunLLM #JosephGonzalez #AISRE #AIinSupport #AgenticAI #AIUX
1
3
190
Replying to @simonw
RunLLM worked surprisingly well for onboarding and support for DataHub in their Slack. Among other steps, they fine-tune Llama 3 on GitHub issues, the actual codebase, community content, and so on. The bot can list references and produce idiomatic solutions: docs.runllm.com/how-it-works…

1
2
85
14 Oct 2025
🗓️ ICYMI: Run of the Week | Oct 13, 2025 OpenAI vs Anthropic Tribal Knowledge Dumb questions 🛍️ The AI Frontier: Is OpenAI a Consumer Company Now? @profjoeyg and @vsreekanti observe that a year ago, OpenAI seemed built for developers while Anthropic focused on end users. Now? The roles have reversed. OpenAI's DevDay was all about ChatGPT shopping and consumer apps, while Anthropic doubled down on developers with Claude Code. The irony? The more OpenAI chases consumer, the more they might lose builders. 👉 Read: frontierai.substack.com/p/op… 💣 The End of SRE Tribal Knowledge Every team has engineers who just know: which dashboards matter, which alerts lie, what actually works when everything's on fire. That expertise keeps systems running, but it doesn't scale. When those engineers leave, reliability leaves with them. AI changes the equation by executing expert investigations automatically, turning intuition into infrastructure the whole team can use. 👉 Read: tinyurl.com/5emcu2hn 🎬 Ep 46 | The Next Wave of AI Support Engineering | LLMs on the Run Most “AI support bots” stop at reading your docs. But what if your AI could read your logs, inspect your backend, and help fix the issue? “A simple chatbot can’t do that. But an AI-powered support engineer can — and that’s the real opportunity.” — Prof Joey Gonzalez 👉 Watch: x.com/RunLLM/status/19767559… 🎬 Ep 45 | What if We Weren't Afraid? | LLMs on the Run When users interact with an AI, they’re far more willing to ask the real questions—the ones they might have been too embarrassed to ask a human. “It’s exciting to see how people are changing as they interact with AI—asking the questions they were once afraid to ask, and getting the detailed help they actually need.” — Prof Joey Gonzalez 👉 Watch: x.com/RunLLM/status/19760252… Building AI-powered reliability? Follow RunLLM for weekly insights that matter. #SRE #DevOps #IncidentResponse #OpenAI #Claude #OnCall #ObservabilityEngineering #AIEngineering
8 Oct 2025
What if people weren’t afraid to ask “dumb” questions? AI might finally make that possible. In Ep 45 of LLMs on the Run, @profjoeyg (Joey) looks at how AI-powered support changes the relationship between people and technology. When users interact with an AI support engineer, they’re far more willing to ask the real questions—the ones they might have been too embarrassed to ask a human. That shift opens a new kind of learning loop: ✅ Users get clarity faster and build deeper understanding ✅ Companies learn how people truly struggle with their products ✅ Support becomes a two-way channel for discovery and improvement “It’s exciting to see how people are changing as they interact with AI—asking the questions they were once afraid to ask, and getting the detailed help they actually need.” — Prof Joey Gonzalez #LLMsOnTheRun #RunLLM #JosephGonzalez #AIUX #AIinSupport #AISRE #CustomerExperience #AIandLearning
2
2
255
7 Oct 2025
🗓️ ICYMI: Run of the Week | Oct 6, 2025 AI Predictions Incident Intelligence Celebrating 100 Posts 🎉 The AI Frontier: Our 100th Post Makes Bold AI Predictions After 100 posts, @profjoeyg and @vsreekanti have earned the right to speculate. Enterprise adoption, platform ambitions, the limits of chat, and the coming wave of consolidation—our hot (and cold) takes are all here. No guarantees we'll be right! 👉 Read: x.com/RunLLM/status/19739119… 🔥 Never Let a Good Incident Go to Waste Most teams fight the fire, fix the issue… and lose the lessons. AI can change that by capturing the actual steps engineers take during incidents. The result? Runbooks that stay current, consistent postmortems, and a team that gets smarter with every outage. 👉 Read: x.com/RunLLM/status/19730412… 🎬 LLMs on the Run Ep 44 What if your team could solve problems before they even happened? "The opportunity to find and fix problems before they occur is what the future of AI-powered support will look like." — Prof Joey Gonzalez 👉 Watch: x.com/RunLLM/status/19745923… Building AI-powered reliability? Follow RunLLM for weekly insights that matter.
4 Oct 2025
What if your support team could solve customer or reliability problems before they even happened? That’s the future Professor Joseph Gonzalez (Joey) explores in Ep 44 of LLMs on the Run. He describes how AI Support Engineers — and now AI Site Reliability Engineers — could analyze logs, monitor services, and even read customer code to spot issues before users hit them, turning both support and SRE from reactive to proactive disciplines. ✅ Identify problems before they surface ✅ Guide customers toward better usage in real time ✅ Prevent outages and frustration before they start ✅ Transform support and reliability into true engineering partnerships “The opportunity to find and fix problems before they ever occur is what the future of AI-powered support will look like.” — @profjoeyg #LLMsOnTheRun #RunLLM #JosephGonzalez #AIUX #AIinSupport #AISRE #ProactiveAI
2
2
467
26 Sep 2025
How MCP accelerates AI-to-AI interaction: The next phase of AI UX is APIs and protocols that let agents talk to your product and to each other. In Ep 42 of LLMs on the Run, Professor Joseph Gonzalez (Joey) explains why: ✅ MCP and A-to-A protocols will become the real “UX” for AI agents ✅ Surfacing the right tool descriptions and context matters more than UI polish ✅ Context should adapt based on how and where the tool is used ✅ Innovation is shifting to the interface between AI and the product “When the customer on the other side is an AI, UX means something very different.” — Prof Joey Gonzalez #LLMsOnTheRun #RunLLM #JosephGonzalez #AIUX #AgenticAI #ProductDesign #MCP
1
3
163
@getpy we just shipped a fix a couple hours ago. Everything should be resolved now!
2
3
37
Thanks for flagging this! We’re looking into it and will let you know when it’s fixed.
1
2
75
Replying to @noah_vandal
the @RunLLM bot on the DSPy site is better than me already!
2
4
130
17 Sep 2025
What if your next customer wasn’t human but an AI? Are we headed into a world where “I’ll have my AI talk to your AI” becomes the norm? That’s the question @profjoeyg explores in Ep 40 of LLMs on the Run. Tools like Cursor show us that AI agents are already becoming “users” of other products and services — filing tickets, reading docs, debugging issues. So how do you design for an AI customer? It needs to: ✅ Learn your product quickly ✅ File tickets or debug when stuck ✅ Understand docs and workflows ✅ Get a good customer experience — without a human in the loop “When we think about product design, we often assume a human is on the other end. But what happens when it’s actually an AI?” — Prof Joey Gonzalez #LLMsOnTheRun #RunLLM #JosephGonzalez #AIUX #ProductDesign #CursorAI
2
2
155
Cueing Eye of the Tiger… RunLLM and DeepTrust taking the stage!
1
1
6
845
8 Sep 2025
🎬 AI Hot Take: @dpatil, Former U.S. Chief Data Scientist “AI doesn’t always need to be the best on one metric. What matters is how it gracefully fails — and still helps even when it can’t be perfectly right.” DJ says the test of a trustworthy AI system isn’t just peak performance, but whether it fails in ways that are useful, transparent, and safe. 👀 Full mini-documentary here: youtu.be/MyjT2nBpbE8 #AI #LLM #TrustworthyAI #EnterpriseAI #DJPatil #RunLLM #AIProductDesign #MachineLearning #AIAdoption #AIUX
1
6
1,275
7 Sep 2025
🗓️ ICYMI: Run of the Week | Sep 7, 2025 This week at RunLLM: blogs, features, and hot takes worth a second look. 📌 The AI Frontier: AI Artists vs. AI Engineers Professor @profjoeyg and RunLLM CEO @vsreekanti define emerging archetypes for complex AI apps. AI Artists explore creative solutions but risk bad outputs. AI Engineers follow stricter guardrails but sometimes fail to generalize. 👉 Read: tinyurl.com/4hcbvn4d 📌 RunLLM Blog: MTTR: Can AI SREs Deliver More Needle and Less Haystack during Incident Response? Two decades after Nagios, the pager is still lying to us. The result? Burnout, 4-hour MTTRs, and six-figure outages. 👉 Read: tinyurl.com/y8f9we5u 🎬 AI Hot Take: Joey Gonzalez, UC Berkeley professor, Sky Computing Lab director, and RunLLM co-founder “Serving technology is the engine in AI, but it's not the product. In the next five years, we’re going to see a lot of centralization.” 👉 Watch: x.com/RunLLM/status/19637107… 🎬 AI Hot Take: @joe_hellerstein, UC Berkeley Professor and RunLLM co-founder “We’re entering a world that’s a lot more machine-oriented, a lot more machine-scale.” 👉 Watch: x.com/RunLLM/status/19640655… 🎬 AI Hot Take: Vikram Sreekanti, Co-founder & CEO of RunLLM “Easier AI coding has led to a glut of low-quality AI apps.” 👉 Watch: x.com/RunLLM/status/19618767… Follow RunLLM for weekly insights on AI, SRE, and building reliability you can trust.
30 Aug 2025
🎬 AI Hot Take: @vsreekanti, Co-founder & CEO of RunLLM “Easier AI coding has led to a glut of low-quality AI apps.” Vikram warns that while it’s never been easier to spin up a demo, building thoughtful applications that stand the test of time is what really matters. 👀 Full mini-documentary here: youtu.be/MyjT2nBpbE8 #AI #LLM #EnterpriseAI #AIProductDesign #RunLLM #TrustworthyAI #MachineLearning
1
2
258