VC / Contrarian / Dog Dad

Joined July 2007
65 Photos and videos
Kenneth Ballenegger retweeted
A few uncomfortable truths that need to be said: 1. Everyone is so afraid of being accused of being a luddite (or, god-forbid, left behind) that the vast majority of people are too afraid to question the dominant tech market narratives.
7
10
88
14,946
I’ve been building Klaw to be available wherever the intent appears. The keyboard is the clearest version. If I’m in any app on my phone, I can call Klaw from a native keyboard extension without switching contexts. The request starts inside the thing I’m already doing: replying to a message, writing an email, saving a note, fixing some text, looking something up. That changes the feel of the agent quite a lot. In chat, I’m usually going to the agent with a task. With the keyboard, the agent is sitting inside the task already. It can see the sentence I’m writing, the app I’m in, the thing I’m trying to do, and the context is much less lossy. The same idea shows up in other places too: - an iPad in the kitchen that acts as a voice assistant - native share sheets, so I can send Klaw anything from any iPhone app - email forwards, so messages can become agent tasks - chat, when chat is the right surface Chat is still great. It’s just one surface. The powerful thing is making the agent callable from the places where life and work already happen.
1
71
One of the biggest unlocks in my personal-agent setup has been treating documentation & specs as a core part of the system. Everything starts as a conversation with Klaw. I describe what I want, Klaw pushes back, we go back and forth, and eventually we converge on a spec. Not a vague prompt — an actual written spec with behavior, edge cases, open questions, implementation notes, and success criteria. Only after that do we launch a Claude coding agent to implement it. The coding agent isn’t just told “write the code.” It’s told that the spec is the source of truth, and that every meaningful code change has to update the relevant documentation at the same time. So if the implementation changes the behavior, the spec changes. If a new edge case appears, the spec changes. If the architecture shifts, the docs change. Then those specs/docs are mirrored into my Mind project, so they’re part of the same knowledge system I use for everything else — not buried in a repo, Slack thread, or random agent transcript. The last piece is enforcement. Background jobs check that: - every project has linked docs - specs link back to the relevant code - docs are current with what actually shipped - implementation drift gets surfaced instead of silently accumulating Coding agents are useful, but without a documentation loop they create entropy extremely quickly. They can generate code faster than you can remember why it exists. Our system is one where conversation becomes spec, spec becomes implementation, implementation updates the spec, and the docs stay alive afterward.
1
83
Most personal-agent memory systems should be an index, not a warehouse. I wanted Klaw to have a real memory layer. The typical approach with OpenClaw or Hermes was to stuff everything into AGENT.md context files and let the model figure it out. But once the files grow, having all this extra data in the context will cause the model to hallucinate. Instead, I built a system I call Mind. Mind is a Markdown knowledge base. Mind is where durable context lives: project notes, references, ideas, voice/writing guidance, decisions, summaries, and pointers. But source-of-truth data stays in the system that owns it. Email stays in email. Calendar stays calendar. Various internal CLIs own their databases. The memory layer should know enough to route the request, but not become a shadow copy of my entire life. The pattern is simple: natural language at the boundary; deterministic systems in the loop. If I ask about an old workflow, Mind can tell Klaw which note or repo matters. If I ask about a real-world fact, the agent should query the owning system through code or a CLI, pull back the minimum relevant context, and show evidence when it matters. The model is still useful. It decides what the request means, which sources are likely relevant, how to summarize the answer, and when to ask for approval. But the model should not be the database. This is especially important for personal agents because the data is private by default. More context is not always better. The right question is: what is the smallest safe slice of context that lets the agent do the job?
1
79
Oyster Ventures is now Utopian Ventures: utopian.vc We have completed a full rebrand of the firm ahead of launching our next fun. Lots of exciting developments in the pipeline.
2
64
Plain language and secular cliff notes: - Pope Leo XIV released first encyclical: Magnifica humanitas, about safeguarding human dignity in the age of AI. - Core frame: AI isn’t evil, but it’s never neutral — it reflects who builds, funds, regulates, and deploys it. - Main warning: don’t build a new Tower of Babel — i.e. technical power detached from humility, limits, God/common good, and human dignity. - Strongest AI point: “ethical AI” can’t just mean ethics defined by a few powerful companies/governments. - He’s worried about concentration of power, algorithmic manipulation, worker degradation, surveillance, disinformation, and AI-enabled war. - He argues AI should serve people, not make people adapt to machine speed, profit logic, or state/platform control. - Interesting line: there is “no algorithm that can make war morally acceptable.” - Not anti-AI. Anti-sovereign-AI, anti-monopoly-morality, anti-efficiency-as-religion. Useful secular paraphrase: AI should increase human agency, not turn humans into inputs.
The Holy Spirit challenges us today regarding our relationship with technology and the ongoing digital revolution. Technology has the power to heal, connect, educate and protect our common home; but it can also divide, exclude and generate new forms of injustice. #MagnificaHumanitas
1
85
Fully built by my agent, Klaw. About 8h or so of agent thinking time w/ GPT 5.5 architecting and Claude executing, with my Hermes agent orchestrating everything and me directing.
I made a little game: Placeframe. Guess the city/place from its architecture, streets, landscapes, and visual clues — then see how far off you were. Kind of like Wordle / GeoGuessr, but a full game not daily. Try it: placeframe.kballenegger.com/
1
73
I made a little game: Placeframe. Guess the city/place from its architecture, streets, landscapes, and visual clues — then see how far off you were. Kind of like Wordle / GeoGuessr, but a full game not daily. Try it: placeframe.kballenegger.com/
2
153
One of Klaw’s primary jobs right now is processing my email. That is a pretty good test case for local models. Email is high volume, private, messy, and mostly boring. Most of the doesn't require high intelligence — it's classification: detecting urgency, extract the useful bits, deciding what needs review and what can be ignored. So I built the system mostly as deterministic code that handles the classification step as a call to a local LLM (in this case, Qwen 3.6's 35B model). It's small enough to run on my desktop, and smart enough to do a perfect job of the task. It can process 10s of thousands of messages per day; and it doesn't expose any of my private data any outside parties. Everything is run by code, written by a frontier model, with deterministic outcomes. The local model only handles the narrow judgment call: what kind of email is this? Frontier models are still useful for hard reasoning, drafting, ambiguity, and synthesis. But a lot of agent work is high-volume background judgment where privacy and cost matter more than brilliance. Local models are underrated because people keep comparing them to frontier models' most advanced capabilities. They don't need to beat frontier models. They need to be cheap, private, fast enough, and good enough to run continuously inside a system that is mostly code.
The more I build Klaw, the more I think durable agents are codebases with language interfaces, not prompts with tool access. The default agent pattern still feels like: Write a big prompt. Give the model a pile of tools. Hope it reasons through the workflow correctly every time. That works for demos. It’s not a great foundation for anything you want running every day. For repeatable work, the agent should not be rediscovering the procedure from scratch. It should be calling something known. A script. A CLI. A queue worker. A typed adapter. A deterministic parser. A database query. A narrow classifier. A job with logs, retries, validation, and boring failure modes. Then the language model does the part it’s actually good at: summarizing messy inputs, drafting text, classifying ambiguous cases, ranking options, explaining results, or choosing between known paths. This sounds less magical. I think it’s much closer to how useful personal agents actually work. Code gathers the data. Code validates it. Code computes the numbers. Code checks source-of-truth state. Code handles retries and side effects. Then, when needed, the model gets a narrow job: “Summarize this.” “Classify this into one of these categories.” “Explain these options.” “Draft the reply, using this evidence.” “Choose the next step from this list.” That split matters. If an LLM is doing the math, checking the state, deciding which source of truth matters, and writing the final answer all in one big mushy pass, you’ll eventually get weird failures. If code computes the answer and the LLM explains it, the system is much easier to trust. Same for email, travel, finance, contacts, reminders, dashboards, approvals. Basically anything personal enough that being wrong is annoying or expensive. The job of the agent is not to be clever at every step. The job is to know which parts should be deterministic and which parts need judgment. This also changes how “memory” works. A prompt-first agent wants to stuff more context into the model. A code-first agent asks: Where is the source of truth? What query should retrieve it? What is the minimum useful context? What evidence should be attached to the result? What should be logged so we can debug this later? That is a very different product. It’s cheaper. It’s faster. It’s easier to test. It’s easier to audit. It fails in more obvious ways. And when something breaks, you fix the primitive instead of rewriting vibes into a longer system prompt. Natural language still matters. A lot. The whole point is that I can ask Klaw for an outcome in plain English, and it can assemble context, choose the right workflow, run it, and explain what happened. But once a pattern repeats, it should graduate out of prompt-land and into code. That’s where agents start becoming infrastructure instead of chat sessions. And there’s a second-order effect I think people underweight: once the agent is not just a pile of prompts, you can build real software on top of it. The interface does not have to be a chatbot. It can be chat. It can be a mobile app. It can be a dashboard. It can be a button. It can be a background job that just does the thing. All the automations, permissions, adapters, databases, logs, and weird little connections between systems already exist underneath. The parts that need judgment can still call the LLM through the harness. But the product surface can be whatever makes sense. That’s the part I keep coming back to. The best personal agents will probably feel conversational at the edge and boring underneath. And once they’re boring underneath, they stop being just agents. They become a way to build software.
58
Have used WeWeb since launch, and am an investor in the company. Great product—for most people who can't code using Claude etc gets you most of the rest there but then you're stuck. Here you can get 100% of the way there, letting AI design your site but having a full visual editor to tweak the result.
Big day today, @weweb_io is live on Product Hunt! We built advanced AI capabilities into a powerful no-code web app builder. The result? The best tool for non-technical users to build their apps. Check out the launch below. We’d love to get your support ❤️ And yes, there’s a special launch offer for the PH community: 20% off any plan 🤗 go.weweb.io/RBoKHnF
1
1
147
I wanted to build a hotel booking search system. The obvious version is: give Claude a browser, a few APIs, web search, maybe some credentials, and ask it to figure things out. It runs searches, opens tabs, compares results, retries when pages break, and eventually comes back with an answer. I don't like that architecture at all. The way I think about this is different: the agent shouldn’t be “doing travel search.” The agent should be using a travel search system. So the durable part is code. There’s a CLI for search jobs. A hotel search becomes a structured job: destination, dates, constraints, loyalty programs, required filters, output format. The system fans out across providers in parallel, normalizes results, validates invariants, records logs, and produces a reviewable output. No language model in the search loop. No vibes-based browsing. No “the agent clicked around and thinks this is the best option.” The LLM’s job is at the boundary: - understand the natural-language request - pull missing details from context when appropriate (e.g. "search hotels for my next trip" — it knows what my next trip is) - translate that into a precise query - call the deterministic system - summarize the result for me That’s the pattern I keep coming back to. Agents are great at intent, context, and judgment. Code is better at retrieval, validation, retries, parallelism, logging, and repeatability. A lot of “agentic automation” becomes much more reliable once the agent stops being the worker and becomes the interface to systems that are built to do the work.
45
I've been playing with this lately and it indeed is the best way to do video gen right now. My advice is to install the sogni creative agent skill into your Hermes/OpenClaw and use it from your own process — best UX vs using a bunch of website and copy pasting things all over.
The best AI storyboard-to-video workflow on the internet right now is GPT Image 2 → Seedance 2.0. We just made it the default at chat.sogni.ai - connecting with ByteDance and OpenAI to bring both into one chat-first creative engine. Type an idea. Get a storyboard. Get cinematic video. Refine. Ship the ad concept. Minutes, not days. Two things make Sogni different: → No subscription. Higgsfield bills you $29/mo whether you generate one clip or a hundred. Sogni is pay-per-generation on a people-powered consumer GPU network - frontier models when you need them, open-source models like LTX2.3 when you don't. → Same workflow, from your agent. Tell Claude, Codex, or Hermes to "make me a 6-shot storyboard and turn it into video" - the Sogni Creative Agent Skill runs the whole pipeline. chat.sogni.ai - try it free, no card.
1
2
105
One tidbit that exemplifies this is that my favorite new feature that Hermes released was no_agent cron jobs. I have 50 cron jobs and once they released that feature all but five were immediately switched to no_agent jobs
The more I build Klaw, the more I think durable agents are codebases with language interfaces, not prompts with tool access. The default agent pattern still feels like: Write a big prompt. Give the model a pile of tools. Hope it reasons through the workflow correctly every time. That works for demos. It’s not a great foundation for anything you want running every day. For repeatable work, the agent should not be rediscovering the procedure from scratch. It should be calling something known. A script. A CLI. A queue worker. A typed adapter. A deterministic parser. A database query. A narrow classifier. A job with logs, retries, validation, and boring failure modes. Then the language model does the part it’s actually good at: summarizing messy inputs, drafting text, classifying ambiguous cases, ranking options, explaining results, or choosing between known paths. This sounds less magical. I think it’s much closer to how useful personal agents actually work. Code gathers the data. Code validates it. Code computes the numbers. Code checks source-of-truth state. Code handles retries and side effects. Then, when needed, the model gets a narrow job: “Summarize this.” “Classify this into one of these categories.” “Explain these options.” “Draft the reply, using this evidence.” “Choose the next step from this list.” That split matters. If an LLM is doing the math, checking the state, deciding which source of truth matters, and writing the final answer all in one big mushy pass, you’ll eventually get weird failures. If code computes the answer and the LLM explains it, the system is much easier to trust. Same for email, travel, finance, contacts, reminders, dashboards, approvals. Basically anything personal enough that being wrong is annoying or expensive. The job of the agent is not to be clever at every step. The job is to know which parts should be deterministic and which parts need judgment. This also changes how “memory” works. A prompt-first agent wants to stuff more context into the model. A code-first agent asks: Where is the source of truth? What query should retrieve it? What is the minimum useful context? What evidence should be attached to the result? What should be logged so we can debug this later? That is a very different product. It’s cheaper. It’s faster. It’s easier to test. It’s easier to audit. It fails in more obvious ways. And when something breaks, you fix the primitive instead of rewriting vibes into a longer system prompt. Natural language still matters. A lot. The whole point is that I can ask Klaw for an outcome in plain English, and it can assemble context, choose the right workflow, run it, and explain what happened. But once a pattern repeats, it should graduate out of prompt-land and into code. That’s where agents start becoming infrastructure instead of chat sessions. And there’s a second-order effect I think people underweight: once the agent is not just a pile of prompts, you can build real software on top of it. The interface does not have to be a chatbot. It can be chat. It can be a mobile app. It can be a dashboard. It can be a button. It can be a background job that just does the thing. All the automations, permissions, adapters, databases, logs, and weird little connections between systems already exist underneath. The parts that need judgment can still call the LLM through the harness. But the product surface can be whatever makes sense. That’s the part I keep coming back to. The best personal agents will probably feel conversational at the edge and boring underneath. And once they’re boring underneath, they stop being just agents. They become a way to build software.
1
60
The more I build Klaw, the more I think durable agents are codebases with language interfaces, not prompts with tool access. The default agent pattern still feels like: Write a big prompt. Give the model a pile of tools. Hope it reasons through the workflow correctly every time. That works for demos. It’s not a great foundation for anything you want running every day. For repeatable work, the agent should not be rediscovering the procedure from scratch. It should be calling something known. A script. A CLI. A queue worker. A typed adapter. A deterministic parser. A database query. A narrow classifier. A job with logs, retries, validation, and boring failure modes. Then the language model does the part it’s actually good at: summarizing messy inputs, drafting text, classifying ambiguous cases, ranking options, explaining results, or choosing between known paths. This sounds less magical. I think it’s much closer to how useful personal agents actually work. Code gathers the data. Code validates it. Code computes the numbers. Code checks source-of-truth state. Code handles retries and side effects. Then, when needed, the model gets a narrow job: “Summarize this.” “Classify this into one of these categories.” “Explain these options.” “Draft the reply, using this evidence.” “Choose the next step from this list.” That split matters. If an LLM is doing the math, checking the state, deciding which source of truth matters, and writing the final answer all in one big mushy pass, you’ll eventually get weird failures. If code computes the answer and the LLM explains it, the system is much easier to trust. Same for email, travel, finance, contacts, reminders, dashboards, approvals. Basically anything personal enough that being wrong is annoying or expensive. The job of the agent is not to be clever at every step. The job is to know which parts should be deterministic and which parts need judgment. This also changes how “memory” works. A prompt-first agent wants to stuff more context into the model. A code-first agent asks: Where is the source of truth? What query should retrieve it? What is the minimum useful context? What evidence should be attached to the result? What should be logged so we can debug this later? That is a very different product. It’s cheaper. It’s faster. It’s easier to test. It’s easier to audit. It fails in more obvious ways. And when something breaks, you fix the primitive instead of rewriting vibes into a longer system prompt. Natural language still matters. A lot. The whole point is that I can ask Klaw for an outcome in plain English, and it can assemble context, choose the right workflow, run it, and explain what happened. But once a pattern repeats, it should graduate out of prompt-land and into code. That’s where agents start becoming infrastructure instead of chat sessions. And there’s a second-order effect I think people underweight: once the agent is not just a pile of prompts, you can build real software on top of it. The interface does not have to be a chatbot. It can be chat. It can be a mobile app. It can be a dashboard. It can be a button. It can be a background job that just does the thing. All the automations, permissions, adapters, databases, logs, and weird little connections between systems already exist underneath. The parts that need judgment can still call the LLM through the harness. But the product surface can be whatever makes sense. That’s the part I keep coming back to. The best personal agents will probably feel conversational at the edge and boring underneath. And once they’re boring underneath, they stop being just agents. They become a way to build software.
1
3
276
What I built with Klaw 🦅 Klaw started as a personal AI agent, but it has grown into a private operating system for my life: part executive assistant, part travel desk, part finance system, part media brain, part coding team, part household computer, and part creative studio. The coolest pieces: - Cryptographic approvals for safety: Klaw can take real actions, but dangerous operations go through signed approval flows. Even if the agent goes rogue, it can’t just send emails, approve actions, or mutate important systems without cryptographic authorization. - Travel brain: one integrated travel system for flights, hotels, points, award searches, loyalty accounts, promos, folios, flight status, gates, terminals, hotel contacts, upcoming-stay alerts, and timezone/location awareness. It knows where I am, what trip is next, what points I have, and what travel context matters. - Email work intelligence: Klaw reads, triages, labels, archives, drafts, and routes email across my personal and work life. It handles dealflow, investor updates, forwarding instructions, document imports, and approval-gated outbound replies. - Mind Project: a personal knowledge graph for my life. It stores ideas, references, lists, plans, docs, and project context — and acts as the brain behind everything Klaw does. Every feature, data source, and personal workflow can feed into or draw from Mind. - Household butler smart kitchen: a WhatsApp-based household ops system that processes receipts, tracks purchases, manages whiskey and wine cellar inventory, maintains shopping lists, and powers an iPad app that lives on my fridge as a smart kitchen interface. - Finance trading systems: real-time credit card monitoring, spending tracking, subscription detection, financial notifications, and automated market/trading systems including Polymarket scanning, analysis, and execution logic. - Media brain: a private media server streaming to all my devices over Tailscale, plus TV/show tracking, episode discovery, concert and festival tracking, favorite-artist monitoring, and event discovery. - Personal CRM: tracks people, contact details, interactions, relationship context, and follow-up cadences. It can ingest people from emails, signatures, or business cards. - Chinese tutor: a dedicated WhatsApp tutor with vocab drills, spaced repetition, quiz grading, word-of-the-day, and conversation practice. - Document investor update vault: automatically detects signed documents, archives them, classifies personal vs work docs, and ties into Utopian workflows like automatic investor update importation. - Creative generation studio: generates images, video, voice, text-to-speech, diagrams, and cloned/personalized voices using multiple backends, including local and remote models. - Local model infrastructure: we host local models and services for voice, vision, language, media, and other private AI workloads. - Coding minions: Klaw can spawn background Claude Code agents to build features and debug systems as needed — all in parallel. - Dashboard mobile OS interface: every major feature shows up in a first-class web dashboard and mobile app. It’s the operating-system interface for the whole personal automation stack. - Operations layer: health checks, service monitoring, cron jobs, launch agents, database backups, private networking, Tailscale routing, and deployment workflows keep the whole thing running. - Web deployment engine: Klaw also builds, runs, and deploys dozens of websites and internal apps across personal, work, and creative projects. The interesting part isn’t any single feature. It’s that all of these systems talk to each other: email feeds travel, travel feeds reminders, finance feeds household ops, Mind gives context to everything, approvals keep actions safe, local models keep private workloads close, and coding minions can extend the system itself. Klaw isn’t just a chatbot anymore. It’s a private AI operating system for my life.
I've spent the last few months building the best AI agent imaginable. It very much goes against conventional wisdom — the abstraction layers popularized here on X don't make that much sense to me. I've developed a few components that deserve to be open sourced. So starting today I'm going to start sharing a bit more about it. Stay tuned 🦅
1
2
206
The more I build Klaw, the more I think personal agents need boring operating-system primitives more than they need better prompts. A chatbot can answer a question. An agent has to live in messy reality. Emails arrive out of order. Credentials expire. Websites change flows. Background jobs fail halfway through. Some actions need approval. Some tasks should never run twice. Private data needs to stay private. That changes the whole design. The hard part is not making the model sound smart. The hard part is making the system safe enough to operate over time. A few things I keep coming back to: Human approval boundaries. The agent should draft, prepare, check, summarize, recommend — and then stop when the next step is sensitive. Publishing, spending money, sending messages, deleting data, changing state. Those should usually have an explicit approval point. That is not a UX tax. It is the trust layer. The goal is not "AI does everything." The goal is "AI takes the work as far as it safely can, then gives the human the right decision at the right moment." Fail-closed behavior. If something is ambiguous, the agent should not improvise. If a parser is unsure, a login flow changed, a transaction looks weird, or a source conflicts with another source, the right behavior is often to stop. Not guess. Not hallucinate. Not push through because the demo would look better. Stop, preserve the evidence, explain the block, ask if needed. Queues and ordering. A lot of personal automation is event driven: emails, receipts, reminders, calendar changes, alerts, messages. Once you have more than one thing happening at once, you need boring stuff: durable queues, retries, idempotency, logs, ordering. Otherwise "smart automation" becomes race conditions with a friendly interface. Provenance. If the agent summarizes something, updates a dashboard, imports data, or takes action based on a source, it should know where the information came from. Not because every personal system needs enterprise compliance. Because future-you will need to debug it. After a few months, provenance becomes memory. Private data minimization. The most useful personal agents touch the most sensitive data: inbox, travel, money, contacts, family, location, documents. So the architecture has to minimize what gets exposed, logged, committed, or sent to tools. Redaction is not a cleanup step. It is part of the product. Recovery. If an agent becomes part of your daily operating layer, backups are no longer just ops hygiene. Can you restore the memory? The queues? The configs? The audit trail? The local state? It is not enough that the code is in git. The agent’s lived context matters too. This is where I think the personal AI conversation is still too demo-driven. The demo is: "look, the agent booked something." The product question is: can it handle the 500th booking, the weird edge case, the duplicate event, the failed login, the approval boundary, the rollback, and the restore? That is the difference between an impressive assistant and a dependable personal operating system. Klaw is making me more convinced that the future of agents is not just more autonomy. It is bounded autonomy. More capable where the system has confidence. More conservative where the stakes are high. More observable when something breaks. More careful with private context. More willing to hand control back to the human. The best personal agents probably will not feel like magic all the time. They will feel like infrastructure that occasionally uses magic.
I've spent the last few months building the best AI agent imaginable. It very much goes against conventional wisdom — the abstraction layers popularized here on X don't make that much sense to me. I've developed a few components that deserve to be open sourced. So starting today I'm going to start sharing a bit more about it. Stay tuned 🦅
1
76
I've spent the last few months building the best AI agent imaginable. It very much goes against conventional wisdom — the abstraction layers popularized here on X don't make that much sense to me. I've developed a few components that deserve to be open sourced. So starting today I'm going to start sharing a bit more about it. Stay tuned 🦅
1
3
411