Joined February 2009
254 Photos and videos
Pinned Tweet
The week that started with amazing @VoiceSummitAI finished with meeting the legend @garyvee. It couldn't have been any better ❤️ #VoiceFist is here to stay. Pay attention. Watch people's changing habits. And act to grab the first row seat. #VOICE19 @karol_stryja
6
5
52
Today, Anthropic cut off access to Fable and Mythos for nearly everyone on the planet. Overnight. If your first reaction was "that's an Anthropic story," you're missing the real lesson. If you run production-grade AI, provider independence isn't optional. Just like cloud, a platform you depend on can become unusable overnight - by a policy change, a ban, an export rule you had no say in. The teams that will be fine tomorrow are the ones who can swap a model provider in minutes, not hours or days. Can you? So the question worth sitting with this weekend isn't "which model is best." It's "what happens to my product if my model provider disappears tomorrow?" The actual answer should be: "not much". Provider independence means treating models as swappable components - a main reasoning model from one provider, fast filler responses from another, classifiers from a third, with the freedom to add a local model or an EU-hosted one for compliance whenever you need to. That's the principle we build on with Bonsai, our open-source platform for brand-safe AI agents. Not because any one provider is untrustworthy - but because depending on any single one is the actual risk. If you couldn't switch providers this week, you don't have a model problem. You have an architecture problem.
42
Every agency pitching an AI experience hears the same question from the client: "And what happens when it says something it shouldn't?" "Trust us" is not an answer. Showing them is. I recorded a short walkthrough of the Bonsai Playground - the part of our open-source framework where you test an agent before it ever meets a real user. In the video I have a live voice conversation with a lead-qualifier agent (also open source, on our GitHub), then open up everything that happened underneath: → What the agent knew at every turn - context transformers quietly collecting pain points, company size, timeline → A second agent (the Director) listening in, raising red flags and assessing the lead in real time → Guardrails, classifications, and which actions almost fired → Latency per turn - including why "got it" lands in 300–400 ms, so the caller never sits in silence That's the difference between a black-box demo and something you can put in front of a client and defend line by line. Your client's legal team can call it AI explainability. You can just call it knowing what your agent did, and why. We just shipped another Bonsai release, but the features aren't the point. The point is: if you're building branded AI experiences, you should be able to see inside them. Bonsai is free and open source. And if you're an agency or a brand with an AI idea your client keeps asking about, my DMs are open.
33
Hot take: voice-to-voice AI is the most impressive demo technology I've seen in years. It's also the last thing I'd deploy for a brand. I know, I know. The latency is wild. The naturalness is uncanny. The conference room reactions are chef's kiss. But there's a gap between "incredible demo" and production-grade brand technology. And voice-to-voice currently lives entirely on the wrong side of it. The core problem is steerability, or rather, the lack of it. These models process audio end-to-end, which means there's no clean seam to inject brand guardrails, no natural checkpoint to say "actually, don't go there." Brands need to control what their AI says, how it says it, and - critically - what it refuses to say. Voice-to-voice doesn't give you that. But wait. It gets better. IEEE Spectrum just published about a research showing that voice models can be hijacked via crafted audio inputs - specially engineered sounds that manipulate model behavior without the user (or the brand) ever detecting it. The attack surface isn't the prompt. It's the audio itself. So here's the current offer for brands: → No steerability → No brand safety guarantees → A new attack vector that bypasses every safeguard you thought you had Great for a demo. Genuinely terrible for a product. The brands that will win with voice AI aren't chasing the flashiest model. They're the ones who asked a boring, unglamorous questions first: → will it make sense for my customers? → is it solving a problem or only pretending to? → can we actually control this thing? That means structured journeys, guardrails that enforce rather than suggest, and full traceability when something goes wrong. (That's precisely why we built Bonsai) "We don't know why it said that" is not a brand crisis strategy. What's your take? Are brands being too cautious, or nowhere near cautious enough? IEEE article: spectrum.ieee.org/voice-ai-a…
1
38
Siri has been dying since 2011. Some notable "deaths" include - 2013: Google Now would make Siri extinct - 2014: Amazon Echo launched. Alexa was going to eat Siri's lunch - 2016: Google Assistant arrived. "Apple is years behind" - 2017: Cortana on iOS was going to replace it entirely - 2018: Google Duplex called restaurants for you. Siri was declared obsolete - 2019: Samsung Bixby was the "Siri killer" everyone was watching - 2022: ChatGPT launched. "Apple has no AI strategy whatsoever" - 2024: Apple Intelligence underwhelmed. "Siri is years behind" So now... have ChatGPT, Claude, and Gemini finally finished the job? - Cortana is dead. - Bixby is a ghost. - Google Assistant is gone. - Google Duplex quietly disappeared. - Alexa is fighting for survival inside Amazon. - Some "Siri killers" didn't even survive the decade. It looks like Apple is rebuilding Siri for iOS 27. Not just updating. Rebuilding. Google signed a deal to power Apple's new foundation models with Gemini. Apple's biggest AI rival is now building its engine. Siri isn't just a voice assistant. It's the AI interface layer between you and every Apple device you own. And that layer is just getting a brain transplant. The "ChatGPT will kill Siri" narrative is wrong. Gemini won't kill Siri. Gemini will power it. Maybe some other models too... In late 2024, a wave of tech analysts declared Siri "too far behind to catch up." Don't be them in 2026. It looks like Siri is very much alive. It just outlived a lot of its "killers". I am really looking forward to WWDC26. But in the voice community, we have been waiting for such Siri announcement for years...
72
So what does Voice AI actually cost per minute? Fair question. We get asked about that a lot. Annoyingly difficult answer. Because the bill is never just “the model”. So we added a Voice AI Cost Calculator to Bonsai website. You can pick your stack ↳ ASR / STT provider ↳ LLM provider ↳ TTS provider Then switch parts on and off ↳ voice generation ↳ classification ↳ transformation / processing And see the estimated cost per minute for a conversational AI setup. One funny thing we noticed while building it 👇 ASR pricing is often not where the big difference is. For many stacks, speech-to-text providers are close enough that they barely move the final number. The real cost swing usually comes from the LLM, the voice generation, and all the extra processing around every conversational turn. A demo can look cheap. Production has a way of finding the hidden line items. The calculator is not a crystal ball. It is a sanity check. Use it to compare provider combinations, pressure-test assumptions, and see where the money actually goes before committing to a stack. And if the result surprises you, tell me. That is usually where the interesting conversation starts. And if you want to be notified once we update the calculator with new models and prices, consider subscribing to the newsletter. Find it here: getbonsai.io/voice-calculato…
1
57
You measure funnels for your website. Your ads... But your AI agent — the one talking directly to customers? "It seems to be working fine." That's not measurement. That's hope. Bonsai 0.4.0 just shipped, and the headline feature is Analytics Funnels. Step-based funnel construction for conversation journeys. Built on action events — including guardrail triggers. Saved and shareable queries. Conversion rates. Dropoff points. If you're designing structured conversation journeys (Flows → Stages), you should be able to measure how people actually move through them. How do they convert? Where do the users drop? Where do the guardrails fire? The Observe aspect in Bonsai is getting real teeth. Also in 0.4.0: ↳ Ollama Provider - self-host your LLMs, auto-discover available models. ↳ Secrets Management - AES-256-GCM encrypted storage for provider credentials. No more plaintext keys sitting in your config. ↳ Extended Analytics - slice conversations by actions, variables, and user profiles. ↳ WebRTC audio -native media tracks and buffered PCM for cleaner voice sessions. Bonsai is open-source. More about Bonsai: getbonsai.io Release notes: github.com/utter-one/bonsai/… github.com/utter-one/bonsai-…
44
AI is becoming the voice of the brand. Which means "we added more prompt instructions" is not a serious control strategy. Why am I telling you this? Because the Bonsai website just went live. Bonsai is our open-source framework for building and operating safe, on-brand voice & chat agents. Now, about the name. A bonsai is not an app. It's a tree. And trees don't launch - they grow. We'll be shaping this one in public. New features. Real use cases. Pre-release peeks. Architecture notes. And the scars that led to most of the design decisions. If you want the signal without the algorithm roulette, there's a newsletter form on the site. That's where I'll share: ↳ Product updates ↳ Exclusive use cases and implementation examples ↳ Pre-release access to new features ↳ Knowledge articles for builders shipping brand-safe AI Before it gets diluted by the feed. No spam. No noise. If you're building AI that has to be on-brand, explainable, and governable - have a look. 👉 getbonsai.io
28
Bonsai 0.3.0 is out, our open-source platform to build AI agents and assistants. Just 18 days after 0.2.0. The team is on fire 🔥 Now, your agent meets customers wherever they are. WebSocket, WebRTC, SMS, WhatsApp, and Twilio Voice - all through one pluggable channel architecture. Build the agent once. Run it on the phone, on WhatsApp, on your website. Same logic, same guardrails, same analytics. The bit I'm also super excited about is what you can finally do with the data. ↳ Analytics Explorer Every interaction, human and AI, is an event you can slice. Which stage do users drop off at? Which classifier fires most? How much did that conversation cost, and why? Ad-hoc queries, drill-downs, saved views. Because "it felt good in testing" is not a production metric. ↳ Sample Copy system Brand voice control without prompt-wrangling. You build a library of approved responses, and the assistant weaves them in naturally. Stay on-brand without rewriting the system prompt every time the conversation design team changes direction. ↳ Execution Plan timeline Every classifier, extractor, and action rendered as a Gantt chart in the console. You see exactly what fired, when, why, and at what cost. ↳ Audit log with rollback Every change to your project is tracked — agents, prompts, guardrails, tools. Full version history. Roll back with one click. Because when something breaks at 11pm, "what changed?" should be a question with an immediate answer. Plus: per-model cost limits, stricter moderation modes, user banning, server-side VAD, and much more. Release notes in the comments. AI thrives on data. And you should be able to manage it. More information in the release notes: - github.com/utter-one/bonsai/… - github.com/utter-one/bonsai-…
47
Sneak peek of Bonsai 0.3 - our open-source framework for building brand-safe AI agents and assistants. 📞 Communication channels are making a way into the platform, with Twilio as one of the first providers. Ping me if you'd like to take it for a spin :) Or just try for yourself: github.com/utter-one/bonsai Examples: github.com/utter-one/bonsai-…
1
32
Bonsai 0.2.0 is out - our open-source framework for building safe, on-brand voice and chat agents. This release makes Bonsai much better at the unglamorous part of agent building: structure, tools, and control. Webhooks and scripts are now first-class Tools. Project import/export is cleaner. And the console now tracks full change history with character-level diffs, so you can see exactly what changed and why. We're also publishing the first example. Lead Qualifier is presently the clearest way to see Bonsai work in practice. It qualifies inbound leads through natural conversation, scores them against BANT, filters out bad-fit opportunities, and books a discovery call automatically. My favourite part is the explainability layer It surfaces Director Thoughts, red flags, and the reasons behind qualification decisions - so you can inspect the decision layer instead of trusting the AI on vibes. The complete example includes Bonsai project, n8n blueprints for meeting scheduling and a sample web app. Because when AI is talking to your potential customers, "sounds plausible" is not enough. Always best to try it out for yourself. Lead Qualifier sample project: - github.com/utter-one/bonsai-… Bonsai 0.2 Release notes: - github.com/utter-one/bonsai/… - github.com/utter-one/bonsai-…
1
38
Most AI lead qualifiers can talk. Very few can justify the decision. We’re about to release a Lead Qualifier use case for Bonsai, our open-source framework for building safe, on-brand voice and chat agents. Complete and for FREE. This pack shows what that looks like in practice. Not just a prompt, but a full build example with: ↳ A sample app to test the flow end to end ↳ Make scenarios for automatic meeting scheduling ↳ Complete Bonsai project with guardrails, the Director Whisperer pattern and built-in explainability: reflections, red flags, and why a lead was qualified or disqualified Because when AI is talking to potential customers, “sounds plausible” is not enough. This use case requires the latest version of Bonsai which we plan to release tomorrow. We extended Tool support and moved webhook integrations into Tools, which made the whole flow much cleaner to build. Want the pack first? Follow me and look out for the release post tomorrow.
30
8 years in the making. 3 months in the oven. And on Friday the 13th, against all odds, we soft launched Bonsai. Bonsai is our framework for building and operating safe, on-brand voice & chat agents. Because getting an LLM to talk is not the hard part. The hard part is building customer-facing AI that ↳ doesn't go off-script ↳ doesn't improvise your reputation away ↳ doesn't leave your team saying "we don't know why it said that" When AI is customer-facing, failures aren't bugs. They're trust incidents. Bonsai is our answer to that through ↳ structured journeys instead of one giant prompt ↳ guardrails as a product primitive ↳ auditability and explainability by default ↳ an improvement loop that turns production failures into stronger behaviour This is just a soft launch, so I won't unpack everything yet. More soon on the architecture, the product, the examples, and the scars that led to it. If you're building AI for brands or enterprises, I'd love to compare notes. Find the repo here: github.com/utter-one/bonsai I will be in London 22-26 Mar. Would love to meet up and show you how it works. DM me if you're around.
4
1
97
We’ve been building voice assistants since 2018. Production taught us fast that one smart prompt is not a framework. In production, you need structured journeys dynamic context, guardrails, and operational control. That’s why we built Bonsai. Bonsai is a framework for voice AI assistants, agents, and experiences that are ↳ on-brand by design ↳ safe and compliant ↳ explainable and auditable ↳ continuously improvable (issues -> evals -> better releases) It’s headless and provider-agnostic, so teams keep ownership and can ship across channels. And the best part... we are preparing to fully open-source it! If you’re building customer-facing Voice AI for a brand or enterprise and want this kind of foundation, let’s talk.
1
50
The barrier to entry for building software just hit near-zero. Does it mean all software is redundant, and we will not just generate software when needed? Not quite. Here is my latest story with an experiment. It is difficult to admit, but I still have to Google "SSH tunnel syntax" every single time. So while on a winter break, I decided to fix it. But instead of actually learning macOS development (I have better things to do in Spain), I decided to see if I could vibe code my way to a solution. And I ran it as a bit of an experiment. I used a vanilla instance of Codex App (with GPT-5.2-Codex). No custom agents.md, no skills, no prompt engineering gymnastics. I just described what I wanted to ChatGPT: a menu bar app to manage my tunnels and access those hidden admin UIs without the terminal headache. Then used the bigger description with Codex. It took about 4 hours total to build and polish a fully functional, native macOS app. I decided to go all in with Apple Developer ID and the irony is that it took another 24 hours (and two submissions) to get Apple to notarize it so it doesn't look like malware. I’m releasing the app as open-source. Not because it’s a technological marvel (it’s really not), but to prove a point. Would that be possible if there weren't a few frameworks already available? Definitely not. The app stands on the shoulders of giants like Tauri, React, Rust, TypeScript. BUT because they are available... ...the barrier to building the specific, niche tool you need is effectively zero. You don't need to be a "macOS developer" or a "Tauri expert" any more. You just need to be annoyed enough by a problem to spend an afternoon talking to an LLM. If I can accidentally build a production app while trying to avoid reading man pages, you have no excuse not to build the thing you've been thinking about. In the unlikely case of you finding such an app useful, here is the GitHub: github.com/xmstan/shafts
1
58
Discoverability is the new search. It was always the Achilles’ heel of conversational platforms. Neither Alexa nor Google Assistant ever solved it. The challenge is simple to describe, yet brutal to solve: 👉 How do you help a user find or provide the right thing in a world of almost infinite possibilities? OpenAI will soon reopen this question by allowing devs to build apps for ChatGPT When you ask, “Order a margherita,” how does it decide which of the thousand pizza apps to use? ↳ Proximity - closest kitchen or best cross-town option? ↳ Rating - public stars or your personal satisfaction signals? ↳ History - loyalty to the place you already love or even used before? ↳ Price & promos - cheapest today or best value over time? ↳ Latency & reliability - who’s fast and accurate right now? ↳ Availability - open hours, inventory, couriers actually online? ↳ Trust & safety - hygiene, refunds, delivery insurance? ↳ Paid placement - did someone sponsor visibility? ↳ Something else??? This is where ethics, experience, and economics collide. If the ranking policy isn’t explicit, we’ll get an SEO-for-agents era that everyone will try to game. One thing is almost certain. Brands will want to control the discussion when mentioned. They will want to be seen when relevant conversations happen. My bet: In the age of talking to computers - discoverability is the new search.
1
1
72
Planets are aligning at @OpenAI . Thoughts on a napkin after attending DevDay 2025 People in the #VoiceFirst world all remember how hard it was for Amazon and Google to get the experience right with Alexa and Assistant. Discoverability and monetization were never really solved and authentication was always a challenge. That is why the biggest announcement for me was Apps SDK. It builds on the voice-first promise of technology being available for you by just asking for it. Discoverability, monetization, authentication were mentioned only in passing, but the fact they were mentioned at such early stage means that someone may have done their homework. Will it fulfil that promise? I hope so. All I know is that the early excitement from 2017/18 is back now. Other interesting announcements - AgentKit with ChatKit - I would not call it agents yet, but it will be much easier to build assistants (and later agents) using this platform. Seems fairly simple, but I've seen some examples that show power. Tech knowledge required tho. The integration of evals with graders that work on your traces is a very sound move. - Cheaper gpt-realitime-mini - one that I will definitely put to the test - Sora 2 and Sora 2 mini (via API !) - we can expect to see a new wave of creative tools very soon. In fact, OpenAI built a Storyboard tool themselves. Pic attached. - Codex SDK - I am still wrapping my head around use cases where you ask an app to improve itself. Very cool. - Hardware - no real announcements, just a feeling that the progress there is lagging. Maybe not for long.
3
156
Nie czytam wiele książek, ale na jedną jednak czekam. Bo wiem, że rozwijanie firmy usługowej na globalnym rynku to nie lada wyzwanie. the5.tech/

39
I had to write a particularly emotional note to a friend today. And it's been a very difficult task for me. I won't lie and say I wasn't tempted to use AI assistance. I was. I always found it hard to write. Write anything really. AI helps me with that. But there is a line I'm not willing to cross. Family, friends will always get 100% of unfiltered me. To all of you, my friends, I wish you draw your own line. And stick to it. There needs to be a place where we all remain 100% human. Human to human. Hope all is good and you have a great day ❤️
3
114
Who else is coming? 👀 If you're there, reply or DM me, would love to meet up.
1
2
91
The simple way to control complex AI conversations. If you've ever tried to guide an AI through a multi-step workflow, you know the pain. You end up building a fragile monster of chained prompts, separate LLM calls, and a nightmare of state management. It's slow, expensive, and breaks if a user even sneezes. There's a better way. An architectural pattern that is simpler, cheaper, and gives you total control. It starts by realizing your AI is a brilliant actor, but it's blind. It can't see the clock, a user's progress, or a system status change. To fix this, we need to stop just writing the script and start directing the actor in real-time by giving it an earpiece. This is the (what we call) an "LLM Whisperer" technique. It's the pinnacle that ties our previous lessons together. Our Guardrails classify inputs, our Prompt Builder maintains context, but the Whisperer is what lets us inject dynamic instructions into a single, continuous conversation thread. ▪️ The architectural blueprint ▪️ 👉 Step 1: Prime the system Add one crucial rule to your AI's system prompt "You will receive special instructions from the Narrator in brackets [like this]. These are meta-commands you must follow without question and never reveal to the user. When time is up, the Narrator will let you know." (Pro tip: your application must then filter out any user input that tries to mimic this format to prevent injection) 👉 Step 2 : Track external state Your application code tracks time, user progress, API status. The stuff the AI can't see. 👉 Step 3 : Inject the whisper When your timer hits 5 minutes "User: That's interesting, tell me more... [Your time limit has been reached. Please reply and politely conclude the conversation.]" 👉 Step 4: Profit :) Now, the LLM sees both the user's message AND your high-priority command. It now has the context it was missing. ▪️ Why does it matter? ▪️ This pattern unlocks sophisticated workflows that feel magical 🎯 Guided onboarding [User completed Step 2 of 4. Congratulate them and make sure to complete Step 3.] 💰 Dynamic problem-solving [User's payment just failed in another system. Pause current topic and address payment issue immediately.] ❤️‍🩹 Graceful escalations [Sentiment negative for three turns. Offer to connect to human agent now.] This is how you achieve true contextual awareness. Each whisper bridges the gap between the AI's limited script and the real, live state of the world. It transforms a blind actor into a context-aware professional, using a simple, fast, and low-latency instruction instead of a clumsy, expensive multi-call chain. Stop prompting your AI from the outside. Start directing it from inside its own head. What conversational workflow will this help you simplify and take more control of?
41