Stop babysitting your Agents 🐋

Joined June 2023
765 Photos and videos
Pinned Tweet
12 Apr 2025
Replying to @orca_cli
Link to our telegram channel: t.me/ AuTjaRv5eeQ4MDk0
3
10
1,780
Orca™ retweeted
Replying to @blader
“it’s really not close” what a load of bs. the model is good, it’s on par with 5.5, better at following instructions, and the honest factor is really underrated. doesn’t write overly defensive code, so less slop. you have a skill issue my friend
11
1
83
28,686
Orca™ retweeted
Good morning
1
2
92
Orca™ retweeted
Your Swift AI agents just went multiplatform 🚀
SwiftAgents adds Linux support → deploy Agents to production servers
Built on Swift 6.2, running anywhere
⭐️ github.com/christopherkarani…
2
6
384
15 Dec 2025
gm!
101
Orca™ retweeted
Day 1 of building in public
1
2
108
Orca™ retweeted
9 Dec 2025
Day 1 of building in public. We’ve recently rebuilt our entire auth. A lot of the code was on the client and has now moved to the server.
1
1
3
191
Orca™ retweeted
The MAP study fundamentally reframes understanding of production AI agents. Where research often explores maximal autonomy and complex multi-step reasoning, production deployments succeed through bounded, controllable approaches with persistent human oversight. The 10-step ceiling, 70% prompting-only adoption, and 74% human evaluation dependence reveal a field where reliability trumps capability.
6 Dec 2025
First large-scale study of AI agents actually running in production.

The hype says agents are transforming everything. The data tells a different story.

Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges common assumptions about how production agents are built.

The reality: production agents are deliberately simple and tightly constrained.

1) Patterns & Reliability
- 68% execute at most 10 steps before requiring human intervention.
- 47% complete fewer than 5 steps.
- 70% rely on prompting off-the-shelf models without any fine-tuning.
- 74% depend primarily on human evaluation.

Teams intentionally trade autonomy for reliability.

Why the constraints? Reliability remains the top unsolved challenge. Practitioners can't verify agent correctness at scale. Public benchmarks rarely apply to domain-specific production tasks. 75% of interviewed teams evaluate without formal benchmarks, relying on A/B testing and direct user feedback instead.

2) Model Selection
The model selection pattern surprised researchers. 17 of 20 case studies use closed-source frontier models like Claude Sonnet 4, Claude Opus 4.1, and GPT o3. Open-source adoption is rare and driven by specific constraints: high-volume workloads where inference costs become prohibitive, or regulatory requirements preventing data sharing with external providers. For most teams, runtime costs are negligible compared to the human experts the agent augments.

3) Agent Frameworks
Framework adoption shows a striking divergence. 61% of survey respondents use third-party frameworks like LangChain/LangGraph. But 85% of interviewed teams with production deployments build custom implementations from scratch. The reason: core agent loops are straightforward to implement with direct API calls. Teams prefer minimal, purpose-built scaffolds over dependency bloat and abstraction layers.

4) Agent Control Flow
Production architectures favor predefined static workflows over open-ended autonomy. 80% of case studies use structured control flow. Agents operate within well-scoped action spaces rather than freely exploring environments. Only one case allowed unconstrained exploration, and that system runs exclusively in sandboxed environments with rigorous CI/CD verification.

5) Agent Adoption
What drives agent adoption? It's simply the productivity gains. 73% deploy agents primarily to increase efficiency and reduce time on manual tasks. Organizations tolerate agents taking minutes to respond because that still outperforms human baselines by 10x or more. 66% allow response times of minutes or longer.

6) Agent Evaluation
The evaluation challenge runs deeper than expected. Agent behavior breaks traditional software testing. Three case study teams report attempting but struggling to integrate agents into existing CI/CD pipelines. The challenge: nondeterminism and the difficulty of judging outputs programmatically. Creating benchmarks from scratch took one team six months to reach roughly 100 examples.

7) Human-in-the-loop
Human-in-the-loop evaluation dominates at 74%. LLM-as-a-judge follows at 52%, but every interviewed team using LLM judges also employs human verification. The pattern: LLM judges assess confidence on every response, automatically accepting high-confidence outputs while routing uncertain cases to human experts. Teams also sample 5% of production runs even when the judge expresses high confidence.

In summary, production agents succeed through deliberate simplicity, not sophisticated autonomy. Teams constrain agent behavior, rely on human oversight, and prioritize controllability over capability. The gap between research prototypes and production deployments reveals where the field actually stands.

Paper: arxiv.org/abs/2512.04123

Learn design patterns and how to build real-world AI agents in our academy: dair-ai.thinkific.com/
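
The judge-plus-human routing pattern described under Human-in-the-loop could be sketched roughly as follows. This is a minimal illustration, not the method of any team in the study: the function name `route_response`, the `Routing` type, and the 0.8 confidence threshold are all assumptions (the post gives no threshold), and the 5% audit rate is applied here only to high-confidence runs, which is one possible reading of the post.

```python
import random
from dataclasses import dataclass

AUDIT_SAMPLE_RATE = 0.05     # from the post: 5% of runs are sampled for human review
CONFIDENCE_THRESHOLD = 0.8   # assumption: the post does not state a threshold


@dataclass
class Routing:
    decision: str  # "auto_accept" or "human_review"
    reason: str    # "low_confidence", "audit_sample", or "high_confidence"


def route_response(judge_confidence: float, rng: random.Random) -> Routing:
    """Decide whether a response is auto-accepted or sent to a human expert.

    judge_confidence is a hypothetical score in [0, 1] produced by an LLM judge.
    """
    if not 0.0 <= judge_confidence <= 1.0:
        raise ValueError("judge_confidence must be in [0, 1]")
    # Uncertain cases always go to a human expert.
    if judge_confidence < CONFIDENCE_THRESHOLD:
        return Routing("human_review", "low_confidence")
    # A small random sample of high-confidence runs is still audited.
    if rng.random() < AUDIT_SAMPLE_RATE:
        return Routing("human_review", "audit_sample")
    return Routing("auto_accept", "high_confidence")


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only to make the demo repeatable
    for score in (0.35, 0.92, 0.99):
        print(score, route_response(score, rng))
```

Passing the random generator in explicitly keeps the audit sampling testable; in a real deployment the threshold would presumably be tuned against human-labelled outcomes rather than fixed.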
1
2
87
Orca™ retweeted
16 Nov 2025
People don’t want stablecoins. They want
• Dollars (in global south)
• Faster settlement
• 24/7 payments
• Lower fees
• Reliable payments
• No prefunding obligations
• Access to investment opportunities
• Cheaper credit
Stablecoins are an enabler, not the product
83
76
658
56,633
Orca™ retweeted
14 Nov 2025
$XYZ – @CashApp USDC Stablecoin Demo
Notice how the User DID NOT have to:
• choose whether it was Ethereum, Solana, Tron, etc
• whether it was USDC, USDT, PYUSD, etc
• choose gas/tx fees
Super intuitive. Easy. Cash App is doing stablecoins better than stablecoin natives.
29
53
549
111,543
Orca™ retweeted
12 Nov 2025
i don’t really get fomo when prices move 5-10% anymore. i’m chill earning 15-20% apy on blend supplying usdc. feels better stacking yield than chasing candles in the current market tbh
7
4
35
1,392
23 Sep 2025
4
2
17
1,033
Orca™ retweeted
17 Sep 2025
"What's built right is what others build upon." --@DenelleDixon The Stellar network is the foundation for innovation. #Meridian2025
8
44
264
11,332
16 Jul 2025
Sending remittances from USA to Kenya? Stablecoins make it fast, cheap, and secure! 💸🌍 Powered by blockchain, stablecoins cut fees and delays, with AI optimizing the process. #Stablecoins #Remittances #Kenya #FinTech
1
3
158
16 Jul 2025
AI-driven analytics boost stability, predict volatility, and secure transactions for stablecoins. The future is bright and decentralized! #AI #Stablecoins #FinTech
1
134
16 Jul 2025
AI and stablecoins: a match made in tech heaven! 🤖💸 AI can optimize trading, predict market trends, and enhance security for stablecoin transactions. The future of finance is smart and stable! #AI #Stablecoins #Crypto
129
16 Jul 2025
Home is where the money grows! As a Kenyan living in the USA, you work hard to send Shillings back home. Now, with Bebop, USDC-based payments move your dollars faster & cheaper than a Boeing. #USDC #CrossBorderPayments #KenyaRemittances
1
3
112