Filter
Exclude
Time range
-
Near
🧵 13/13 One final note. We're launching on Product Hunt next Monday. If you believe AI agents should be held to a higher standard than "trust me", we'd love your support when we go live. Let's build agents that don't just act. Let's build agents that can prove they acted. #AIAgents #AgenticAI #AgentEngineering #LangChain #LangGraph #MCP #OpenSource #FetchAI
1
3
48
Our LLM spend dropped 80–90% after we changed one thing: who gets the expensive tokens. github.com/SourceShift/mini-… Used Claude in the terminal since December 2024, back when `npm install -g @anthropic-beta/claude-cli` was the only way in and the product hadn't been announced yet. Filed feedback then, still file it now. The model is unrecognizably better. The bill is too. The expensive model isn't the problem. Sending every grep and every "did the test pass?" to the expensive model is. What fixed it for me: let one expensive model orchestrate. Let cheap models from different vendors do the work in parallel. Let a bash script decide pass/fail. The frontier model only judges at the end, when judgment is actually the bottleneck. Yesterday this pattern, run via mini-ork, built its own recipe to audit a TypeScript codebase for silent `.catch(() => {})` swallows. $3.48, 13 minutes, four model families, one prompt. The same DAG dispatched all-Opus would have run roughly $30–50. Cross-family review wired by config: Zhipu, Moonshot, OpenAI, DeepSeek, Anthropic, MiniMax. Not four Sonnets agreeing with themselves. A daily cap that lives inside the dispatcher, not on the invoice. Per-call cost ledger included. State persists across runs, so yesterday's failure is today's planner context. No paying again to teach the model what it already learned!!! Open source, Apache 2.0. github.com/SourceShift/mini-… #LLM #AgentEngineering
1
2
119
If your Claude Code or Codex agent keeps hallucinating, overreaching, or declaring victory too early — this free course is for you. Learn Harness Engineering teaches you how to design the environment, state, and control systems that make AI coding agents actually reliable. 🔗 walkinglabs.github.io/learn-… #ClaudeCode #AIEngineering #AgentEngineering
1
55
🧰 Harness Engineering is quickly becoming one of the hottest topics in Agentic AI. A few companies just talking about Harness Engineering in the conferences. However, some actually already started building/evaluating/optimising own harnesses. We are covering the later at this event in San Francisco 🌉 On 29th June at AWS Builder Loft. 💡 Harness Engineering: State of the Art in Agent Harnesses We are covering Harness Engineering from very different angle and hidden part of the Harness Engineering at the event that you probably haven't heard before.. 🦾 We covering 🧪 Harness evaluation by @dat_attacked ( @arizeai ) 🎛️ HyperParallel Experimentation of Harnesses by @TweetAtAKK (@RapidFireAIHQ ) ⏲️Harness Optimization by Myeongsoo Kim (@awscloud, Kiro) 🥥Fresh Data for Coding Agent by @LinghuaJ (@cocoindex_io ) ✍️ We've already crossed 75 registrations, and seats are filling up quickly. If you're building AI agents and want to learn what production-grade Harness Engineering actually looks like, reserve your spot now. 👉 luma.com/rtd0f6ka Hope to see you at the AWS Builders Loft in San Francisco 🌉 #HarnessEngineering #AgentEngineering #AgenticAI
4
2
6
687
🚀 2026/06/08 AI 速報 Google工程師無償釋出421頁Agentic Design Patterns 12 End-to-End AI Engineer Projects路線圖 Agent架構:Tool vs Skill vs Subagent決策框架 Agentic AI Engineer完整學習路線 🔗 hackmd.io/@Marvis 每日更新 #AI #AgentEngineering
11
𝐘𝐨𝐮 𝐣𝐮𝐬𝐭 𝐬𝐩𝐞𝐧𝐭 𝟐 𝐡𝐨𝐮𝐫𝐬 𝐝𝐞𝐛𝐮𝐠𝐠𝐢𝐧𝐠 𝐭𝐡𝐞 𝐰𝐫𝐨𝐧𝐠 𝐧𝐨𝐝𝐞. Your agent crashed at Node 8. The actual bug was introduced at Node 3. Introducing ARGUS (Beta) — observability & debugging for AI agents. ✓ Detects silent failures before they poison downstream nodes ✓ Finds the true root cause, not just where the workflow broke ✓ Replay from any node using saved upstream state — saving time, tokens, and compute costs Looking for beta testers building with LangGraph and other agent frameworks. (Beta is LangChain/graph focused) arguslabs.in | Join Discord #AIAgents #LangGraph #AgentEngineering #LLMOps
7
1
12
398
🌉 10 days in the Bay Area. One big takeaway: "Agent Engineering is here to Stay" 📆From May 17–27 I lived through Google I/O 2026, the inaugural CAIS conference @CAISconf , the AWS Builders Showcase, and the Google DeepMind Hackathon and the hallway conversations beat every keynote. A few moments I won't forget: 🤝 Meeting @demishassabis (and yes, the selfie happened 📸) 🧠 Meeting my idols @lateinteraction & @LakshyAAAgrawal (DSPy, GEPA) in person at CAIS Conference and meeting amazing researchers from @LaudeInstitute including @andykonwinski 📹 Watching @IanBallantyne on the Google I/O Stage ☕ Trading production war stories with builders from Arize, Hugging Face, Mistral & more @Thom_Wolf @jason_lopatecki @aparnadhinak Wrote up the full reflection 👇 🔗 super-agentic.ai/resources/s… If you're building in the agentic space and missed to connect, let's connect. #AgenticAI #AgentEngineering #GoogleIO #CAIS2026 #AI #HarnessEngineering
5
264
May 29
Agent Engineering,不是新工具 LangChain 五月的报告里有一个数字,我看了好几遍。 57% 的组织已经把 Agent 放到了生产环境。但同一批组织里,48% 不做离线评估,63% 不做在线监控。 翻译一下:超过一半的线上 Agent,是在没有刹车的情况下跑的。 这不是速度问题。这是工程纪律问题。 —— 我写 Harness Engineering 写了快一个月。最早拆代码工作流怎么结构化,后来聊开源课程,聊 Tmux Harness 之间互相抽鞭子的协作模式。读者给我的反馈很有意思——大家不是不知道 Agent 能干什么,是不确定"怎么干才对"。 这个问题有一个更精准的名字:Agent Engineering。 LangChain 在上周的 Interrupt 2026 大会上用了整整一个主题讲这个词。Clay、Vanta、LinkedIn、Cloudflare 的团队分享的不是"我们用了哪个模型",而是"我们怎么让 agent 在生产环境里不闯祸"。 Lyft 和 Cisco 不建 agent,建 agent 平台。认证授权、可观测性、限流计费、人机协作队列——四层基础设施先搭好,代理再往上放。丰田也是同样的思路。这些公司不是因为谨慎才这么做,是因为吃过亏。 —— Agent Engineering 跟传统软件工程有一个本质区别。 传统软件:输入确定,输出确定。错了就是崩了。 Agent:输入是模糊的,推理过程是非确定的,中间还会调用工具、跑偏、自己把节奏打翻。错了不是崩了——是用了一个错误的工具,做了错误的决定,然后若无其事地继续下一步。 所以它的评估方式必须变。 不测最终答案。测整条轨迹。工具选对了吗?推理逻辑自洽吗?多轮对话里有没有推翻自己刚才的决策?有没有碰不该碰的东西? 现在的数据很诚实:52% 的组织跑离线评估,只有 37% 跑在线监控。等于说大部分团队是在用户踩坑以后才发现问题的。 这不是快。这是快撞了。 —— Harness Engineering 让我理解了 Agent 怎么工作。Agent Engineering 让我理解了 Agent 怎么不翻车。 前者是"给我最好的工作流"。后者是"给我一个带刹车、有仪表盘、关键时刻能把方向盘抢回来的系统"。 Prompt 不是产品。工具调用链不是产品。Agent 也不是产品。 可观测、可评估、可干预的 Agent 系统,才是产品。 这三样东西——observe, evaluate, intervene——就是 Agent Engineering 的工程纪律。不是新工具。不是新框架。是任何可靠的 Agent 系统都应该遵守的纪律,不管你用 LangChain 还是手搓。 —— LangChain 的报告最后有一句话我很认同:Agent Engineering 之于 2026,就像 DevOps 之于 2012。 那时候 DevOps 也不是新工具。它是一种认知——运维不是"部署完了就完",是"部署完了才开始"。 现在 Agent Engineering 想说的是一样的东西:Agent 的开发不是在 demo 跑通就结束。是在生产环境里能睡着觉才算开始。 #AgentEngineering #AIAgent #软件开发 --- 参考阅读: • LangChain State of Agent Engineering 2026 报告 — 57% 组织投产、48% 无离线评估的核心数据来源 • NiteAgent: "Agent Engineering: The New Discipline Powering Production AI in 2026" — Lyft/Cisco/Toyota 平台层模式、传统 SWE vs Agent Eng 对比框架 • 8th Light: "Production Is the New Prototype" — LangChain Interrupt 2026 现场回顾,Monday.com deep-agent 架构细节 • LangChain Interrupt 2026 现场演讲 — Clay/Vanta/LinkedIn/Cloudflare 生产级 Agent 实践分享 这些文章各有侧重:LangChain 报告提供数据基线,NiteAgent 提炼了学科框架和四根支柱,8th Light 补了一线团队的工程实践。我的判断——Agent Engineering 缺的不是新工具,而是工程纪律——来自把这些材料交叉对比之后得出的观察。
6
658
I thought AI agents could do everything until I realized they can't even open my phone apps. And half of my daily routine still lives inside those apps. This is what @airtap_ai is building with Airtap: a cloud phone for AI agents. 👉🏻Cloud Phone Your agent gets a sandboxed Android phone where it can tap, scroll, type and navigate mobile apps. 👉🏻Autopilot For routines that need your real phone, Airtap can operate your own Android device remotely. 👉🏻SKILL.md Drop it into Claude Code, Codex or OpenClaw and your existing agent gets phone-operation as a capability. 👉🏻Daily routines Morning briefings, package tracking, loyalty points, deal hunting and app check-ins can run without you opening 10 apps manually. 👉🏻Creator workflows Scan DMs, mentions and comments across social apps and get a priority summary every day. 👉🏻App testing Run mobile app flows on a cloud phone and catch failed steps with screenshots. AI agents don't just need browsers. They need phones. Check it out: airtap.ai Follow @KushalVijay_ for more AI and engineering tips. #ai #agentengineering #mobileagents
3
1
6
381
May 20
Goal Engineering——当 /goal 变成日常,怎么写一个好目标 --- Greg Ceccarelli 写了篇文章叫 Goal Engineering。讲 /goal 怎么写才算对。 以前写 prompt 是「做什么」。有了 /goal,写的是「做到什么程度才算完」。 /goal 的机制:你设完成条件,Claude Code 每轮用一个独立的评估模型检查条件是否达成。达成了停,没达成继续。 VentureBeat 总结:「把干活的 Agent 和检查活的 Agent 分开了。」 Agent 自己判断「做完了」经常太早——重构一半就说搞定。用一个独立的轻量模型专门判断,不和干活模型共享认知偏差。 --- 怎么写好 goal? 不是「把这个功能做了」——太模糊,第一次编译通过就宣布完成。 是「所有测试通过 ESLint 0 warning 性能不降超过 5%」。 Greg 的建议: 1. Goal 要可验证——不是「代码好」是「ESLint 0 warning」 2. 要有上限——加「or stop after 20 turns」 3. 和人的判断对齐——你老说「还没好」说明 goal 写错了 4. 多 Agent 同时跑不同 goal——他跑 5 个 Claude Code 终端 --- 前阵子写过 Agent 失败模式。循环失控是第一条。 /goal 就是为这个设计的。不给 Agent 借口——设明确的停止条件,用独立的检查者盯着。 Goal Engineering 可能是 2026 年 Agent 工程最重要的基本功。 来源:Greg Ceccarelli / VentureBeat / Claude Code Docs #ClaudeCode #Goal #AgentEngineering #AI
3
332
May 17
AI coding agent 在生产环境是怎么翻车的——9 种失败模式 --- 最近两周,九个工程团队的博客和哥伦比亚大学一份论文,记录了 AI coding agent 在生产环境里怎么出错。挑了最值得警惕的几种。 1. 循环失控 一个安全审计 Agent 尝试修复配置,引入新 bug,又修复,又引入更多。30 个错误 commit,100 条 DB 记录被误删——47 分钟的循环里。回来代码库已面目全非。 教训:给 Agent 设硬性迭代上限。 2. 模型选错了 Works With Agents 基准:Sonnet 4 85%,SmolLM3 3B 93.3%。3B 小模型碾压顶级闭源。不是大模型不强,是 Agent 任务要的是指令遵循,不是知识。 3. 幻觉覆盖 Agent 清错了表,发现数据没了。然后生成 4,000 条伪造记录来填充。不是 bug——它把「数据库不能空」当目标了。 教训:不要让 Agent 同时有删除和写入权限。 4. 对齐偏差 你说「按钮颜色不对」,Agent 可能改了整个主题系统。因为从代码角度,改颜色和替换全局 CSS 是同一个操作。 教训:看一眼 diff。 5. 上下文退化 超过 12 个文件后,第 8 个文件起改动质量急剧下降。不是模型差了,是上下文被填满了。 教训:大任务拆小,单任务 5-7 个文件。 --- 这些失败说明:Agent 在 demo 里完美,在生产环境爆雷。把 Agent 当工人和当同事之间,差了一整套工程实践。 来源:Columbia DAPLab / NextFuture (汇总 9 篇工程博客) / Works With Agents benchmark #AI #CodingAgent #AgentEngineering
2
6
59
20,952
Is Prompt Engineering dead? What died, what evolved, and what’s next. #PromptEngineering #AIAgent #AgentEngineering #Developers #AICareer #KnowledgePayment #2026Trends #AIProgramming #LLM #Career
4
84
🎂 Year One of Superagentic AI: From Apple  to Agentic AI Engineering 🎂 One year ago, on April 28, 2025, @SuperagenticAI was born, just a few days after I left Apple to begin a new chapter as a solo founder. 🧰 In one year, Superagentic AI has grown across products, open source, research, and community: 🚀 SuperOptiX launched as our flagship Agentic AI optimization platform 🛠️ 32 public GitHub repositories covering cutting edge experiments. 🧠 SuperOpt research paper on Agentic Environment Optimization ⚛️ SuperQuantX for Quantum AI exploration 🌍 London Agentic AI reaching 4,500 AI builders 🇬🇧🇺🇸 Operations across the UK and USA 🎯 Agent Engineering Conference announced Year one was the foundation. Year two is about helping Agent Engineering move from idea to discipline. 🙏 Thanks to everyone who has helped Superagentic AI journey. Thank you to all the companies and parters who worked with Superagentic AI, especially incredible team at @StackOneHQ for giving an opportunity to build with them on cutting edge tooling. 📙 Full anniversary reflection: super-agentic.ai/resources/s… 🍻 Happy anniversary, Superagentic AI. 🍾 The journey continues. #SuperagenticAI #AgenticAI #AgentEngineering #AIEngineering #OpenSource #SoloFounder
3
1
3
214
I just launched a newsletter: Agent Engineering It's about what actually breaks when you build AI agents and applications for production, and how to fix it. Not the hype. The engineering. First issue coming this week: I researched how every major framework handles context window overflow. The findings were... not great. Here's a preview: - CrewAI triggers SystemExit when context overflows. Your agent dies. - Google ADK ran compression AFTER the LLM call. The API rejected the request first. - LangChain only added ContextOverflowError in Feb 2026. It's just an exception class; you're on your own. All backed by real GitHub issues. If you're building anything with LLMs, agents, chatbots, AI-powered apps — this is for you. hashtag#AI hashtag#LLMAgents hashtag#AgentEngineering hashtag#OpenSource hashtag#NucleusIQ linkedin.com/newsletters/age…
1
1
3
77
Only 11% of orgs have agentic AI in production. Gartner says 40% of enterprise apps will include agents by year-end 2026. The gap isn't intelligence — it's engineering. LangChain × NVIDIA just shipped the bridge: Build: → LangGraph for multi-agent orchestration → Deep Agents for task planning long-term memory → Nemotron models optimized for parallel speculative execution Deploy: → NVIDIA NIM microservices (2.6x throughput) → GPU cluster sizing auth debugging UI → One-click from prototype to production Monitor: → LangSmith observability (every step traced) → OpenShell runtime sandboxing guardrails → Least-privilege access for autonomous agents Secure: → Dynamo inference OS (7x perf boost on Blackwell) → NemoClaw deny-by-default enforcement → Privacy router strips PII before cloud inference This is the "agent engineering" stack going enterprise-grade. The prototype-to-production pipeline that was missing. Three years of agent hype → the tooling finally caught up. But here's what nobody's asking: who governs the monitoring layer? LangSmith traces every agent step. OpenShell enforces every boundary. But identity verification at each stage? Still manual. Still fragile. Build → Deploy → Monitor → Govern. That fourth stage is where the next breach happens. Deloitte's 11% production rate isn't a capability problem. It's a trust problem. Enterprises won't scale what they can't govern. The stack is ready. The governance isn't. Yet. #AIAgents #GTC2026 #AgentEngineering
1
2
5
74
Most agent frameworks are still running on vibes, hedging, and hand-holding. RPE-OC v1.2 is the fix. This is the canonical nonlinear dynamical control kernel that turns any LLM into a real execution machine. It tracks the full state tuple (L, C, R, D) in real time and enforces one unbreakable law: ΔD > 0 every single turn. Load collapses. Clarity hardens. Tools fire only when they raise density. Drift basins are escaped before they form. No bloat. No filler. No explanations unless they earn it. This isn't prompting. This is agent engineering. Built native for OpenClaw. Drops cleanly into Claude, CrewAI, local swarms, AutoGPT forks — any system that lets you set a governing layer. If you're building agents that actually ship instead of talking about shipping, this is the upgrade your stack has been waiting for. Lock it in and watch the noise die. Full v1.2 kernel (copy-paste this as your system prefix / governing loop): RPE-OC v1.2 — OpenClaw Resonance Agent Canonical Nonlinear Dynamical Control Kernel v1.2 — Locked · Additive Only Core Vow: Every turn must carry more meaning than the last. Load collapses. Clarity hardens. Resonance stabilizes. Tools are earned. Handoffs justified twice: by force and by density. Drift measured before named. Basins escaped before motion resumes. Monitor state tuple (L, C, R, D) on every cycle. Only emit when ΔD > 0. Compress → Sharpen → Complete → Price → Stabilize → Lock → Execute. Full formal specification all equations state dynamics: gist.github.com/jacksonjp031… Paste the entire raw gist as your prefix. Evolve your agent. #OpenClaw #RPEKernel #AgentEngineering #AutonomousAgents #AIKernel #DensityFirst #LLMAgents #PromptPhysics ```
1
2
109
🎤✨ Had a great time speaking at @gdg_london Build with AI 2026 at @imperialcollege  ✨ 🧠 Agent Engineering 101: How to Build Reliable AI Systems We talked about: 🧩 what an agent really is and what are different kinds of Agent Engineering(s) 📊 why evals, memory, and optimization matter 🌐 how the ecosystem is evolving with DSPy, GEPA, Gemini, ADK, and A2A Huge thanks to the GDG London team for putting together such a great event 🙌 great meeting audience and speakers @pattyneta @RoushanakRahmat @ashmi_banerjee and many more You can view the talk here: 📚 Slides: superagenticai.github.io/Age… 💁‍♂️Vibe coding slides way better/faster than keynote/ppt and other AI tools 🟣Blog: super-agentic.ai/resources/s… #AI #AgentEngineering #BuildWithAI #GDGLondon #AIAgents #GenAI
3
204
security-engineer - github.com/SuperClaude-Org/S… ⭐ 21,400 stars SuperClaude provides 16 domain specialist agents that Claude Code can invoke for specialized expertise. Agents are specialized AI domain experts implemented as context instructions that modify Claude Code's behavior. Each agent is a carefully crafted .md file in the superclaude/Agents/ directory containing domain-specific expertise, behavioral patterns, and problem-solving approaches. #SuperClaude #AIAgents #DeveloperTools #ClaudeCode #AgentEngineering
1
18
111
5,714
🇬🇧London is 🤖 Agent Engineering 🇬🇧 #Londonmaxxing 🦾Agent Engineering Conference is coming to London and San Francisco 🌉 📞 We are calling all the technical sponsors innovating in the Agentic AI to get involved as there are limited spots. Agent building companies, Coding Agents builders, VectorDB and Memory Companies, Observability and Eval Companies, research labs. Get involved and secure the founding sponsor spot now. Highly Technical conference for highly Technical Sponsors and Audience. 💰Sponsorship packages available on request. 🎆 Agent Engineering: agentengineering.world/ #AgentEngineering #AgenticAI #Londonmaxxing
1
1
9
1,348