Deploy AI with Confidence

Joined September 2023
81 Photos and videos
Kick off your AI analytics flywheel with DBNL. DBNL transforms traces into actionable insights, ensuring teams can continuously fix and improve their agents with the right evidence in production: dbnl.io/cPRvZf
11
Is your AI agent actually calling a tool when it claims to? Scott Clark explains one of the most common anti-patterns he sees with multiple models today, where the agent gets "lazy" and will say it called a tool but the traces prove otherwise. Learn more about how DBNL's analytics helps teams catch dark patterns: dbnl.io/iMRLJz
12
Every day, DBNL users get fresh insights on their most recent production data. These insights drive the development of new offline metrics to guide reward functions, evals, or context for agents. They can be used to generate new test cases that are run pre-production on agents or in a continuous “integration testing” style setting. And they create new metrics to be monitored real time by more traditional observability tooling. Learn more about how DBNL analytics generate actionable insights: dbnl.io/cPRvZf
14
Your agent handles 50,000 traces a day. Accuracy is 87%. Error rate is stable. To your dashboard, everything is fine. Meanwhile: → A retrieval path is silently returning 90-day-old documents to users asking about recent policy changes. → A prompt tweak from last Tuesday shifted tool-calling order in a way that tanks task completion for multi-step workflows by 12%, for four hours, every Tuesday. → A whole topic category the agent answers confidently is wrong 40% of the time. None of this shows up in your metrics. You find out when a user complains, or when satisfaction scores quietly erode over a month and you can't explain why. Between your monitoring dashboard and your trace viewer, there's an analytical gap. Learn how to convert agent traces into behavioral insights with DBNL. Connect your OTEL traces and get insights in minutes: dbnl.io/Luuawb
19
Distribution analysis answers the questions that summary metrics can't. DBNL's distribution comparison can help you pinpoint the issue—this is especially helpful when evaluating experiment variants: dbnl.io/CjZ9X9
1
30
AI agents are complicated, but catching and fixing issues with them is easy with DBNL. From pre-deployment evals that fail to perform in production to insights on user intent and more, DBNL helps teams identify issues faster: dbnl.io/uVtqmf
21
When you sample traces, you’re making a bet that the interesting patterns are distributed randomly across your data. They’re usually not. Random sampling will systematically misses low-frequency, high-impact patterns. Where does your organization fall on the observability maturity scale? Read and find out: dbnl.io/i00yOD
1
22
Your P95 latency doubled after a deployment, but your average cost-per-call looks fine. Which one do you trust? This is the core tension in LLM agent observability: summary metrics tell you something changed, but not what changed—or for whom. For example, a bimodal latency spike (where half your requests are fast and half are stuck) looks identical to a uniform slowdown when you compress it to a single number. DBNL's distribution comparison unlocks the details. Here's how it works: dbnl.io/CjZ9X9
1
53
AI agent analytics insights and recommendations—automated. DBNL automatically turns logs into action items, helping you identify where your agents could use some tweaks. Learn more: dbnl.io/IvoBGJ
1
1
63
Agent breaking in production? There are 3 types of issues that DBNL's analytics can help catch before they become a bigger problem: 1. Offline, pre-deployment evals that fail to perform out of sample in production 2. Insights on user topics, intent, and input patterns 3. Issues in the complexity of agent behavior dbnl.io/uVtqmf
21
Are you missing key information about your AI agents? Here are the 3 failure modes that are invisible to aggregate metrics: → Behavioral drift by segment → Unknown-unknown behavioral patterns → Gradual degradation vs. hard failures Take our quiz to see where you land, then learn how to close the gap with DBNL: dbnl.io/arGD1z
10
Your evals are passing. Your agent is still failing. This is one of the most underappreciated problems in production AI systems. Distributional CEO Scott Clark breaks it down in this episode of the @twimlai podcast with @samcharrington : youtube.com/watch?v=ZqehXrVl…
25
Get a quick picture of where your AI agent's behavior has gone off course. DBNL makes it easy to identify irrelevant ouput in your logs, which you can then use for offline prompt iteration, reinforcement learning, fine tuning, tool changes, or hyperparameter optimization. Here's how it works: dbnl.io/uXZKvW
17
Most AI teams can see their aggregate metrics, but they can’t see the behavioral patterns inside them — which is where the real informations. Here's why most AI teams today are still "flying blind" when it comes to observability, and how to close the gap: dbnl.io/arGD1z
15
Imagine a Maslow’s hierarchy of observability for your AI agents: Telemetry = Logging what happens Monitoring = Alerting on known signals Online analytics = Surfacing the unknown unknowns Where do you land? Learn how to get to the top in this episode of @twimlai: youtube.com/watch?v=ZqehXrVl…
1
25
Distributional retweeted
Great summary from @DrScottClark! Building a deep understanding of agents to continuously improve them is the focus of @dbnlAI. I love the part around 18:45 - 20:45. Rich analytics driving increasingly automated improvement flows seems like a pattern that will harden in the near term.
In this episode, @DrScottClark, co-founder and CEO of @dbnlAI, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools. 🗒️ For the full list of resources for this episode, visit the show notes page: twimlai.com/go/767. 📖 CHAPTERS =============================== 00:00 - Introduction 01:32 - What is Distributional? 03:54 - Bayesian statistics and optimization in multiagents 08:14 - Anti-patterns 10:11 - Hierarchy of observability 16:12 - Applying analytics in the lifecycle 21:58 - Trace clustering and vector mapping 26:42 - Evals 31:04 - OpenTelemetry (OTEL) and the Gen AI semantic convention 35:47 - Non-stationarity and “model weather” reports 41:30 - Examples of distribution shifts 46:24 - Distributional is open distribution 47:05 - Metrics for applying analytics 48:54 - Academic benchmark 51:07 - Future directions
1
2
183
After working with AI teams across the industry, we’ve identified four distinct stages of observability maturity. Here’s what each one looks like — and where the transitions break down: dbnl.io/rwXKqV

18
We're on @twimlai with @samcharrington! Catch Scott Clark as he share how teams can reliably operate and improve complex LLM systems in production and discusses the Maslow’s hierarchy of observability for your AI agents: youtube.com/watch?v=ZqehXrVl…
1
20
Distributional retweeted
In this episode, @DrScottClark, co-founder and CEO of @dbnlAI, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools. 🗒️ For the full list of resources for this episode, visit the show notes page: twimlai.com/go/767. 📖 CHAPTERS =============================== 00:00 - Introduction 01:32 - What is Distributional? 03:54 - Bayesian statistics and optimization in multiagents 08:14 - Anti-patterns 10:11 - Hierarchy of observability 16:12 - Applying analytics in the lifecycle 21:58 - Trace clustering and vector mapping 26:42 - Evals 31:04 - OpenTelemetry (OTEL) and the Gen AI semantic convention 35:47 - Non-stationarity and “model weather” reports 41:30 - Examples of distribution shifts 46:24 - Distributional is open distribution 47:05 - Metrics for applying analytics 48:54 - Academic benchmark 51:07 - Future directions
1
2
7
730
Understand how your AI agent is performing in the context of user intent, marrying intent to outcomes for these agents: dbnl.io/uYWVjB
10