In this episode, @DrScottClark, co-founder and CEO of @dbnlAI, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard evals miss, and how mapping traces into vector fingerprints enables clustering and topic discovery to uncover emergent behaviors. Scott explains how analytics can feed the data flywheel by generating evals, guardrails, and training data, and why online, adaptive approaches are essential for non-stationary models. We also touch on practical how-to’s such as instrumentation with OpenTelemetry, the GenAI semantic conventions, and the role of dedicated analytics tools.
🗒️ For the full list of resources for this episode, visit the show notes page: twimlai.com/go/767.
📖 CHAPTERS
===============================
00:00 - Introduction
01:32 - What is Distributional?
03:54 - Bayesian statistics and optimization in multiagents
08:14 - Anti-patterns
10:11 - Hierarchy of observability
16:12 - Applying analytics in the lifecycle
21:58 - Trace clustering and vector mapping
26:42 - Evals
31:04 - OpenTelemetry (OTEL) and the GenAI semantic conventions
35:47 - Non-stationarity and “model weather” reports
41:30 - Examples of distribution shifts
46:24 - Distributional is open distribution
47:05 - Metrics for applying analytics
48:54 - Academic benchmark
51:07 - Future directions