The open source developer platform to build AI applications and models with confidence.

Joined August 2018
Jun 9
Genie → MLflow traces 👇 🔹 Ingest space conversations 🔹 One trace per conversation 🔹 Base layer for judges/QA 📕 Read the cookbook: mlflow.org/cookbook/genie-tr… #MLflow #Genie
Jun 8
Production trust gap: LLMs can leak PII, produce harmful content, or violate content policies. 👇 🔹 App filters don't scale across agent LLM calls 🔹 One bug in app code can skip them 🔹 AI Gateway = centralized, consistent guards Full video: youtube.com/live/A1aNFvApZv8… #AIGateway #LLMOps
Jun 4
MLflow 3.12 deep dive clip: why coding agents need tracing 👇 Yuki Watanabe walks through what shows up in the trace when you turn it on: 🔹 Every turn, tool call (Read, Bash, Edit), and sub-agent step 🔹 Token usage and latency per span, including cache breakdown
Jun 4
🔹 Full sessions grouped together, so long conversations stay debuggable 🔹 MLflow 3.12 tracing for Claude Code, Codex, Gemini CLI, OpenCode, Qwen Code, and OpenHands 🎥 Full webinar: youtube.com/live/A1aNFvApZv8… #MLflow #CodingAgents
Jun 4
Genie fixes from failed evals 👇 🔹 Traces + space config in 🔹 LLM suggests concrete edits 🔹 Shorten signal-to-patch time 📕 Read the cookbook: mlflow.org/cookbook/genie-sp… #MLflow #GenAI #Genie
Jun 3
MLflow 3.13.0: RBAC Admin UI for self-hosted servers 👇 🔹 Roles as reusable permission bundles 🖥️ Admin UI (no REST endpoints) 📦 Experiments, models, prompts, scorers, Gateway Release highlights: mlflow.org/releases/3.13.0/ #MLflow #MLOps
Jun 3
MLflow 3.13.0 is a major update that runs AI observability at scale, focusing on access control, the lifecycle of your trace data, and richer support for agents. 🙌 🔗 Check out the highlights of the release: mlflow.org/releases/3.13.0/ #mlflow #opensource #linuxfoundation
Jun 2
LLM judges for Genie traces 👇 🔹 Built-in baseline judges 🔹 Custom SQL/semantics checks 🔹 Start on highest-risk traces 📕 Read the cookbook: mlflow.org/cookbook/genie-ev… #MLflow #Genie
Jun 1
Thousands of traces, no systematic way to spot bad agent runs. MLflow Automatic Issue Detection 👉 choose CLEARS categories, run analysis in three clicks, triage issues in the UI. 🔗 Learn more: mlflow.org/blog/issue-detect… #MLflow #LLMOps #GenAI
May 28
Trace + eval Genie in MLflow 👇 🔹 Full Genie pipeline 🔹 MLflow traces + judges 🔹 Tighten one pilot space first 📕 Read the cookbook: mlflow.org/cookbook/databric… #MLflow #Genie
May 27
Vibe-checking works until it doesn't. Change one prompt, break three behaviors, and you can't tell if you moved forward or backward. Eval-driven development in MLflow 👇 1️⃣ Trace: mlflow.openai.autolog() + @mlflow.trace spans (latency, tokens, cost) 2️⃣ Evaluate prompts: mlflow.genai.evaluate(), make_judge(), Prompt Registry, optimize_prompts (GEPA) 3️⃣ Prod: same judges on live traces; agent dashboards for cost/latency/quality 🔗 Learn more: mlflow.org/blog/structured-a… #MLflow #LLMOps #GenAI
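The "change one prompt, break three behaviors" problem can be illustrated with a minimal regression check. This is a plain-Python sketch of the idea, not MLflow's API: `EvalCase`, `run_suite`, `regressions`, and the two toy app versions are hypothetical names made up for illustration. In MLflow the scoring step would presumably go through `mlflow.genai.evaluate()` with judges instead of these hand-written checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One behavior to protect: an input plus a deterministic check on the output."""
    name: str
    user_input: str
    check: Callable[[str], bool]


def run_suite(app: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against one version of the app and record pass/fail."""
    return {case.name: case.check(app(case.user_input)) for case in cases}


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Names of cases that passed on the old version but fail on the new one."""
    return sorted(name for name, ok in before.items() if ok and not after.get(name, False))


# Hypothetical stand-ins for two prompt versions of the same app.
def app_v1(text: str) -> str:
    return f"answer: {text.strip().lower()}."


def app_v2(text: str) -> str:
    # The "improved" version drops the prefix and stops normalizing case.
    return f"{text.strip()}."


CASES = [
    EvalCase("has_prefix", "Hello", lambda out: out.startswith("answer:")),
    EvalCase("lowercased", "Hello", lambda out: out == out.lower()),
    EvalCase("ends_with_period", "Hello", lambda out: out.endswith(".")),
]

baseline = run_suite(app_v1, CASES)
candidate = run_suite(app_v2, CASES)
broken = regressions(baseline, candidate)
print(broken)
```

Comparing the two result dictionaries shows exactly which behaviors regressed, which is the signal a vibe check cannot give.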
May 26
Right answer, wrong trace? MLflow + TruLens Agent GPA scorers read the full span tree 👇 🔹 10 TruLens scorers: 6 Agent GPA + 4 RAG 🔹 95% agent errors on TRAIL vs 55% 🔹 mlflow.genai.evaluate() w/ RAG Phoenix 🔗 Read more: mlflow.org/blog/mlflow-trule… #MLflow #TruLens #GenAI
May 26
Red-team LLM apps in MLflow 👇 🔹 Adversarial eval inputs 🔹 Safety scorers + guidelines 🔹 Rerun after model/prompt changes 📕 Read the cookbook: mlflow.org/cookbook/red-team… #MLflow #GenAI
May 22
Claude Code can burn through dozens or hundreds of LLM calls in one session. MLflow 3.12.0: route it through AI Gateway with two env vars for traces, budget alerts/limits, and guardrails. No SDK changes. 🛣️ Setup: mlflow server → Gateway endpoint → ANTHROPIC_BASE_URL to the claude-code proxy. Run claude as usual. Learn more 👉 mlflow.org/blog/gateway-clau… #MLflow #AIGateway #ClaudeCode
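A rough sketch of the environment setup described above, assuming the gateway is already running. The post names only `ANTHROPIC_BASE_URL`; the second environment variable is not named, so it is left out here. The `claude_env` helper and the example URL are hypothetical placeholders, and the real proxy path has to come from your own Gateway endpoint.

```python
import os
from typing import Dict, Mapping, Optional


def claude_env(gateway_base_url: str,
               base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with ANTHROPIC_BASE_URL pointed at the gateway."""
    if not gateway_base_url.startswith(("http://", "https://")):
        raise ValueError("gateway_base_url must be an http(s) URL")
    # Copy so the caller's environment (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = gateway_base_url.rstrip("/")
    return env


# Placeholder URL (assumption): substitute the proxy URL of your own Gateway endpoint.
EXAMPLE_URL = "http://localhost:5000/hypothetical-claude-code-proxy/"
env = claude_env(EXAMPLE_URL, base_env={"PATH": "/usr/bin"})
print(env["ANTHROPIC_BASE_URL"])

# To launch the CLI with this environment (not executed here):
#   subprocess.run(["claude"], env=env)
```

Building the environment in a helper keeps the redirect scoped to the child process instead of changing the shell globally.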
May 22
RAG eval end-to-end in MLflow 👇 🔹 Trace retrieve + generate 🔹 Built-in retrieval/gen judges 🔹 Localize failure to a stage 📕 Read the cookbook: mlflow.org/cookbook/rag-eval… #MLflow #RAG
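"Localize failure to a stage" can be sketched with two simple checks, one per stage. This is an illustrative stand-in in plain Python, not the cookbook's method or MLflow's built-in judges: the `RagRecord` type, the `localize_failure` helper, and the substring-based generation check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class RagRecord:
    """One traced RAG call with ground truth for both stages (hypothetical shape)."""
    retrieved_ids: List[str]   # document ids the retriever returned
    relevant_ids: Set[str]     # ids a reviewer marked as relevant
    answer: str                # generated answer
    expected_phrase: str       # phrase a correct answer must contain


def localize_failure(rec: RagRecord) -> str:
    """Return 'retrieval', 'generation', or 'ok' for a single record."""
    retrieval_ok = any(doc_id in rec.relevant_ids for doc_id in rec.retrieved_ids)
    generation_ok = rec.expected_phrase.lower() in rec.answer.lower()
    if not retrieval_ok:
        # Retrieval is checked first: without a relevant document,
        # the generation result is not a trustworthy signal.
        return "retrieval"
    if not generation_ok:
        return "generation"
    return "ok"


records = [
    RagRecord(["d1", "d7"], {"d7"}, "The limit is 10 requests per second.", "10 requests"),
    RagRecord(["d2", "d3"], {"d7"}, "The limit is 10 requests per second.", "10 requests"),
    RagRecord(["d7"], {"d7"}, "I am not sure.", "10 requests"),
]
stages = [localize_failure(r) for r in records]
print(stages)
```

In practice an LLM judge would replace the substring check, but the stage-by-stage ordering of the decision stays the same.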
May 21
Catch this session at Data + AI Summit (June 15-18, SF)! 🌟 Agent quality via vibe-checking breaks at scale. 🔹 MLflow self-evolving test harness 🧪 Bad-answer feedback → automated tests ✅ Coding-agent fixes vs. accumulated suite 🎤 Adam Gurary & Yuki Watanabe Session details: databricks.com/dataaisummit/… #MLflow #DataAISummit
May 21
Prompt lifecycle in MLflow 👇 🔹 Registry-backed versions 🔹 Eval-gated promotion 🔹 Rollbacks without guesswork 📕 Read the cookbook: mlflow.org/cookbook/prompt-e… #MLflow #GenAI
May 20
.@OpenHandsDev agents edit files, run commands, and browse the web on their own, but there's no structured record of what happened or whether the result was good. MLflow connects via @opentelemetry to trace every step, evaluate runs with built-in judges, and route model traffic through AI Gateway for budget and usage control. Learn more 👉 mlflow.org/blog/mlflow-openh… #MLflow #OpenHands
MLflow retweeted
May 19
New on the MLflow channel: evaluate a RAG agent end-to-end with Joana Mesquita, MLflow Ambassador 👇 📌 Prompt Registry + production aliases 🔹 Traces with SME ground truth ⚖️ Ragas, Phoenix + custom LLM judge Watch now: youtu.be/4wqkHroNGFQ Blog: medium.com/@joana.c.mesquita… #MLflow #RAG