Building high-quality AI systems is hard.
At Langfuse we see the best AI teams converging on a process to get complex AI systems to production.
We call it the AI Engineering Loop.
Check out the first piece of our series and find out more in our academy
Langfuse x @NousResearch Hermes Agent integration is now available.
Hermes ships with a bundled Langfuse plugin: every conversation turn, LLM call, and tool use is traced automatically.
langfuse.com/integrations/ot…
you can technically automate the entire AI engineering loop now
while it’s tempting, here’s what we think you should keep in your own hands to set your agent apart
Now live: Langfuse x @openai Codex integration.
Trace every prompt, tool call, shell command, and token across your Codex sessions. Also available for Claude Code.
langfuse.com/integrations/ot…
had a lot of fun at the AI Engineer conference going deep on:
(1) how we think about the role of skills
(2) how to develop/eval/improve them
(3) lessons from building our own set of skills
Skill issue: Lessons from skilling up coding agents
Getting agents to actually use Langfuse was a "skill issue", literally. Marc Klingen from ClickHouse on teaching coding agents to use new tools, and why it's harder than you think.
youtube.com/watch?v=vNCY9kXX…
quarterly Langfuse Town Hall on June 11th
catch up on everything we've shipped: v4, the latest releases, and what's coming next on the roadmap. Q&A with the team at the end.
open to the whole community. register: luma.com/7dny2x72
day 5 of launch week: langfuse MCP.
supports: observations, metrics, scores, datasets, comments, annotation queues, models, media, and more.
claude or linear agents can pull a trace, drop a comment, or create dataset items without leaving the chat.
langfuse.com/launch
day 4 of langfuse launch week: code evaluators.
write a python or typescript `evaluate` function in the langfuse UI. attach it to live observations or an experiment. scores land natively next to your existing ones.
@wochinge demos below; langfuse.com/launch
day 3 of langfuse launch week: full-text search.
multi-GB scans drop from many seconds to sub-second on @ClickHouseDB's new text indexes. great work from @sum3rman.
available via UI and API.
more: langfuse.com/launch
day 2 of langfuse launch week 5: langfuse agent skill.
bringing an agent to production is hard.
using the skill you can ask your coding agent to instrument your app, calibrate a judge, or set up evaluators.
@marliessophie demos below; langfuse.com/launch
day 1 of langfuse launch week 5: a github action that runs your langfuse experiments on every PR.
fails the workflow when scores drop below your threshold. posts pass/fail to the PR. every run is tracked in langfuse.
langfuse.com/launch
Want to see what Claude Code is actually doing? We made a video showing exactly how to observe it in real-time with Langfuse.
Claude Code in Action: Trace Tool Calls & Decisions with Langfuse youtu.be/fsoBHf_WNmQ?si=0pp5…
This is a great article by @annabellschfr - a lot of teams still get stuck on vibes and never get to systematically experimenting with models, prompts, context, architectures. dig in!