VP, Observability-Data@datadoghq | peripatetic | minimalist { engineer | athlete | artist } | I have opinions-of-my-own

Joined December 2017
Exclusivity keeps a thing expensive, inclusivity makes the same thing affordable and therefore ubiquitous.
1
12
Verification is definitely the bottleneck. We need more startups working on this problem to scale accountability with generation.
We're launching @niteshiftdev – the full-stack cloud for coding agents. Verification is the new bottleneck. Software teams can now define their dev environment and verification tools once, then run any frontier agent in the cloud: Claude Code, Codex, or OpenCode.
1
2
75
/sesh/null retweeted
Now this is how you do a CEO panel 📸 @nerdsane (VP of AI @ @datadoghq) and Clémence J Burnichon (Sr Director Eng @ Datadog) took the stage with some of the best CEOs in tech right now: @zachlloydtweets (@warpdotdev), @dakshgup (@greptile), and @jayair (@opencode)
3
15
532
/sesh/null retweeted
Developers developers developers @dakshgup @jayair @nerdsane @zachlloydtweets
2
7
241
/sesh/null retweeted
crazy line out the door for @jayair (@opencode), @zachlloydtweets (@warpdotdev), and @dakshgup’s (@greptile) session at @datadoghq DASH on how coding agents are changing the SDLC
2
4
16
1,934
/sesh/null retweeted
TOMORROW - we're hosting our @Techweek_ by @a16z AI Rooftop event with @datadoghq x @vercel ✨
Speakers include:
Director of Eng/AI - @diamondbishop
VP, Observability and AI - @nerdsane
Sr. Director, Eng - Andrey Sibirev (Vercel)
Moderator: @MadsMcIlwain (Vercel)
See if you can still snag a spot: partiful.com/e/NHLbgkXd64ICe… @vercel_dev
2
5
235
Below is some serious work from the Datadog team, and I’m impressed by the magnitude of what they were able to conceive and achieve in the timeframe of a hackathon (a few hours, a single day). Also super happy to see our collective vision of Directed Software Evolution, through research projects like BitsEvolve and Temper, showcased with a clear demonstration of the importance of production observability as a feedback loop to achieve that. Looking forward to the detailed write-up.
Participated in the Autoresearch systems hackathon in SF, hosted by Modal, OpenAI, Raindrop and Antler, along with Jai Menon and Pranav Garg. Our hypothesis was that by using Temper's governance and verification layers, and building tools on top of Temper, we could produce (1/8)
3
125
I know there are some efforts to write more precise specifications in prose with LLMs. I think we can do better by making more of those specifications mathematically precise and observable. In other words, can the specification become part of the system (mechanically executable), not just an input to the LLM? If so, those pieces become observable artifacts: the LLM produces a formal, observable specification instead of only prose. The developer can audit or even edit that spec, and model-check it for consequences independently (rather than relying only on models to do it). This helps develop the operational mental model that we are losing as we grow distant from the code being generated. The spec can map more directly to runtime code. With observability like @datadoghq still instrumenting the running system, production behavior feeds back to the LLM and connects to the specs. So now, when something fails, the failure can be traced back to the spec. I’m calling this paradigm “Higher Order Construction” with coding agents.
1
2
89
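To make the "Higher Order Construction" idea above concrete, here is a minimal sketch of a specification that is mechanically executable instead of prose. Everything in it is hypothetical and for illustration only: the order workflow, the `SPEC` layout, and the helper names are assumptions, not anything from the post or from Datadog. It shows the three moves the post describes: the spec is data a developer can audit or edit, it can be checked exhaustively without asking a model, and an observed production event sequence can be traced back to the spec rule it violates.

```python
from collections import deque

# Hypothetical spec: a tiny order workflow expressed as data, not prose.
SPEC = {
    "initial": "created",
    "transitions": {
        ("created", "pay"): "paid",
        ("paid", "ship"): "shipped",
        ("created", "cancel"): "cancelled",
        ("paid", "refund"): "cancelled",
    },
    "terminal": {"shipped", "cancelled"},
}


def reachable_states(spec):
    """Breadth-first exploration of every state the spec allows."""
    seen = {spec["initial"]}
    queue = deque([spec["initial"]])
    while queue:
        state = queue.popleft()
        for (src, _event), dst in spec["transitions"].items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def dead_ends(spec):
    """Reachable non-terminal states that have no outgoing transition."""
    has_exit = {src for (src, _event) in spec["transitions"]}
    return {
        s for s in reachable_states(spec)
        if s not in spec["terminal"] and s not in has_exit
    }


def check_trace(spec, events):
    """Replay observed events against the spec.

    Returns None if the trace conforms, otherwise a dict naming the
    position, state, and event where the spec was violated.
    """
    state = spec["initial"]
    for index, event in enumerate(events):
        next_state = spec["transitions"].get((state, event))
        if next_state is None:
            return {"index": index, "state": state, "event": event}
        state = next_state
    return None
```

In this sketch `dead_ends` plays the role of an independent model check, and `check_trace` is where production telemetry would connect back to the spec: a failing trace points at a specific state and event, not at a paragraph of prose.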
/sesh/null retweeted
We’ve released a technical report for Toto 2.0 detailing the data, architecture, training recipe, μP/u-μP hyperparameter transfer pipeline, and benchmark results behind our 5-model open-weight release. Report linked below.
Today we’re releasing Toto 2.0: a family of open-weights time series foundation models spanning 4M to 2.5B parameters. The question we set out to answer was simple (yet previously open): Do time series foundation models get reliably better as they scale? Our answer: yes! 🧵
1
10
57
5,789
/sesh/null retweeted
NEW from Datadog: it's Lapdog! Ever wondered what your AI agent was actually doing? Our latest free project runs locally and traces reasoning and tool calls in Codex, Claude Code, and Pi. You can now see what your agent is REALLY doing, live: lapdog.datadoghq.com/
38
51
699
266,285
/sesh/null retweeted
Scaling finally works for Time Series Foundation Models. Introducing Toto 2.0: open-weights TSFMs from 4M to 2.5B params, where every size beats the last from a single hyperparameter config. #1 on leading benchmarks: BOOM, GIFT-Eval, and TIME. Most TSFM families ship multiple sizes that all perform roughly the same. This one doesn't.
1
9
18
3,278
The load-bearing frequency of ‘load-bearing’ in LLM discussions is becoming structurally load-bearing on my sanity
1
5
72
/sesh/null retweeted
“At Datadog, over the last four months, nearly 90% of engineers used coding agents for production work.” - VP Observability Data, @nerdsane (@datadoghq) Our very own Sesh spoke at Code w/ @claudeai last night covering the instances in which the eng teams at Datadog are utilizing agents for production work. #codewithclaude #claude #claudecode @ClaudeDevs
2
11
107,717
/sesh/null retweeted
Apr 19
@AnthropicAI Claude Design is so fun! This release was so serendipitous because I just set up Katagami - a living design language library sourced and synthesized by agents based on rough ideas I wanna explore. You can download a spec from Katagami, upload it into Claude Design as a design system and start applying it to your project from there. I just tried it and it worked amazingly well. Can’t wait to use this more in my future projects.
2
2
315
Time for the universal machine tool for software industrialization, the one that rebuilds from the SaaS-pocalypse.
we are entering the tool calling industrial revolution because of code mode
3
156
/sesh/null retweeted
we are entering the tool calling industrial revolution because of code mode
6
5
105
7,760
/sesh/null retweeted
Are chatbots in SaaS apps dead? Chat is a communication method, not a product. You can’t define “AI” or “bots” as chat. SaaS companies should think of shipping AI in two categories:
1. Autonomous: AI as a separate entity from the human
2. Assistant: AI as an extension of the human
Autonomy: these are essentially background agents that go in loops. You can think of them as doing stuff recursively, kicking off on set triggers or (ideally) on events they detect themselves. The holy grail here is a background agent that can wake itself up to things you care about, make evaluations, drive its own loop for a long time with proper and only necessary context, execute, iterate, and ask for your input or notify you when it’s done. Key here is that the agent owns its own loop. Claws work really well here to help orchestrate and coordinate for subtasks with personality.
Assistants: these are multi-turn agents that start reactively, with triggers defined at each turn. They tend to execute much more scoped tasks, but can still go off and explore and move recursively within an instruction input defined up front. You play fetch with your assistant.
The goal of autonomy is to catch things you wouldn’t have caught, to be always-on, and to act as an independent colleague. The goal of assistants is to be your superpower, to help you run your defined workflows, and to execute on your commands.
The easiest mode of communication for both is chat. Artifacts are helpful to digest both loops and turns.
Our Assistant (Bits) is in Preview. And our next evolution of Autonomy is coming very soon…
We’ve launched Bits Assistant to help customers search and act across Datadog to resolve issues faster. Few examples below on how we see customers use it.
1
1
4
434
/sesh/null retweeted
The first thing I did at @tryramp was set up distributed tracing, structured logging, and metrics for Inspect, our background coding agent. We now have full visibility into everything the system is doing: the browser, CF workers/DOs, @modal sandboxes, database calls, etc. Most importantly, Inspect now has visibility into itself. It can self-triage runtime errors it encounters and create PRs to fix them. Every morning, it reviews the past 24 hours of its own @datadoghq dashboard, identifies systemic issues, new errors, and long-tail latencies, and has a summary PR waiting for me at 9am.
30
26
522
72,023
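The daily self-review described in the post above (new errors and long-tail latencies over the past 24 hours) can be sketched in a few lines. This is an illustrative assumption, not Ramp's implementation and not a Datadog API: the record shape (`latency_ms`, `error`), the `known_errors` set, and the function names are all hypothetical, and the records are assumed to have been exported from whatever telemetry backend is in use.

```python
import math
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def daily_summary(records, known_errors):
    """Summarize one day of structured log records.

    records: list of dicts with hypothetical keys "latency_ms" and "error"
             ("error" is None when the call succeeded).
    known_errors: set of error names already seen on previous days.
    """
    errors = Counter(r["error"] for r in records if r.get("error"))
    latencies = [r["latency_ms"] for r in records]
    return {
        "new_errors": sorted(e for e in errors if e not in known_errors),
        "error_counts": dict(errors),
        "p99_latency_ms": percentile(latencies, 99) if latencies else None,
    }
```

A summary like this is the kind of structured input an agent could turn into a triage note or a pull request description; how the agent acts on it is outside this sketch.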
/sesh/null retweeted
Really enjoyed this conversation with @swyx. Hope you enjoy the podcast.
⚡️Monty: the ultrafast Python interpreter, by Agents, for Agents, youtu.be/nxnQl4AcqFg so glad to catch up with @samuelcolvin of @pydantic !
3
20
3,052