Agents @NVIDIA. Past work: Co-founder @ Freeplay / Product & Design @Firstbase / Product lead @ TwitterAPI / OG @ Gnip @developmentseed. Grateful.

Joined March 2007
923 Photos and videos
Late to the party, but this is a good Tweet. If you're building agents, @Vtrivedy10 is describing the roadmap for your ops. 👇
imo there's a pretty solid default recipe that everyone should use to optimize a system of Agent = Model + Harness. You should "train" both:
1. Build a v1 agent using a sensible base harness and some task-specific prompting + tools.
2. Harness engineering using eval tasks that roughly match prod. This is often enough; most companies can get acceptable perf doing this. Then they collect traces, mine them for patterns, and make slight tweaks from there.
3. SFT using data collected from traces or synthetic data. Often a good candidate for "distillation tasks" to train a cheaper model while maintaining existing performance.
4. RL, if you have the bandwidth, ability, and desire to create environments and design rewards that represent the tasks you want your agent to be good at. This moves past SFT's behavior of "copying" data from an existing model toward surpassing it in some dimension.
5. Light harness engineering again to squeeze out any more juice (ex: slight prompting) using the trained model that's better at your task distribution.
This loop will largely be productized as a general-purpose recipe for building and improving agents. We're still in the earliest innings of the world's companies getting comfortable with steps 1-2 of this loop. Harness engineering will probably be the dominant way ppl optimize agents, but I expect a large number of companies to onboard through this entire loop on some trial project of interest in the next year.
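To make step 2 concrete, here's a minimal sketch of harness engineering against eval tasks. It assumes a "harness" can be modeled as any function from task input to agent output, and that each eval task carries its own pass/fail check. `EvalTask`, `score`, and `pick_best` are hypothetical names for illustration, not any product's API; the toy harnesses stand in for real model + prompt + tool configurations.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "harness" is any function from task input to agent output
# (in a real system: model + system prompt + tools + agent loop).
Harness = Callable[[str], str]

@dataclass
class EvalTask:
    input: str
    check: Callable[[str], bool]  # task-specific pass/fail check on the output

def score(harness: Harness, tasks: list[EvalTask]) -> float:
    """Fraction of eval tasks the harness passes."""
    passed = sum(1 for t in tasks if t.check(harness(t.input)))
    return passed / len(tasks)

def pick_best(harnesses: dict[str, Harness], tasks: list[EvalTask]) -> tuple[str, float]:
    """Score every candidate on the same tasks; return (name, score) of the best."""
    scores = {name: score(h, tasks) for name, h in harnesses.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy eval tasks that loosely mimic prod requests.
tasks = [
    EvalTask("refund order 123", lambda out: "refund" in out),
    EvalTask("cancel order 456", lambda out: "cancel" in out),
]

# Two candidate harnesses; real ones would call a model.
harnesses = {
    "v1_base": lambda q: "I can help with that.",
    "v2_intent_aware": lambda q: f"Starting {q.split()[0]} flow.",
}

print(pick_best(harnesses, tasks))  # ('v2_intent_aware', 1.0)
```

In practice the candidates would differ in system prompt, tool set, or loop logic, and the checks would come from your own eval suite. The point is that every variant is scored on the same prod-like tasks before one is picked.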
Ian Cairns retweeted
A great conversation between Noah Kravitz from the @nvidia team and @hwchase17.
"Every enterprise needs a claw strategy." How did @LangChain go from a weekend project to 1B downloads in 3 years? We sat down with CEO and co-founder Harrison Chase (@hwchase17) to talk deep agents, evolving agent architectures, and what's coming next. 🎧 Full episode: nvda.ws/4d8QVQ5
šŸŽ™ļøNew Deployed episode with Kevin Stanton from @SproutSocial. They're building agents that process billions of social messages and turn it all into signal. He shares great lessons learned as an engineering leader. One example, the benefits of chat UX: it's the fastest way to "seed your evals" with real traces, and a product manager's holy grail to learn what customers actually want to do.
Full episode covers a lot more:
* Why MCP felt more natural than RAG for their system of record
* "LLMs are the most expensive switch statement on the planet" (when to skip MCP tools and use code)
* Why they pulled evals out of CI/CD
Kevin's a Distinguished Engineer at Sprout and has been there for 13 years building infra. This is a good listen for other people making the transition to building agents.
Full episodes:
* Spotify: open.spotify.com/episode/06S…
* YouTube: youtube.com/watch?v=3PTAGZg4…
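The "most expensive switch statement" line is about routing: if an input can be classified with a rule, a plain branch in code is cheaper, faster, and easier to test than a model call. A minimal sketch, assuming hypothetical intents and regex rules, with `call_llm` as a placeholder for a real model call that only sees inputs no rule matched.

```python
import re
from typing import Callable

def call_llm(message: str) -> str:
    # Placeholder for a real model call (hypothetical); only unmatched input gets here.
    return "llm_fallback"

# Deterministic rules first: cheap, fast, and easy to unit test.
ROUTES = [
    (re.compile(r"^\s*(unsubscribe|stop)\s*$", re.IGNORECASE), "unsubscribe"),
    (re.compile(r"\border\s+#?\d+\b", re.IGNORECASE), "order_lookup"),
    (re.compile(r"\b(refund|money back)\b", re.IGNORECASE), "refund"),
]

def route(message: str, llm: Callable[[str], str] = call_llm) -> str:
    """Return the intent of the first matching rule, else fall back to the LLM."""
    for pattern, intent in ROUTES:
        if pattern.search(message):
            return intent
    return llm(message)

print(route("STOP"))                        # unsubscribe
print(route("Where is order #1234?"))       # order_lookup
print(route("I want my money back"))        # refund
print(route("Draft a reply to this post"))  # llm_fallback
```

Rules are checked in order, so the cheapest and most specific ones go first; the LLM is reserved for the open-ended requests where it actually earns its cost.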
Cisco's @duosec built their AI evals & quality practice without a blueprint. A year later, they're watching the industry catch up to what they figured out by doing the work: automated evals > cross-functional data review > improve > repeat. Proud to support their team. 🙌
Our team's been shipping a ton this year! This one's incredibly practical: Build automations to move data around your system so you don't have to do it by hand.
* Create review queues
* Refresh datasets
* Run conditional evals
* Send Slack alerts
* ...more to come!
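Automations like these follow a condition → action shape: when a logged record matches a rule, move it somewhere useful. Here's a minimal in-memory sketch of that pattern, assuming each record is a dict with hypothetical `id`, `eval_score`, and `feedback` fields; the lists stand in for a review queue, a dataset, and a Slack channel. This is not Freeplay's automation API, just an illustration of the idea.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Automation:
    # Hypothetical rule: when `condition` is true for a logged record, run every action.
    name: str
    condition: Callable[[dict], bool]
    actions: list[Callable[[dict], None]]

@dataclass
class Sinks:
    # In-memory stand-ins for a review queue, an eval dataset, and a Slack channel.
    review_queue: list = field(default_factory=list)
    dataset: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

def run_automations(record: dict, rules: list[Automation]) -> list[str]:
    """Apply every matching rule to one record; return the names of rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(record):
            for action in rule.actions:
                action(record)
            fired.append(rule.name)
    return fired

sinks = Sinks()
rules = [
    # Low eval score: queue for human review and post an alert.
    Automation(
        "low-score-to-review",
        lambda r: r["eval_score"] < 0.5,
        [sinks.review_queue.append,
         lambda r: sinks.alerts.append(f"Low eval score on {r['id']}")],
    ),
    # Explicit thumbs-down: add to the dataset used for the next eval run.
    Automation(
        "thumbs-down-to-dataset",
        lambda r: r.get("feedback") == "down",
        [sinks.dataset.append],
    ),
]

fired = run_automations({"id": "t-1", "eval_score": 0.3, "feedback": "down"}, rules)
print(fired)         # ['low-score-to-review', 'thumbs-down-to-dataset']
print(sinks.alerts)  # ['Low eval score on t-1']
```

Keeping conditions and actions as plain functions makes each rule easy to test on its own, and adding a new destination is just another action in the list.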
New case study: How @Chime scales AI in production by letting domain experts own evals and prompt performance alongside engineering. If you're figuring out how to formalize AI ops across your team, this is a good blueprint.
In the last year, lots of teams have been trying to get PMs and domain experts more involved in AI product development and evals. Folks like @HamelHusain and @sh_reya have evangelized how important this is. Chime has figured it out, read on. 👇 freeplay.ai/blog/chime…

This evals flashcard is all you need. 👇
This deserved its own flashcard b/c I've seen bus stop ads from eval vendors encouraging the opposite in San Francisco 🤣 The only thing generic metrics do is waste your time. Links in reply.
It’s amazing how often people sound grumpy in Slack or email and turn out to be totally fine. Assume positive intent.
I updated our new @mintlify docs site using Cursor + Claude over the break. It was the best software experience I've had in a long time. Coding agents aren't just for code. Every CMS should work like this.
If you've never used it:
* All your docs are .mdx files
* Everything deploys automatically with every commit
* If you have an OpenAPI spec there's extra magic for your API docs (example: docs.freeplay.ai/openapi/int…)
It was 3 years ago today that @ericwryan and I started showing up full time to a real office to build Freeplay. 🥳
I'm all for remote and know it can be great, but I can't shake the feeling that moving to IRL was foundational for bringing our company to life.
Since then, we've probably had dozens of video call conversations that sound something like this:
Big company folks: "Wait, you all are in the same building in real life? We're jealous… We [gave up our office / no one comes in anymore]."
Us: "Yep. We did remote for years and know it can work, but it's felt great to be together in person."
Here's what's worked for us:
⏱️ Synchronous comms. Things move fast in startup life. Need to solve a problem? Tap someone on the shoulder. Pair on code, or go for a walk along Boulder Creek and talk through it.
✨ Serendipity. So many good ideas have come from discussions outside official channels like Slack or GitHub, where people just stumble onto a good topic together. NGL, the side chats / coffee chats / lunch chats / etc. matter.
🤝 Trust. Being together in person has helped us get to know each other. I keep hearing from people on remote teams who struggle to figure this out.
😎 Vibes. We walk out into the middle of downtown Boulder, have a space that belongs to us, friends from other companies stop by… Kinda hard to explain other than "it feels good." And I'd argue that's ok.
What might surprise people too: We're very open to remote hires, and we have designated WFH days. We're not religious about being in office, but we've been intentional about defining our team culture.
* For remote folks, we tell them up front we're a synchronous culture, and they pair as much as anyone else. We also have an open video call all day with a view of our office so folks can drop in. And we fly everyone to Boulder every six weeks to retro, plan, and spend time together. (3rd annual ski trip is in February!)
* And for anyone who lives locally but prefers to WFH sometimes, everyone has full flexibility Wednesday and Thursday. But we commit to start and end the week together in person. It creates a strong rhythm.
The balance has allowed us to hire some great people who don't live close by, and the clarity on our approach to in-office vs. WFH has helped too. Remote folks know how we work up front, and they opt in (which means they generally like it too).
That's what's worked well for us so far… Curious to hear what you think, and what else has worked well in your context.
PS: If this sounds interesting, let's talk. Link is in the comments. 🙌
19 Dec 2025
It's windy here today.
Well that escalated quickly: NCAR's Mesa lab in the #Boulder foothills has already recorded gusts over 100 mph today #cowx
18 Dec 2025
🚀 New at @freeplay_ai: Review Insights, an agent that automatically clusters themes as you review data, then suggests actions like creating eval metrics or running automated prompt experiments. Check it out.
18 Dec 2025
The result: faster root-cause analysis, tighter iteration loops, and a stronger data flywheel for improving AI agents. Big shoutout to @HamelHusain for the push to get everyone looking at data, and to @shreyashankar: her EvalGen paper inspired this direction over a year ago, and it keeps getting better. 🙌
18 Dec 2025
Read more on our blog here. freeplay.ai/blog/introducing…

20 Nov 2025
The best flag at @conviction