Ian Cairns

Tweets

Ian Cairns

@cairns

Jun 10

Late to the party but this is a good Tweet. If you're building agents, @Vtrivedy10 is describing the roadmap for your ops. 👇

Viv

@Vtrivedy10

Jun 7

imo there’s a pretty solid default recipe that everyone should use to optimize a system of Agent = Model + Harness

you should “train” both

1. Build v1 agent using a sensible base harness and some task specific prompting tools
2. Harness Engineering using eval tasks that roughly match prod. this is often enough - most companies can get acceptable perf doing this. then they collect traces, mine them for patterns, and make slight tweaks from there
3. SFT using data collected from traces or synthetic data. Often a good candidate for “distillation tasks” to train a cheaper model while maintaining existing performance
4. RL if you have the bandwidth, ability, and desire to create environments and design rewards that represent the tasks you want your agent to be good at. Push past the SFT behavior of “copying” data from an existing model to pushing past it in some dimension
5. Light harness engineering again to squeeze any more juice (ex: slight prompting) using the trained model that’s better at your task distribution

this loop will largely be productized as a general purpose recipe for building and improving agents

we’re still in the earliest innings of the world’s companies getting comfortable with steps 1-2 of this loop. Harness engineering will probably be the dominant way ppl will optimize agents but i expect a large number of companies to onboard through this entire loop on some trial project of interest in the next year
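A rough illustration of step 2 (harness engineering against eval tasks), offered as a sketch and not as anything from the thread itself. Everything here is hypothetical: `EvalTask`, `pass_rate`, `pick_best_harness`, and the toy `fake_model` are stand-ins, and exact-match scoring is an assumption chosen for simplicity. The idea is only that you hold the model fixed, vary the harness, and keep whichever variant scores best on tasks that resemble prod.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalTask:
    prompt: str
    expected: str


# Assumption: a "harness" is any function that wraps a model call (prompt in, answer out).
Harness = Callable[[str], str]


def pass_rate(harness: Harness, tasks: list[EvalTask]) -> float:
    """Fraction of eval tasks whose output exactly matches the expected answer."""
    if not tasks:
        return 0.0
    passed = sum(1 for t in tasks if harness(t.prompt).strip() == t.expected)
    return passed / len(tasks)


def pick_best_harness(candidates: dict[str, Harness], tasks: list[EvalTask]) -> tuple[str, float]:
    """Score every candidate harness on the same tasks and return the top one."""
    scores = {name: pass_rate(h, tasks) for name, h in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def fake_model(prompt: str) -> str:
    """Toy stand-in for a model: adds the numbers on the last line of the prompt,
    and is terse only when the prompt asks for a number only."""
    question = prompt.splitlines()[-1]
    answer = str(sum(int(x) for x in question.split("+")))
    return answer if "number only" in prompt else f"The answer is {answer}."


tasks = [EvalTask("2+3", "5"), EvalTask("10+4", "14")]

candidates: dict[str, Harness] = {
    # v1: pass the question straight through.
    "v1": lambda q: fake_model(q),
    # v2: same model, with one extra instruction added by the harness.
    "v2": lambda q: fake_model("Reply with a number only.\n" + q),
}

best_name, best_score = pick_best_harness(candidates, tasks)
print(best_name, best_score)
```

In practice the tasks would come from production traces and the scoring would be task-specific; the model stays the same throughout, and only the harness changes.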

Ian Cairns retweeted

LangChain

@LangChain

May 6

A great conversation between Noah Kravitz from the @nvidia team and @hwchase17.

NVIDIA AI

@NVIDIAAI

May 6

“Every enterprise needs a claw strategy.” How did @LangChain go from a weekend project to 1B downloads in 3 years? We sat down with CEO and co-founder Harrison Chase (@hwchase17) to talk deep agents, evolving agent architectures, and what’s coming next. 🎧 Full episode: nvda.ws/4d8QVQ5

Ian Cairns

@cairns

Feb 6

🎙️New Deployed episode with Kevin Stanton from @SproutSocial. They're building agents that process billions of social messages and turn it all into signal. He shares great lessons learned as an engineering leader. One example, the benefits of chat UX: it's the fastest way to "seed your evals" with real traces, and a product manager's holy grail to learn what customers actually want to do.

Ian Cairns

@cairns

Feb 6

Full episode covers a lot more:
* Why MCP felt more natural than RAG for their system of record
* "LLMs are the most expensive switch statement on the planet" (when to skip MCP tools and use code)
* Why they pulled evals out of CI/CD

Kevin's a Distinguished Engineer at Sprout and has been there for 13 years building infra. This is a good listen for other people making the transition to building agents.

Full episodes:
* Spotify: open.spotify.com/episode/06S…
* YouTube: youtube.com/watch?v=3PTAGZg4…

What It Takes to Run Agents on Billions of Messages: Kevin Stanton, Sprout Social

Deployed: The AI Product Podcast · Episode

open.spotify.com

Ian Cairns

@cairns

Jan 27

Cisco's @duosec built their AI evals & quality practice without a blueprint. A year later, they're watching the industry catch up to what they figured out by doing the work: automated evals > cross-functional data review > improve > repeat. Proud to support their team. 🙌

Ian Cairns

@cairns

Jan 21

Our team's been shipping a ton this year! This one's incredibly practical: Build automations to move data around your system so you don't have to do it by hand.
* Create review queues
* Refresh datasets
* Run conditional evals
* Send Slack alerts
* ...more to come!

This tweet is unavailable

Ian Cairns

@cairns

Jan 14

New case study: How @Chime scales AI in production by letting domain experts own evals and prompt performance alongside engineering. If you're figuring out how to formalize AI ops across your team, this is a good blueprint.

Ian Cairns

@cairns

Jan 14

In the last year, lots of teams have been trying to get PMs and domain experts more involved in AI product development and evals. Folks like @HamelHusain and @sh_reya have evangelized how important this is. Chime has figured it out, read on. 👇 freeplay.ai/blog/chime…

Ian Cairns

@cairns

Jan 9

This evals flashcard is all you need. 👇

Hamel Husain

@HamelHusain

Jan 7

This deserved its own flashcard b/c I've seen bus stop ads from eval vendors encouraging the opposite in San Francisco 🤣 The only thing generic metrics do is waste your time. Links in reply.

Ian Cairns

@cairns

Jan 8

It’s amazing how often people sound grumpy in Slack or email and turn out to be totally fine. Assume positive intent.

Ian Cairns

@cairns

Jan 5

I updated our new @mintlify docs site using Cursor + Claude over the break. It was the best software experience I've had in a long time. Coding agents aren't just for code. Every CMS should work like this.

Ian Cairns

@cairns

Jan 5

If you've never used it:
* All your docs are .mdx files
* Everything deploys automatically with every commit
* If you have an OpenAPI spec there's extra magic for your API docs (example: docs.freeplay.ai/openapi/int…)

Introduction - Freeplay

Observe, evaluate, and iterate toward great AI applications

docs.freeplay.ai

Ian Cairns

@cairns

Jan 5

It was 3 years ago today that @ericwryan and I started showing up full time to a real office to build Freeplay. 🥳

I'm all for remote and know it can be great, but I can't shake the feeling that moving to IRL was foundational for bringing our company to life.

Since then we've had probably dozens of video call conversations that sound something like this:

Big company folks: "Wait, you all are in the same building in real life? We're jealous… We [gave up our office / no one comes in anymore]."

Us: "Yep. We did remote for years and know it can work, but it's felt great to be together in person."

Here's what's worked for us:

⏱️ Synchronous comms. Things move fast in startup life. Need to solve a problem? Tap someone on the shoulder. Pair on code, or go for a walk along Boulder Creek and talk through it.

✨ Serendipity. So many good ideas have come from discussion outside official channels like Slack or GitHub, where people just stumble onto a good topic together. NGL, the side chats / coffee chats / lunch chats / etc. matter.

🤝 Trust. Being together in person has helped us get to know each other. I keep hearing from people in remote teams who struggle to figure this out.

😎 Vibes. We walk out into the middle of downtown Boulder, have a space that belongs to us, friends from other companies stop by… Kinda hard to explain other than “it feels good.” And I’d argue that’s ok.

What might surprise people too: We’re very open to remote hires, and we have designated WFH days. We’re not religious about being in office, but we’ve been intentional about defining our team culture.

* For remote folks, we tell them up front we’re a synchronous culture and they pair as much as anyone else. We also have an open video call all day with a view to our office so folks can drop in. And we fly everyone to Boulder every six weeks to retro, plan, and spend time together. (3rd annual ski trip is in February!)
* And for anyone who lives locally but prefers to WFH sometimes, everyone has full flexibility Wednesday and Thursday. But we commit to start and end the week together in person. It creates a strong rhythm.

The balance has allowed us to hire some great people who don’t live close by, but the clarity on our approach to in-office vs. WFH has helped too. Remote folks know how we work up front, and they opt-in (which means they generally like it too).

That’s what’s worked well for us so far… Curious to hear what you think, and what else has worked well in your context.

PS: If this sounds interesting let’s talk, link is in the comments. 🙌

Ian Cairns

@cairns

19 Dec 2025

It's windy here today.

Mitchell Byars @mitchellbyars

19 Dec 2025

Well that escalated quickly: NCAR's Mesa lab in the #Boulder foothills has already recorded gusts over 100 mph today #cowx

Ian Cairns

@cairns

18 Dec 2025

🚀 New at @freeplay_ai: Review Insights An agent that automatically clusters themes as you review data, then suggests actions like eval metric creation or automated prompt experiments. Check it out.

Ian Cairns

@cairns

18 Dec 2025

The result: Faster root-cause analysis, tighter iteration loops, and a stronger data flywheel for improving AI agents. Big shoutout to @HamelHusain for the push to get everyone looking at data, and to @shreyashankar — her EvalGen paper inspired this direction over a year ago, and it keeps getting better. 🙌

Ian Cairns

@cairns

18 Dec 2025

Read more on our blog here. freeplay.ai/blog/introducing…

Ian Cairns

@cairns

20 Nov 2025

The best flag at @conviction
