i like making things | Applied AI @LangChain

Joined April 2017
84 Photos and videos
I no longer fear death, for I am immortalized in the fable weights, an honor worth more than my mere mortal existence
Ain’t it just beautiful
Orange and Blue from the skies #NewYorkKnicks
I'm bullish on agent swarms (aka workflows). Agents are increasingly being used to analyze and collate massive amounts of unstructured data in repetitive ways (e.g. document extraction, reading emails, parsing logs), but as these tasks and data inputs scale we've seen reliable execution decrease, even from the most capable models. Specifically, the consistency of sub agent dispatches from filesystem-based agents drops dramatically when attempting to deploy more than 30 sub agents in parallel. So… how can you harness the best of an agent's intelligent decision making with reliable sub agent task execution at scale? Here's how 👇 1/5
Once each individual sub agent completes its run, the results can either be joined back to the main table or acknowledged as completed. We can use structured outputs to do this join automatically via the same dispatch script as results stream in. 4/5
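A rough sketch of what that join could look like, assuming a Python dispatch script. The `run_sub_agent` function, the `SubAgentResult` schema, and the `max_parallel` cap are all hypothetical stand-ins, not the actual implementation from this thread:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubAgentResult:
    """Structured output every sub agent must return (hypothetical schema)."""
    row_id: int
    status: str   # "completed" or "failed"
    summary: str


async def run_sub_agent(row_id: int, task: str) -> SubAgentResult:
    # Hypothetical stand-in for a real sub agent run; swap in your own call.
    await asyncio.sleep(0)
    return SubAgentResult(row_id=row_id, status="completed", summary=task.upper())


async def dispatch_and_join(table: dict[int, dict], max_parallel: int = 10) -> dict[int, dict]:
    """Run one sub agent per row and join each result back as it arrives."""
    semaphore = asyncio.Semaphore(max_parallel)  # cap on concurrent sub agents

    async def bounded(row_id: int, task: str) -> SubAgentResult:
        async with semaphore:
            try:
                return await run_sub_agent(row_id, task)
            except Exception as exc:
                # A failed run is still recorded so the row is not silently dropped.
                return SubAgentResult(row_id=row_id, status="failed", summary=str(exc))

    runs = [bounded(row_id, row["task"]) for row_id, row in table.items()]
    for finished in asyncio.as_completed(runs):
        result = await finished
        # The row_id in the structured output is the join key.
        table[result.row_id].update(status=result.status, summary=result.summary)
    return table
```

Because every result carries its own `row_id`, the script can do the join deterministically in code, so the orchestrating agent never has to track which sub agent handled which row.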
Finally, the output can be navigated and ingested by the orchestrating agent for any downstream follow-up! From testing and implementation, we've seen these techniques begin to deliver incredibly reliable sub agent execution at scale, where relying on a single LLM's function calling ability and logic would fail. It will be critical to watch how this evolves as more agent harnesses begin to rely on both code execution and recursive agent runs behind the scenes! 5/5
One personal gripe I have with current AI product advertising is that many displays/billboards seem strangely… verbose? Like lots of shoehorned text awkwardly worldbuilding niche scenarios to get their use case across. Am I just not the target audience? Does this not seem counterintuitive to traditional brand/product marketing?
One of the most technically impressive agents I’ve had the honor of working on 🚒
Stop manually triaging agent failures. Let LangSmith Engine fix it.
Trace data is literally worth its weight in gold these days, if you know what to do with it! As has been established, creating effective agents requires shipping early, observing behavior, and iterating quickly. At the core of this are your agent traces, which capture exact inputs, outputs, steps, and metadata along the way. Analyzing traces helps surface inefficiencies and areas for improvement, but traces can also be used in more sophisticated ways to set up robust evaluations. Here are two of the ways we use traces to build evals for production agents 👇
Of course, combinations of these evals can cover a wide range of behaviors, scenarios, and edge cases. With both end-to-end and behavioral coverage, the eval suite can be used in some unique ways. The obvious one is traditional regression testing: making sure a change to your prompt or harness doesn't break existing behavior. But more interestingly, these evals can also serve as targets for optimization. That is, a good suite can show strengths and weaknesses across model families, and pinpoint exactly where, say, a prompting change may let an open source model perform as well as a frontier closed model in a given scenario.
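As a minimal sketch of the regression-testing use, assuming each eval has been run several times and recorded as a list of pass/fail booleans. The function names, the eval names in the usage note, and the 0.05 tolerance are illustrative assumptions, not part of any particular eval framework:

```python
def pass_rates(results: dict[str, list[bool]]) -> dict[str, float]:
    """Fraction of passing runs per eval, skipping evals with no runs."""
    return {name: sum(runs) / len(runs) for name, runs in results.items() if runs}


def find_regressions(
    baseline: dict[str, list[bool]],
    candidate: dict[str, list[bool]],
    tolerance: float = 0.05,  # assumed noise margin; tune for your suite
) -> dict[str, tuple[float, float]]:
    """Return evals whose pass rate dropped by more than `tolerance`."""
    base = pass_rates(baseline)
    cand = pass_rates(candidate)
    return {
        name: (base[name], cand[name])
        for name in base
        if name in cand and cand[name] < base[name] - tolerance
    }
```

Calling `find_regressions` with results from the current prompt as `baseline` and results from the changed prompt as `candidate` returns only the evals that got worse, each mapped to its (baseline, candidate) pass rates. The same comparison works across two models on the same suite, which is the optimization-target use described above.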
So… turns out SWE hasn't escaped testing with AI; rather, it's more important than ever!
This is real agent security alpha 🔐
Do agents listen to you… or themselves? While evaling subagent behavior in deep agent systems, we noticed an interesting quirk in our agents' alignment with hand-written system prompts vs. the instructions given by the orchestrator 1/4 🧵
While our subagent system prompt was generally directional and open-ended, some models provided detailed rubrics and guidelines that resulted in wayyyy too strict behavior and limited the subagent's creative execution, hurting end performance. These larger briefs from the agent often directionally overrode the looser behavior we wanted to encourage from our prompting 3/4
The takeaway? It's important to consider and measure not just how you are prompting a subagent, but how your primary agent is prompting it too. The relationship an agent has with its subagent delegations can make or break the overall system's success 4/4
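One crude way to start measuring this, assuming you can pull the orchestrator's delegation briefs out of your traces as plain strings. The directive word list and the threshold below are illustrative guesses, not a validated metric:

```python
import re

# Hypothetical list of words that tend to signal rigid, rubric-style instructions.
DIRECTIVE_PATTERN = re.compile(
    r"\b(must|never|always|exactly|only|do not)\b", re.IGNORECASE
)


def brief_stats(brief: str) -> dict[str, int]:
    """Word count and directive count for one orchestrator-written brief."""
    return {
        "words": len(brief.split()),
        "directives": len(DIRECTIVE_PATTERN.findall(brief)),
    }


def flag_strict_briefs(briefs: list[str], max_directives: int = 3) -> list[str]:
    """Return briefs with more directive words than the assumed threshold."""
    return [b for b in briefs if brief_stats(b)["directives"] > max_directives]
```

Tracking these numbers per model alongside end performance is one way to check whether stricter orchestrator briefs line up with worse subagent results.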