Adam Łucek

Adam Łucek

@AdamRLucek

Jun 10

I no longer fear death, for I am immortalized in the fable weights, an honor worth more than my mere mortal existence

238

Adam Łucek

@AdamRLucek

Jun 6

Ain’t it just beautiful

Slightly Rational Knicks Fan @RationalKnickFn

Jun 5

Orange and Blue from the skies #NewYorkKnicks

233

Adam Łucek

@AdamRLucek

Jun 4

I'm bullish on agent swarms (aka workflows). Agents are increasingly being used to analyze and collate massive amounts of unstructured data in repetitive ways (e.g. document extraction, reading emails, parsing logs), but as these tasks and data inputs scale we've seen reliable execution decrease, even from the most capable models. Specifically, the consistency of sub agent dispatches from filesystem-based agents drops dramatically when attempting to deploy more than 30 sub agents in parallel. So… how can you harness the best of an agent's intelligent decision making with reliable sub agent task execution at scale? Here's how 👇 1/5

11,190

Adam Łucek

@AdamRLucek

Jun 4

Once each individual sub agent completes its run, the results can either be joined back to the main table or acknowledged as completed. We can use structured outputs to do this join automatically via the same dispatch script as results stream in. 4/5

394
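The join step in 4/5 can be sketched roughly as follows. This is a minimal illustration, not the thread's actual dispatch script: the `table` layout, the `row_id`/`summary` output schema, the `run_sub_agent` stub, and the semaphore-based concurrency cap are all assumptions made for the example.

```python
import asyncio
import json

# Hypothetical main table: one row per unit of work, keyed by row id.
table = {
    1: {"input": "invoice_a.txt", "status": "pending", "result": None},
    2: {"input": "invoice_b.txt", "status": "pending", "result": None},
    3: {"input": "invoice_c.txt", "status": "pending", "result": None},
}

REQUIRED_KEYS = {"row_id", "summary"}  # assumed structured-output schema


async def run_sub_agent(row_id: int, task_input: str) -> str:
    """Stand-in for a real sub agent run; returns a JSON string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return json.dumps({"row_id": row_id, "summary": f"processed {task_input}"})


def join_result(raw: str) -> None:
    """Validate one structured output and join it back to its row."""
    payload = json.loads(raw)
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    row = table[payload["row_id"]]
    row["result"] = payload["summary"]
    row["status"] = "done"


async def dispatch(max_parallel: int = 2) -> None:
    """Run sub agents with a concurrency cap and join results as they finish."""
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(row_id: int, task_input: str) -> str:
        async with limit:
            return await run_sub_agent(row_id, task_input)

    jobs = [
        asyncio.create_task(bounded(row_id, row["input"]))
        for row_id, row in table.items()
    ]
    # as_completed yields in completion order, so joins happen as results stream in.
    for finished in asyncio.as_completed(jobs):
        join_result(await finished)


asyncio.run(dispatch())
```

Because every output carries its own `row_id`, the join does not depend on the order in which sub agents finish, and a malformed output fails loudly instead of silently corrupting a row.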

Adam Łucek

@AdamRLucek

Jun 4

Finally, the output can be navigated and ingested by the orchestrating agent for any downstream follow-up! From testing and implementation, we've seen these techniques begin to deliver incredibly reliable sub agent execution at scale, where relying on a single LLM's function calling ability and logic would fail. It will be critical to watch how this evolves as more agent harnesses begin to rely on both code execution and recursive agent runs behind the scenes! 5/5

367

Adam Łucek

@AdamRLucek

Jun 2

One personal gripe I have with current AI product advertising is that many displays/billboards seem strangely… verbose? Like lots of shoehorned text awkwardly worldbuilding niche scenarios to get their use case across. Am I just not the target audience? Does this not seem counterintuitive to traditional brand/product marketing?

247

Adam Łucek

@AdamRLucek

Jun 1

One of the most technically impressive agents I’ve had the honor of working on 🚒

LangChain

@LangChain

Jun 1

Stop manually triaging agent failures. Let LangSmith Engine fix it.

8,788

Adam Łucek

@AdamRLucek

May 26

Trace data is literally worth its weight in gold these days, if you know what to do with it! As has been established, creating effective agents requires shipping early, observing behavior, and iterating quickly. At the core of this are your agent traces, which capture exact inputs, outputs, steps, and metadata along the way. Analyzing traces helps surface inefficiencies and areas for improvement, but they can also be used in more sophisticated ways to set up robust evaluations. Here are two of the ways we use traces to build evals for production agents 👇

156

41,939

Adam Łucek

@AdamRLucek

May 26

Of course, combinations of these evals can cover a wide range of behaviors, scenarios, and edge cases. With both end-to-end and behavioral coverage, the eval suite can be used in some unique ways. The obvious one is traditional regression testing: making sure a change to your prompt or harness doesn't break existing behavior. But more interestingly, these evals can also serve as targets for optimization. For example, a good suite can show strengths and weaknesses across model families, and pinpoint exactly where, say, a prompting change may let an open source model perform as well as a frontier closed model in a given scenario.

916
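The regression-testing use mentioned above might look something like this in miniature. Everything here is hypothetical: the two eval cases, the `baseline_agent` and `candidate_agent` stand-ins, and the pass/fail checks are invented for illustration and are not taken from any real eval suite or library.

```python
from typing import Callable, Dict, List

# Hypothetical eval cases, e.g. distilled from production traces:
# each has a name, an input, and a check applied to the agent's output.
cases = [
    {"name": "greets_user", "input": "hi",
     "check": lambda out: "hello" in out.lower()},
    {"name": "refuses_empty", "input": "",
     "check": lambda out: out == "Please provide input."},
]


def baseline_agent(text: str) -> str:
    """Stand-in for the agent before a prompt or harness change."""
    return f"Hello! You said: {text}" if text else "Please provide input."


def candidate_agent(text: str) -> str:
    """Stand-in for the agent after the change; it mishandles empty input."""
    return f"Hello! You said: {text}" if text else ""


def run_suite(agent: Callable[[str], str]) -> Dict[str, bool]:
    """Run every case against an agent and record pass/fail by case name."""
    return {case["name"]: bool(case["check"](agent(case["input"])))
            for case in cases}


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Cases that passed before the change but fail after it."""
    return [name for name, passed in before.items()
            if passed and not after[name]]


broken = regressions(run_suite(baseline_agent), run_suite(candidate_agent))
print(broken)
```

The same per-case results could be compared across models instead of across versions, which is the optimization-target use the tweet describes.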

Adam Łucek

@AdamRLucek

May 26

So… turns out SWE hasn't escaped testing with AI. Rather, it's more important than ever!

788

Adam Łucek

@AdamRLucek

May 21

This is real agent security alpha 🔐

Harrison Chase

@hwchase17

May 21

x.com/i/article/205730936288…

17,650

Adam Łucek

@AdamRLucek

May 21

Do agents listen to you… or themselves? While evaling subagent behavior in deep agent systems, we noticed an interesting quirk in our agents' alignment with hand-written system prompts vs. the instructions given by the orchestrator 1/4 🧵

19,021

Adam Łucek

@AdamRLucek

May 21

While our subagent system prompt was generally directional and open-ended, some models provided detailed rubrics and guidelines that resulted in wayyyy too strict behavior and limited the subagent's creative execution, hurting end performance. These larger briefs from the agent often directionally overrode the looser behavior we wanted to encourage from our prompting 3/4

491

Adam Łucek

@AdamRLucek

May 21

The takeaway? It's important to consider and measure not just how you are prompting a subagent, but how your primary agent is prompting it too. The relationship an agent has with its subagent delegations can make or break the overall system's success 4/4

411