Liran Tal

AI Native Dev retweeted

Jun 13

> The most expensive model in the benchmark wasn't the best value

I found similar results in my Snyk VulnBench benchmark for finding vulnerabilities (comparing Opus 4.7 and others). It will be released soon, stay tuned folks 👀😉

Simon Maple retweeted

AI Native Dev

@ainativedev

Jun 10

The most expensive model in the benchmark wasn't the best value.

Rob Willoughby and Simon Maple (@sjmaple) evaluated 19 model configurations on real agentic tasks and found that DeepSeek V4 Flash scored 82.3 while costing just $0.0236 per task. Claude Haiku 4.5 scored 82.9 at roughly four times the cost, while DeepSeek V4 Pro scored 85.3 at nearly eight times the cost.

The interesting part isn't that Flash beat stronger models. It didn't. The interesting part is how little quality was gained for how much additional spend.

That becomes a very different conversation once you're running agents at scale. A model that looks marginally better on a benchmark can end up costing dramatically more over the course of a year, especially when agent workloads start growing.

The benchmark also surfaced something that many teams probably aren't measuring closely enough. The biggest performance jump didn't come from switching models. It came from adding the right skill. DeepSeek V4 Flash moved from 64.1 to 82.3 with skill context applied, which raises an uncomfortable question about how much of agent performance is actually model selection versus everything built around the model.

The full breakdown is worth reading, particularly the sections on points-per-dollar, turn counts, and why the cheapest model in the benchmark ended up being one of the most interesting.

Read the full blog here: tessl.io/blog/same-quality-a…

Same quality, a quarter of the cost: Should DeepSeek Flash be your model of choice?

Discover if DeepSeek Flash is your cost-effective AI model choice, offering comparable quality at a fraction of the price. Explore our detailed analysis.

tessl.io
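
The points-per-dollar comparison in the post above can be roughly reproduced from the figures it quotes. The following is only a sketch: the scores and the Flash cost come from the post, but the Haiku and Pro costs are assumed to be exactly 4x and 8x the Flash cost (the post says "roughly" and "nearly"), and the function names and the daily workload are hypothetical.

```python
# Scores and the Flash cost are quoted in the post. The Haiku and Pro costs
# are ASSUMED to be exactly 4x and 8x the Flash cost; the post only says
# "roughly four times" and "nearly eight times".
FLASH_COST_USD = 0.0236

MODELS = {
    "DeepSeek V4 Flash": (82.3, FLASH_COST_USD),
    "Claude Haiku 4.5": (82.9, FLASH_COST_USD * 4),  # assumed multiplier
    "DeepSeek V4 Pro": (85.3, FLASH_COST_USD * 8),   # assumed multiplier
}


def points_per_dollar(score: float, cost_per_task: float) -> float:
    """Benchmark points bought by one dollar of per-task spend."""
    return score / cost_per_task


def annual_cost(cost_per_task: float, tasks_per_day: int) -> float:
    """Yearly spend for a steady daily workload (365 days)."""
    return cost_per_task * tasks_per_day * 365


# Gain for Flash with skill context applied, per the post (64.1 -> 82.3).
skill_uplift = 82.3 - 64.1

if __name__ == "__main__":
    tasks_per_day = 10_000  # hypothetical workload, not from the post
    for name, (score, cost) in MODELS.items():
        print(f"{name}: {points_per_dollar(score, cost):.0f} points/$, "
              f"${annual_cost(cost, tasks_per_day):,.0f}/year")
    print(f"Skill uplift for Flash: {skill_uplift:.1f} points")
```

Under these assumed multipliers, Flash buys the most points per dollar, and the 18.2-point uplift from skill context is larger than the 3.0-point gap between Flash and Pro.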

Liran Tal @liran_tal

Jun 13

Replying to @ainativedev

Oh that's lovely indeed. Looks like mixed results, especially across the tool-calls divide. Good thing you were able to run it before the fable takedown 😉

Liran Tal @liran_tal

Jun 12

Replying to @ainativedev

I like these benchmarks. Nice work, Tessl team.

Hamza Oza retweeted

AI Native Dev

@ainativedev

Jun 12

The biggest AI challenge inside organisations might have nothing to do with AI.

Hamza Oza (@hamzaoza) connects two very different events, AI Native DevCon and Muslim Tech Fest, and arrives at the same conclusion from both. Most conversations about AI start with tools. The more interesting question is where value is created before the tool arrives.

At AI DevCon, the discussion was about why individual AI productivity gains often fail to scale across teams. A developer can build faster with Claude Code, Cursor, or Copilot, but organisations still need shared context, standards, governance, and workflows before those gains become repeatable.

At Muslim Tech Fest, the conversation surfaced the same idea at a personal level. Before AI can amplify someone's work, they need clarity on their strengths, weaknesses, judgement, and the areas where they create the most value.

That parallel feels increasingly important. If an individual doesn't understand where they contribute value, AI won't solve that problem. If a team lacks shared standards, better tools won't create them. If an organisation doesn't understand what makes it effective, adding agents is unlikely to provide the answer.

Technology can accelerate direction. It cannot provide direction.

A thoughtful perspective on why reflection may be a more important starting point than augmentation.

Read the full blog here: tessl.io/blog/reflection-bef…

Reflection Before Augmentation

Explore AI's impact on individual vs. team productivity and the importance of reflection before tool adoption. Discover insights from AI DevCon and Muslim Tech

tessl.io

Rohan Sharma retweeted

AI Native Dev

@ainativedev

Jun 11

Ryan Lopopolo put out a claim that it's "borderline negligent" not to use a billion tokens a day — and in this clip he explains exactly why. Intelligence extraction scales linearly with token consumption. That's why test-time compute exists. And getting to a billion tokens means thinking well beyond pair programming. Watch the full episode at youtu.be/MFQIKbr1IEo or listen wherever you get your podcasts. #AI #agenticcoding #claudecode #codex #AIskills

Oleg Šelajev 🇪🇪🧊🐳 retweeted

Liran Tal @liran_tal

Jun 2

A skill from @shelajev, shown in his talk at @ainativedev, that runs a security audit to find credentials scattered across the file system. Claude Code refuses to run the skill, Oleg asks it to rewrite the skill in Python, and so on and so on. It's a fun experiment in behavior analysis.

Der.dev 🔥🛠️ retweeted

Ryan Lopopolo

@_lopopolo

Jun 10

Had a great conversation with the Tessl folks at @ainativedev London on all things Codex, agents, and harness engineering. Hope y’all give it a listen!

AI Native Dev

@ainativedev

Jun 10

Ryan Lopopolo tracked PR throughput on his OpenAI team from 3.5 per engineer per week up to 70 — not through adding headcount, but through iterating on the model and the harness together. Every revision of GPT-5 from 5.2 onward compounded on the last, and this clip shows exactly what that felt like from inside the team. Watch the full episode at youtu.be/MFQIKbr1IEo or listen wherever you get your podcasts. #AI #agenticcoding #claudecode #codex #AIskills

Dorothy Bartomeo retweeted

AI Native Dev

@ainativedev

Jun 10

Liran Tal @liran_tal

Jun 10

Replying to @_lopopolo @ainativedev

It was fun! And you were great, Ryan :-)

Michael Wall retweeted

AI Native Dev

@ainativedev

Jun 9

Ryan Lopopolo built a product at OpenAI with zero human-written code, and by the time his team reached its seventh engineer, new hires were making the team faster within two weeks. The secret isn't just better agents — it's Harness Engineering: the systems, constraints, and feedback loops that make agents trustworthy enough to let go. This conversation was recorded live at AI Native DevCon London 2026, and it's one of the most concrete breakdowns of production-grade agent development we've had on the show. Watch the full episode at youtu.be/MFQIKbr1IEo or listen wherever you get your podcasts. #AI #agenticcoding #claudecode #codex #AIskills

Simon Maple retweeted

AI Native Dev

@ainativedev

Jun 5

Developers using AI tools are creating and merging twice as many pull requests — but AI-generated PRs have a 60/40 merge rate compared to 80/20 for humans. That gap reveals something important about how agents are actually being used in the wild: probing, experimenting, spawning throwaway work. Jellyfish's Nick Arcolano breaks down what the data actually says. Watch the full episode at youtu.be/GbHfzFcIa0o or listen wherever you get your podcasts #AI #agenticcoding #claudecode #codex #AIskills

Liran Tal @liran_tal

Jun 5

Replying to @thewritingdev @ainativedev @tessl_io @SammyHep @sjmaple

There's always next time

Nicholas Arcolano

@arcolano

Jun 5

ICYMI: my @ainativedev podcast with @sjmaple is out. AI productivity in the wild, multi-agent workflows, and @_jellyfish_co data on the ROI case for token spend. Spotify: open.spotify.com/episode/3S0… Apple: podcasts.apple.com/us/podcas… YouTube: youtu.be/GbHfzFcIa0o

Why Developers Hit a Wall at 4 AI Agents

The AI Native Dev - from Copilot today to AI Native Software Development tomorrow · Episode

open.spotify.com

Rohan Sharma @rrs00179

Jun 5

Replying to @liran_tal @ainativedev @tessl_io @sjmaple

@SammyHep where's mine? 😆

Liran Tal @liran_tal

Jun 5

Thank you for inviting me to speak at @ainativedev and meet some of the coolest AI builders and minds in London 🚀 I appreciate @tessl_io for building this, and @SammyHep, @sjmaple and team for all the effort to organize it and make it a stellar AI event.
