Jonathan Haas

Tweets

Pinned Tweet

Jonathan Haas

@JonathanHaas

Jun 5

Your agents shouldn’t be loose scripts with credit cards and tool access. They should run through a control plane.

EvalOps @EvalOpsDev

Jun 5

Coming soon.

305

Jonathan Haas retweeted

Drew Breunig

@dbreunig

Here are some principles you can infer from @satyanadella's paragraph:
- There will be a better model tomorrow.
- Prompts are great for building POCs, but terrible at specifying system behaviors.
- To switch models easily, you need good evals and a system for generating and holding a new prompt accountable for a given model.
- With such a system, you can almost certainly use a model magnitudes faster and cheaper than frontier models.
- Evals are THE asset for all enterprises.
- Evals should never stop growing. 🤔

Satya Nadella

@satyanadella

10h

x.com/i/article/206558289479…

215

26,551

Jonathan Haas

@JonathanHaas

Jun 14

Still blown away by @AmpCode. I spent ~5 hours debugging something across Codex, Claude, and hand-written tests (I know!) like it was 2019. Amp solved it in ~6 minutes with one oracle session and an approach I hadn’t considered. @sqs and team are cooking.

4,791

Jonathan Haas

@JonathanHaas

Jun 13

This episode with @matanSF is incredible. Tangible, actionable stuff. Really refreshing in a world where folks are just yapping. youtube.com/watch?v=lgo_QbgV…

OpenAI vs Anthropic vs Open-Source | Token Maxing, AI Hangovers & The...

Matan Grinberg is the Founder and CEO @ Factory, an AI research lab...

youtube.com

415

Jonathan Haas retweeted

Eric Zelikman

@ericzelikman

Jun 9

imagine telling your customers there's a small chance you'll randomly decide they're using your product wrong and you won't tell them but will secretly silently sabotage their work

206

2,991

107,857

Jonathan Haas

@JonathanHaas

Jun 9

I uploaded Anthropic’s own published system card to Claude and asked “wdyt abt it?” Claude refused to read it because the system card contains safety-sensitive topics. We’ve reached AI safety so advanced it cannot inspect the AI safety document.

Jonathan Haas

@JonathanHaas

Jun 9

Time to complain about Claude speeds again? :)

Jonathan Haas

@JonathanHaas

Jun 8

I wrote about how I started doing this in March :) haasonsaas.com/blog/orchestr… Output has been pretty immense - at 60k commits for the year!

The Real Work of Orchestrating AI Coding Agents

Three concurrent coding agents taught me the actual bottleneck: not prompting, but assignment, evidence, review, and release control.

haasonsaas.com

This tweet is unavailable

Jonathan Haas

@JonathanHaas

Jun 2

Endless missing model spam on Codex? @OpenAIDevs

133

Jonathan Haas

@JonathanHaas

Jun 1

funniest possible outcome is the AI reads the S-1 announcement, summarizes it neutrally, and adds 'this is after my knowledge cutoff so I can't verify it' - which, for the record, is exactly what Claude did twenty minutes ago

Anthropic

@AnthropicAI

Jun 1

Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: anthropic.com/news/confident…

192

Jonathan Haas

@JonathanHaas

May 31

Opus 4.8 appears amazing if your workflow is “pay premium tokens to supervise a very articulate coin flip.”

117

Jonathan Haas

@JonathanHaas

May 31

Every team has a dashboard for latency. Most teams find out about token spend from finance. Cost is just an eval nobody wrote.

EvalOps @EvalOpsDev

May 31

Your token spend was a number you could've gated on. Instead it's a number you get to explain.

243

Jonathan Haas retweeted

Olivia Moore

@omooretweets

May 31

Self-driving cars are fun because you never see competing SaaS products having a literal standoff in the street

326

910

14,899

1,208,244

Jonathan Haas retweeted

taoki

@justalexoki

May 27

can't believe i spent my whole life becoming Good At Computer only for Computer to become Better At Computer

189

2,123

35,941

735,089

Jonathan Haas

@JonathanHaas

May 23

my commit history this year is 60k commits and my contribution is 'told it to stop being so confident on Tuesdays'

213

Jonathan Haas retweeted

EvalOps @EvalOpsDev

May 19

everyone's like "how big is your team" brother. it's one agent. it's opening PRs against itself. i haven't written code in four months. leave me alone

383

Jonathan Haas

@JonathanHaas

May 17

Things I will not be doing today:
– installing Playwright
– activating a venv
– pip-installing 41 transitive deps to click a button
Things I did instead: ported browser-use to Rust, pinned to a frozen upstream SHA, and exposed it over MCP a local JSON-RPC daemon. github.com/evalops/browser-u…

GitHub - evalops/browser-use-rs: Rust behavioral conformance port of browser-use

github.com

502

Jonathan Haas

@JonathanHaas

May 13

Zain and team are building something absolutely incredible. Check them out!!! 👇

Zain Rizavi

@MrRazzi17

May 13

After months in stealth, my co-founder @helloericsf and I are finally sharing @cimentoai with the world. 🌎 AI changed social engineering. Attacks are now personalized, convincing, and cheap to generate at scale.

979

Jonathan Haas retweeted

Andy Berman

@berman66

May 12

In the last two weeks: ServiceNow shipped Action Fabric, AWS MCP Server hit GA, Microsoft moved Agent 365 to GA. The agent execution layer is the new cloud. Ignore it now, pay for it in 2027.

438

Jonathan Haas

@JonathanHaas

May 6

Have been stewing on this for ages with @EvalOpsDev

George

@odysseus0z

May 6

Told you guys! It is all eval/rubrics now.

323