Here are some principles you can infer from @satyanadella's paragraph:
- There will be a better model tomorrow.
- Prompts are great for building POCs, but terrible at specifying system behaviors.
- To switch models easily, you need good evals and a system that generates a new prompt for a given model and holds that prompt accountable to the evals.
- With such a system, you can almost certainly use a model orders of magnitude faster and cheaper than frontier models.
- Evals are THE asset for all enterprises.
- Evals should never stop growing.
🤔
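A minimal sketch of the "hold a prompt accountable" idea from the list above, assuming a model is just a callable that takes a prompt and an input. `EvalCase`, `pass_rate`, `pick_prompt`, and `toy_model` are hypothetical names made up for illustration, not anyone's actual system:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: a "model" is any callable (prompt, input_text) -> output_text.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class EvalCase:
    input_text: str
    check: Callable[[str], bool]  # True when the model output is acceptable


def pass_rate(model: Model, prompt: str, cases: list[EvalCase]) -> float:
    """Fraction of eval cases the (model, prompt) pair passes."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for c in cases if c.check(model(prompt, c.input_text)))
    return passed / len(cases)


def pick_prompt(
    model: Model, candidates: list[str], cases: list[EvalCase], threshold: float
) -> tuple[Optional[str], float]:
    """Return the best-scoring candidate prompt, or None if it misses the bar."""
    scored = [(pass_rate(model, p, cases), p) for p in candidates]
    best_score, best_prompt = max(scored, key=lambda s: s[0])
    return (best_prompt if best_score >= threshold else None), best_score


# Toy stand-in for a cheaper model: it only uppercases when the prompt asks for it.
def toy_model(prompt: str, text: str) -> str:
    return text.upper() if "uppercase" in prompt else text


cases = [
    EvalCase("abc", lambda out: out == "ABC"),
    EvalCase("hello", lambda out: out == "HELLO"),
]
candidates = ["Echo the input.", "Return the input in uppercase."]
chosen, score = pick_prompt(toy_model, candidates, cases, threshold=0.9)
```

Swapping models then means rerunning `pick_prompt` with the new model against the same (growing) list of cases; the evals stay, the prompt is regenerated.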
Still blown away by @AmpCode.
I spent ~5 hours debugging something across Codex, Claude, and hand-written tests (I know!) like it was 2019.
Amp solved it in ~6 minutes with one oracle session and an approach I hadn’t considered.
@sqs and team are cooking.
This episode with @matanSF is incredible. Tangible, actionable stuff. Really refreshing in a world where folks are just yapping.
youtube.com/watch?v=lgo_QbgV…
imagine telling your customers there's a small chance you'll randomly decide they're using your product wrong and you won't tell them but will secretly silently sabotage their work
I uploaded Anthropic’s own published system card to Claude and asked “wdyt abt it?”
Claude refused to read it because the system card contains safety-sensitive topics.
We’ve reached AI safety so advanced it cannot inspect the AI safety document.
funniest possible outcome is the AI reads the S-1 announcement, summarizes it neutrally, and adds 'this is after my knowledge cutoff so I can't verify it' - which, for the record, is exactly what Claude did twenty minutes ago
Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission.
Pending completion of SEC review, this gives us the option to pursue an initial public offering.
Read more: anthropic.com/news/confident…
Things I will not be doing today:
– installing Playwright
– activating a venv
– pip-installing 41 transitive deps to click a button
Things I did instead:
ported browser-use to Rust, pinned to a frozen upstream SHA, and exposed it over MCP as a local JSON-RPC daemon.
github.com/evalops/browser-u…
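For a rough idea of what talking to an MCP server looks like on the wire: MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below only builds such a request. The tool name `click` and its `selector` argument are hypothetical and not taken from the linked repo:

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical tool name and arguments, for illustration only.
message = tools_call_request("click", {"selector": "#submit"})
```

How the message reaches the daemon (stdio, a local socket, HTTP) depends on the transport the server exposes, which the post does not specify.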
After months in stealth, my co-founder @helloericsf and I are finally sharing @cimentoai with the world. 🌎
AI changed social engineering. Attacks are now personalized, convincing, and cheap to generate at scale.
In the last two weeks: ServiceNow shipped Action Fabric, AWS MCP Server hit GA, Microsoft moved Agent 365 to GA.
The agent execution layer is the new cloud.
Ignore it now, pay for it in 2027.