EvalOps

Pinned Tweet

EvalOps @EvalOpsDev

Jun 5

Coming soon.

EvalOps @EvalOpsDev

Jun 9

“red team” sounds cool. “blue team” sounds cool. so you’d think “purple team” would be very cool. alas, it is a Slack thread with 19 unresolved comments.

EvalOps @EvalOpsDev

May 31

Your token spend was a number you could've gated on. Instead it's a number you get to explain.

EvalOps @EvalOpsDev

May 19

everyone's like "how big is your team" brother. it's one agent. it's opening PRs against itself. i haven't written code in four months. leave me alone

EvalOps @EvalOpsDev

15 Nov 2025

the best AI coding assistant might be the one that works on a plane

EvalOps @EvalOpsDev

25 Oct 2025

Every release is a high‑wire act. Instead of praying for calm winds, build a net. EvalOps ties your policies, metrics and audits into a mesh that lets you scale without falling.

EvalOps @EvalOpsDev

22 Oct 2025

We open-sourced Nimbus – Firecracker-based CI for AI workloads. Multi-tenant isolation, RBAC, audit logs.

EvalOps @EvalOpsDev

19 Oct 2025

EvalOps is where evaluations meet operations — and security is no exception. “keep” shows how device posture, SSO, and OPA policies can be continuously tested and traced like any other system. Run it, break it, measure it. github.com/evalops/keep

GitHub - evalops/keep: PoC zero-trust access stack with Google SSO, Envoy, OPA, and device attestation

EvalOps @EvalOpsDev

17 Oct 2025

Agents are already writing your code. The question isn't "should we use them?" It's "how do we ship them without surprises?" Provenance gives you a ledger. Every line. Every agent. Every risk. Measurable. github.com/evalops/provenanc…

EvalOps @EvalOpsDev

15 Oct 2025

We’re open-sourcing Smith — the Firecracker-based CI runner that powers EvalOps. Why rebuild Blacksmith? Because eval gating needs specialized infra — and we’re not forcing you onto our cloud. Run evals on EvalOps Cloud or your own. github.com/evalops/smith

EvalOps @EvalOpsDev

9 Oct 2025

I'm told we're doing awards now?

EvalOps @EvalOpsDev

4 Oct 2025

Everyone wants to move fast. @EvalOpsDev makes sure you don’t break trust along the way. Governed AI releases start here.

Jonathan Haas @JonathanHaas

4 Oct 2025

Shipped a new home for @EvalOpsDev. No fluff, just governed AI releases. Check it out -> evalops.dev

EvalOps @EvalOpsDev

2 Oct 2025

🔥 Just dropped an evaluation‑driven LoRA loop built on Tinker from @thinkymachines! It trains, benchmarks & iterates until your model meets the mark. It auto‑spots weaknesses, spawns targeted LoRA jobs & tracks improvements. Proof‑of‑concept repo: github.com/evalops/tinker-ev…

EvalOps @EvalOpsDev

30 Sep 2025

Sick of yak-shaving to get a clean Transformers setup? We built a stack that just works:
PyTorch
HF Transformers
Hydra configs
FastAPI serving
Prometheus
vLLM, LoRA, flash-attn, bitsandbytes
Reproducible. Dockerized. CI/CD baked in. github.com/evalops/stack

EvalOps @EvalOpsDev

30 Sep 2025

Developer resumes are frozen in time. GitHub tells the real story. 7k commits, 1.4M lines → now that’s a holographic trading card worth flexing. 🚀 cards.evalops.dev

EvalOps retweeted

Jonathan Haas @JonathanHaas

28 Sep 2025

LLM vendor: “Just quantization.”
Reality: reward-hacked code, broken workflows, lost week.
Companies: “nbd.”
Users: 🙃🔥
Making this a thing of the past.

EvalOps @EvalOpsDev

27 Sep 2025

All of us have been dazzled by large language models’ ability to spit out code, fix bugs, or draft boilerplate. But when you put that code into production, every hidden bug is a potential outage, compliance fine, or security hole. And today’s AI tools leave you guessing.

EvalOps @EvalOpsDev

27 Sep 2025

This transforms AI codegen from a toy that produces drafts into a partner you can trust to do real work.

EvalOps @EvalOpsDev

27 Sep 2025

Interested? DM for early access.
