I think I want to write a blog post about some of the stuff I've dealt with in the past, especially in distributed systems, such as how to run a closed-loop feedback system (ideally via Kubernetes).
The idea of the blog post is to start small, explain how to schedule one-off scripts onto a node, and then gradually build on top of that. From there, move to Kubernetes and apply control theory semantics: the setpoint (.spec), the process variables (.status), the controller (I was surprised Kubernetes uses the same term), actuators, cooldowns, the actual process, and so on. There are so many details, like level triggering, read/write caches in the controller runtime, idempotency, and failure scenarios (unhealthy nodes, running out of disk, entire AZs or regions going down, etc.).
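To make the mapping above concrete, here is a minimal sketch of a level-triggered reconcile step in Python. It is a toy under stated assumptions, not how Kubernetes or any real controller is implemented: `Workload`, `reconcile`, the replica fields, and the 5-second cooldown are all hypothetical names and values made up for illustration, and the "actuator" simply mutates the observed value instead of acting on a real process.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical toy object; the fields only mirror the .spec/.status idea.
    spec_replicas: int                      # setpoint (.spec)
    status_replicas: int = 0                # observed process variable (.status)
    last_actuation: float = float("-inf")   # time of the last actuator call


def reconcile(w: Workload, now: float, cooldown: float = 5.0) -> str:
    """One controller pass. Level-triggered: it looks only at the current
    spec and status, never at which event caused the call."""
    error = w.spec_replicas - w.status_replicas
    if error == 0:
        return "noop"        # idempotent: a converged state needs no action
    if now - w.last_actuation < cooldown:
        return "cooldown"    # let the previous action take effect first
    # Actuator: move one step toward the setpoint. In this toy the observed
    # value is changed directly; a real system would re-observe it later.
    w.status_replicas += 1 if error > 0 else -1
    w.last_actuation = now
    return "scale_up" if error > 0 else "scale_down"
```

Because each pass decides from the current state alone, calling `reconcile` repeatedly is safe, and a failure that changes the observed state (say, a lost node) is corrected on the next pass without any special-case handling.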
When it clicks, it clicks, and then you can run tens of thousands of workloads in a self-healing way. See @PlanetScale. We did all that with a very small group of people. I'm very proud of it, but it's hard to convey the message because everything happens asynchronously, and it's hard to keep all of those moving parts in mind. A technical blog post could be very nice and helpful.