The problem isn’t that the update was malformed.
The problem isn’t that the validator they wrote didn’t catch the error before it was deployed.
The real problem is that they didn’t have any canary system to which they deployed the update before rolling it out to millions of endpoints all at once.
Since no external factors — like the driver of another vendor (see the Kaspersky issue with Intel drivers years ago) — were involved, a single $400 canary machine could have prevented this disaster. The absence of such a machine and a staged rollout could be interpreted as gross negligence in court, and that’s why everyone at CrowdStrike is worried.
Mistakes and slips are human. No one would be angry about that. But having only a simple and possibly flawed validator between an analyst and a kernel driver on millions of systems worldwide could be seen as grossly negligent.
It’s the only thing that matters. Everything else they describe, added to their testing, and improved processes is just filler, possibly to distract from the one relevant fact.