Last night I ran the Night Shift on a pretty important sweep: checking my test suite for problems.
It found 96 issues in 158 tests. Pretty good work, to be honest.
The problems I had it look for:
- Reaching into internals to “cheat”
- Shortcuts that reduce confidence
- Overly specific / brittle assertions
- Tautologies (asserts values it just wrote)
- Weak assertions … can’t really fail
- Wrong layer (unit testing when the question is integration)
- Stubs/mocks that don’t reflect real conditions, or fixtures that don’t match the real code (I lean toward using e2e tests wherever possible)
- Dead code
- “Simulating” multiplayer instead of actually setting up multiple instances and testing for real
- Doing redundant setup when one test scaffold would work for multiple related assertions (for test suite speed)
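To make a couple of those categories concrete, here's a small pytest-style illustration. The `apply_discount` function is hypothetical, invented just for this example. It's a sketch of what a tautology and a weak assertion look like next to a test that can actually fail, not code from my suite.

```python
# Hypothetical function under test, invented for this illustration.
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a whole-number percentage discount, rounded down."""
    return price_cents * (100 - percent) // 100


def test_tautology():
    # Tautology: the test writes a value and then asserts that same value back.
    # apply_discount is never exercised, so this can't catch a regression.
    order = {"total": 900}
    assert order["total"] == 900


def test_weak_assertion():
    # Weak assertion: any integer, including a wrong one, satisfies this check.
    result = apply_discount(1000, 10)
    assert result is not None


def test_meaningful():
    # Exact expected values, including a rounding edge case, so a bug would fail.
    assert apply_discount(1000, 10) == 900
    assert apply_discount(999, 10) == 899  # 999 * 90 = 89910, floor-divided by 100
    assert apply_discount(1000, 0) == 1000
```

If `apply_discount` were broken to return `price_cents` unchanged, only `test_meaningful` would fail; the other two would keep passing.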
I had it first build a catalog of the tests, then work through that catalog with subagents that scored and analyzed each one. Then it went through the tests one by one, improving each with a commit per improvement and running the tests as it went.
Well worth the token spend!