The issue I feel with writing too many test and big features per hour is maintenance and iteration of this feature will be painful.
Too many test is smell for testing implementation and not intent
Big feature per hour is smell for new features that are good enough but continue to grow the entropy of system complexity.
AI can 10x the volume of code, good programs though are usually re-written 10 times, each time the intent is clearer and simpler.
I would love to see this type of LOC but for removals too, removing 4K LOC is a signal that I will trust a bit better.
I use a very specific prompt to push Claude to check its work and do a lot of testing and thinking about perf and refactoring. I find I can do big features (4K LOC with full testing) in about an hour.