For any given desired computational outcome, there is some sequence of bits that encodes it. These signal bits (the ones that encode the desired outcome) can be intermixed with bits of noise.
The reason why these LLM-generated codebases by default have so many obviously useless lines of code is that, for any given generative step, the LLM’s job is to package up the few signal bits (from the prompt) into a plausible presentation of other bits, which may or may not be noise.
If they are extra signal (because they can be statistically inferred from the other signal bits), then that’s a win, but there’s a much higher probability that they’re actually just noise. This is why every AI-generated tweet, article, or indeed code snippet contains drastically more fluff (noise) than what a focused person with reasonably good instincts for compression would produce.
So, in order to actually accomplish anything (without extreme vetting, compression, and modification of generated code), the “programmer” (prompter) needs to continuously generate more code to obtain the next signal bit they wanted, at the expense of many, many more bits of noise.
The result is often ~100x, if not ~1000x, more code than was needed, and that volume is impossible to hand-edit, comprehensively understand, or compress. Layers upon layers of statistically average nonsense, wrapping the few bits of utility you actually wanted.
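To put rough numbers on that ratio, here is a minimal sketch. It assumes you can estimate how many lines were actually needed, which is a judgment call and not something you can measure directly. The function names and the line counts are hypothetical, chosen only to match the ~100x figure above.

```python
def bloat_factor(needed_lines: int, generated_lines: int) -> float:
    """How many times more code was generated than was needed."""
    if needed_lines <= 0:
        raise ValueError("needed_lines must be positive")
    if generated_lines < needed_lines:
        raise ValueError("generated_lines must be at least needed_lines")
    return generated_lines / needed_lines


def signal_fraction(needed_lines: int, generated_lines: int) -> float:
    """Share of generated lines that carry signal.

    Assumes every needed line is present somewhere in the output,
    so the rest of the output counts as noise.
    """
    return 1.0 / bloat_factor(needed_lines, generated_lines)


# Hypothetical numbers: 1,000 lines needed, 100,000 lines generated.
needed, generated = 1_000, 100_000
print(f"bloat: {bloat_factor(needed, generated):.0f}x")
print(f"signal share: {signal_fraction(needed, generated):.1%}")
```

At ~100x, about 1 line in 100 is signal. At ~1000x, it is about 1 in 1000.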
i read this and was like huh that’s a lot of code but surely he’s built some huge super complex app
it’s a blog
he’s built a blog
300,000 lines of code for a blog
the blog posts are all ai slop
there’s like 10 lines of code per line of blog