Everyone selling the “own your learning loop” vision has one contradiction they never name.
The pitch goes: a few frontier models are eating everyone’s expertise and commoditizing it, so every firm must build its own loop - capture your people’s tacit judgment, run RL on real internal traces, encode the “company veteran” into a system you own and can swap models under. Build the moat. Compound the advantage. First movers win.
Read that again. The mechanism you’re prescribing inside the firm is the exact extraction you’re condemning between the labs and everyone else. Just pointed inward.
When a frontier model absorbs the open web’s expertise and sells it back, that’s commoditization and the political economy “won’t tolerate it.” When your platform absorbs your engineers’ and analysts’ judgment, makes it “replicable and scalable,” and books it as an asset the firm owns and the human doesn’t - that’s empowerment. Same move. Different beneficiary.
“You can offload a task but never your learning.” Whose learning? The entire point of the loop is to externalize the individual’s learning into the firm’s token capital. That line comforts the asset owner, not the person being mined.
And “human capital only becomes MORE valuable” is asserted, never argued, and it runs against the grain of the very mechanism. If the loop really captures the veteran well enough that you can hot-swap the model beneath it, then by that same logic it captures the veteran well enough for the firm to depend on them less over time. You can’t promise both perfect encoding and permanent indispensability.
This isn’t a “stable equilibrium.” Cumulative advantage, hard-to-replicate moats, first-mover lock-in - those are divergent dynamics. They concentrate. A genuinely distributed outcome would require forces pushing AGAINST the loop you’re celebrating: open weights, interoperability, regulation, portability you can actually verify. Calling the concentrating thing “stable” is the tell.
One more: the metaphor is a hill-climbing machine. Hill climbing is famous for getting stuck in local optima. A system trained on its own traces risks laundering yesterday’s judgment into tomorrow’s ground truth - and the better your moat, the harder it is to get off a confident-but-wrong peak. Compounding isn’t monotonically good.
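If the local-optimum point sounds abstract, here is a minimal Python sketch of it. Everything in it is made up for illustration: the names `hill_climb` and `landscape`, the two-peak scoring function, and the step size are assumptions, not anything taken from the pitch or from a real system. It only shows the textbook failure mode: a greedy climber that accepts nothing but local improvements stops on whichever peak is nearest to where it started.

```python
def landscape(x):
    # Hypothetical score with two peaks: a low one at x=2 (height 3)
    # and a higher one at x=10 (height 10), with a valley in between.
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 10))


def hill_climb(score, start, step=1, max_iters=1000):
    # Greedy hill climbing: move to the better neighbor, and stop
    # as soon as neither neighbor improves on the current position.
    current = start
    for _ in range(max_iters):
        best = max((current - step, current + step), key=score)
        if score(best) <= score(current):
            return current
        current = best
    return current


stuck = hill_climb(landscape, start=0)   # starts near the low peak
lucky = hill_climb(landscape, start=7)   # starts on the slope of the high peak
print(stuck, landscape(stuck))
print(lucky, landscape(lucky))
```

The climber that starts at 0 settles on the low peak and never crosses the valley, because every step toward the better peak first looks like a step backward. Nothing in the procedure tells it that a higher peak exists.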
I’m not against firms owning their IP. I’m against a vision that borrows anti-concentration moral language to sell a concentration tool, and never explains why the extraction is villainous one layer up and virtuous one layer down.