I read this new paper by Hexo Labs called SIA and the framing is sharper than most self-improving agent papers I've seen.
the field has split into two camps that don't talk to each other:
camp one: rewrite the scaffold around a frozen model (prompts, tools, retry logic)
camp two: run test-time RL on the weights while keeping the harness fixed
SIA's move: let a Feedback-Agent choose, generation by generation, whether to edit the harness OR fire a LoRA weight update.
same loop WITH two levers.
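a rough sketch of what "one loop, two levers" could look like. this is not the paper's code: `AgentState`, `choose_lever`, `run`, `toy_score`, and the stall rule (switch to a weight update once the score stops moving for a few generations) are all hypothetical stand-ins, and it assumes a score where higher is better.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    harness_version: int = 0  # stand-in for prompts, tools, retry logic
    weights_version: int = 0  # stand-in for LoRA adapter checkpoints
    history: List[float] = field(default_factory=list)


def choose_lever(history: List[float], patience: int = 3, tol: float = 1e-3) -> str:
    """Hypothetical feedback policy: keep editing the harness until the
    score has been flat across the last `patience + 1` evaluations, then
    pick a weight update instead."""
    if len(history) <= patience:
        return "harness"
    recent = history[-(patience + 1):]
    stalled = max(recent) - min(recent) < tol
    return "weights" if stalled else "harness"


def run(evaluate: Callable[[AgentState], float], generations: int) -> List[str]:
    """Each generation: evaluate, choose one lever, apply it."""
    state = AgentState()
    levers: List[str] = []
    for _ in range(generations):
        state.history.append(evaluate(state))
        lever = choose_lever(state.history)
        if lever == "harness":
            state.harness_version += 1  # placeholder for a scaffold rewrite
        else:
            state.weights_version += 1  # placeholder for a LoRA update
        levers.append(lever)
    return levers


def toy_score(state: AgentState) -> float:
    # Toy assumption: harness edits stop helping after two versions,
    # while each weight update still adds something.
    return 0.2 + 0.01 * min(state.harness_version, 2) + 0.05 * state.weights_version


print(run(toy_score, 6))
```

in this toy setup the loop spends its first generations on harness edits and only reaches for the weights once the score goes flat. a real Feedback-Agent would presumably decide from much richer signals than a flat-score check.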
the result that stuck with me:
scRNA-seq denoising task. harness-only iteration plateaued at 0.241 mse_norm, generation after generation.
first weight-update checkpoint added two lines: clip and round outputs to non-negative integers. basic biological constraint. pushed it straight to 0.289.
no amount of scaffold rewriting ever found that fix.
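for reference, the "clip and round" constraint really is about two lines. a minimal sketch in plain Python, assuming predictions arrive as a flat list of floats; `to_counts` is a made-up name and the input values are toy numbers, not data from the paper.

```python
from typing import List, Sequence


def to_counts(preds: Sequence[float]) -> List[int]:
    """Clip each prediction to >= 0, then round to the nearest integer.

    Note: Python's round() sends exact .5 ties to the nearest even integer.
    """
    return [round(max(0.0, x)) for x in preds]


raw = [-0.7, 0.2, 1.6, 3.4]  # toy model outputs, not from the paper
print(to_counts(raw))  # [0, 0, 2, 3]
```

the point is that expression counts can't be negative or fractional, so any raw output that is gets snapped back onto a valid count.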
the lesson:
the harness shapes how the agent searches
the weights shape what it actually knows
you can't rewrite your way out of a knowledge gap.