super excited this is out! lots of fun insights, tips, and tricks on how all parts of the RL stack, from policy gradients in general, to policy staleness, to clipping, to numerics, modulate entropy, plus some principled control strategies to stabilize it!
Our ICLR 2026 paper "Entropy-Preserving Reinforcement Learning" is now on arXiv.
We strike at one of the core RL issues: exploration in action space.
We study why token-distribution entropy collapses in LLM post-training, which prevents further exploration, and we propose fixes!