I hand-wrote a 500-LoC RL stack to make hacking on RL research much easier.
Most RL stacks are either massive and unhackable, or duct-taped research scripts. I am open-sourcing Mithrl, a modular RLVR stack.
Next items on my checklist: adding more complex environment examples, supporting multi-GPU async RL, and QoL fixes.
I might scrap external runtime dependencies (Hugging Face, PEFT, vLLM) and write purpose-built, simpler versions from scratch if I feel the need.
If you want to experiment with RL and are looking to own sovereign tools, I’d love to get on a call, understand your requirements, and help you integrate it for free.