RL training and inference have historically been handled by two different tools: one generates outputs, the other updates weights. To train with RL, you wire them together and keep two copies of the model in sync.
That orchestration is most of the work.
Harmony removes this boundary: one engine for the whole loop. The same weights serve both inference and training, with no library swapping.
We built an interactive visualization that walks you through this.
🔗 link below