A cool technique from the RunRL research team: using PD controllers to balance multiobjective loss functions!
Suppose you want to train a model to give short yet relevant answers. For a given minimum level of relevance, the PD controller lets you improve the other reward terms (here, brevity) much more!
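To make the idea concrete, here is a minimal sketch of one way a PD controller could set the weight on the relevance term. The post does not describe RunRL's actual setup, so everything here is an assumption for illustration: the class name `PDWeightController`, the helper `combined_reward`, the gains `kp` and `kd`, the clipping range, and the choice to drive the weight directly from the error between a target relevance and the measured relevance.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PDWeightController:
    """Hypothetical PD controller for the weight on a relevance reward term."""

    target: float             # minimum relevance level to hold (assumed scale: 0..1)
    kp: float = 2.0           # proportional gain (illustrative value)
    kd: float = 1.0           # derivative gain (illustrative value)
    max_weight: float = 10.0  # upper clip so the weight cannot blow up
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, measured: float) -> float:
        """Return the relevance weight for the next training step."""
        error = self.target - measured  # positive when relevance is too low
        # Derivative term: change in error since the previous update
        # (zero on the first call, when there is no previous error).
        derivative = 0.0 if self._prev_error is None else error - self._prev_error
        self._prev_error = error
        raw = self.kp * error + self.kd * derivative
        # Clip to [0, max_weight]: at or above target the weight drops to zero.
        return min(max(raw, 0.0), self.max_weight)


def combined_reward(relevance: float, length_penalty: float, weight: float) -> float:
    """Hypothetical combined reward: shorter is better, relevance scaled by weight."""
    return -length_penalty + weight * relevance


if __name__ == "__main__":
    controller = PDWeightController(target=0.8)
    for batch_relevance in (0.5, 0.7, 0.9):
        w = controller.update(batch_relevance)
        print(f"relevance={batch_relevance:.2f} -> weight={w:.3f}")
```

In this sketch the weight grows when measured relevance falls below the target and shrinks to zero once relevance is at or above it, so the optimizer is free to spend its effort on brevity. The derivative term damps the weight when relevance is already recovering, which is the usual reason to prefer PD over a purely proportional rule.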