✴️ Pleased to introduce our new paper:
yining610.github.io/dynamic-…
- Rebalances multiple objectives during training through dynamic reward weighting
- Builds a Pareto front that dominates static-weighting baselines across online RL algorithms, datasets, and model families
- Converges faster
1/8