Learning robust, general-purpose reward functions for robotics unlocks many potential applications, like on-robot reinforcement learning or dataset validation. However, there’s a question of how to actually train such reward functions. Training a success/failure predictor gives ambiguous signals partway through a demonstration, where it’s hard to measure progress, which makes the method unsuitable for reinforcement learning, among other things. Predicting progress, on the other hand, offers no good way to use failure data.
So why not do both? Robometer combines progress and preference supervision, resulting in a stable, scalable, and highly general reward learning approach.
@aliangdw, @yigitkkorkmaz, and @Jesse_Y_Zhang join us to tell us more.
Watch Episode #84 of RoboPapers with Chris Paxton and Jiafei Duan today!