10 Sep 2024
Excited to share our new work: Enhancing Preference-based Linear Bandits via Human Response Time ⏱️🤖
@edgeyyzhang, Zhaolin Ren, Prof. Na Li, @ClaireYLiang, Prof. @julie_a_shah
👉 arxiv.org/abs/2409.05798
We show that human response times carry information about the strength of human preferences and can speed up preference learning. This complements existing bandit algorithms, which learn only from binary choices. We demonstrate this by integrating a psychology model (the EZ-Diffusion Model) into a bandit algorithm.
#AI #MachineLearning #RLHF #HumanFeedback #psychology #Bandits #Robotics #EZDiffusionModel