Furong Huang

Furong Huang

Furong Huang

@furongh

6 Feb 2025

5️⃣ Final thoughts? Reward modeling isn’t perfect, and no method completely eliminates reward hacking. But with a structured inductive bias, trajectory-consistent learning, and lower complexity, GenARM makes a strong case for more reliable token-level alignment. Let’s discuss! 👇 Do you think reward hacking can ever be fully solved? 🤔 #AI #LLMs #ICLR2025 #RewardModeling #TestTimeAlignment

719

Furong Huang

Furong Huang

@furongh

6 Feb 2025

🚀 LLMs need better reward models! 🚀 Current reward models are slow & inefficient—they score full responses AFTER generation, making test-time alignment painfully slow. What if we could guide LLMs token-by-token—in real time? 🔴 Meet GenARM! Our #ICLR2025 work introduces Autoregressive Reward Models (ARM): ✅ Revolutionizing reward modeling ⚡ ✅ Supercharging test-time alignment 🏎️ ✅ Guiding LLMs dynamically—no retraining needed! 🧠 Let’s break it down! 🧵👇 📄 Paper: arxiv.org/abs/2410.08193 #AI #LLMs #TestTimeAlignment

410

55,044