📢 After months of work, I can finally share our latest research. I couldn't be more thrilled! 🎉
We unify a policy 🤖 and a world model 🌍 into a single LLM, so no external dynamics model is needed!
Why does this matter? Because now, the policy can plan based on its internal world model!
This planning boosts tool-use success rates to >90%, on top of SFT and RL.
📄: arxiv.org/abs/2506.02918
🧵[1/8]