🚀 UGround accepted to #ICLR2025 [scores=10/8/8/5]! 🎉
We’re also thrilled to share some exciting updates:
✨ UGround is SOTA—again!
Using the exact same training data, our latest model achieved 89.4% accuracy on ScreenSpot, outperforming models from Google, Anthropic, Apple, and others.
Even more surprising? Without any desktop data, UGround also excels on ScreenSpot-Pro, a new challenging desktop benchmark. Incredible cross-platform generalization! 💪
We’ve also released models in 2B, 7B, and 72B sizes, aiming to provide a strong foundation for further research.
📢 Try our secret sauce!
The highly effective training datasets we synthesized for UGround are now available for research use.
Additionally, we’ve provided a comprehensive evaluation suite packed with meaningful resources to help researchers test GUI Agents and grounding models with ease.
Since UGround's first release in August 2024, we’ve been blown away by the community’s enthusiasm. Our modular agent design, SeeAct-V, is now widely used—what an honor to contribute to the evolution of GUI Agents and see so many brilliant works emerge.
💡 Shoutouts:
- Deep gratitude to my amazing advisors, @ysu_nlp and @hhsun1, for their unwavering guidance, and to my amazing collaborators, @demisama_, @boyuan__zheng, and @YihengShu.
- Huge thanks to @OrbyAI for their incredible support with infra, compute, and analysis. Special appreciation to Yanan Xie, Gang Li, Cheng Chang, and Yining Mao for their critical contributions.
Finally, thanks to the brilliant researchers from various groups for their insightful discussions and generous help! Let’s keep pushing the boundaries of GUI Agents together! 🚀