A $0.05 model nearly beat a $0.95 model on the same task.
Not on speed. Not on cost efficiency.
On pure task completion quality.
I used AgentUse to benchmark 5 AI models (Claude Opus 4.5, MiniMax M2.1, GLM 4.7, Sonnet 4.5, and Haiku 4.5) on Notion database CRUD operations. Here's the ranking: