My current agent stack and model ratings:
4 @NousResearch Hermes agents
- 3 on GPT 5.5 (xhigh)
- 1 on Qwen 3.6 (Local)
2 @openclaw agents
- 1 on GPT 5.5 (xhigh)
- 1 on Claude Opus 4.7
This is after ditching my GLM, Kimi, and MiniMax agents based on their coding / agentic work (see ratings below)
------------Ratings-------------
@OpenAI GPT 5.5 - 9.5 coding / 9 agent
@claudeai Opus 4.7 - 9 coding / 9.5 agent
@Alibaba_Qwen 3.6 - 8.5 coding / 8 agent
@MiniMax_AI - 7 coding / 6 agent
@Kimi_Moonshot - 7.5 coding / 6.5 agent
@Zai_org GLM 5.1 - 8 coding / 8 agent
My token usage: one $200 max plan gives me 4 agents on GPT 5.5 xhigh, vs only 1 Claude agent on a $200 max plan
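
Rough per-agent math on that, as a quick sketch. It uses only the two figures above; the variable names are made up for illustration, and it assumes each plan's cost splits evenly across the agents it supports:

```python
# Illustrative arithmetic only: cost per agent implied by the plan figures above.
PLAN_PRICE_USD = 200  # price quoted for each plan

# Number of agents one plan sustains, per the usage note above
agents_per_plan = {"GPT 5.5 xhigh": 4, "Claude": 1}

# Assumption: the plan price is split evenly across its agents
cost_per_agent = {name: PLAN_PRICE_USD / n for name, n in agents_per_plan.items()}

for name, cost in cost_per_agent.items():
    print(f"{name}: ${cost:.0f} per agent")
```

That works out to $50 per GPT 5.5 xhigh agent vs $200 per Claude agent, a 4x gap in my usage.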