DeepSeek was great for me in Hermes Agent. It was excellent at tracking and fixing bugs, really solid with code issues, and strong in understanding, coverage, and general knowledge. As a general-purpose agent, I highly recommend it.
But for long-running tasks, it’s not suitable at all. The model itself seems wired to stop as early as possible and ask whether you want to continue, no matter what. It even tries to break goal mode. Sadly, it has no stamina for long tasks.
As for Kimi, my last test was with Kimi 2.5. It was good, especially for UI work, and very good at agent management. But it was extremely slow and usually needed multiple rounds of tweaking and improvement to reach the target. It couldn’t reliably complete the task from a single prompt the way stronger models can.
I haven’t tested their latest model yet, so I’m waiting to try it before giving a full opinion.
What do you think of Kimi or DeepSeek, from your experience? This post gave me an idea of how GLM compares to Opus or GPT models.