Cool new release by Cohere: a lightweight 30B open-weight model for agentic coding tasks.
This one builds on Command A using the parallel transformer design. Interestingly, even though it's almost half as big, it has almost twice as many layers.
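For anyone unfamiliar with the "parallel" design: here's a minimal sketch of the general idea, not Cohere's actual implementation. In a sequential block the MLP sees the output of the attention sublayer, whereas in a parallel block both sublayers read the same normalized input and their outputs are added to the residual together. The `attn` and `mlp` arguments are hypothetical stand-ins (any vector-to-vector function), and the toy `layer_norm` has no learned scale or bias.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Toy layer norm without learned scale/bias (an assumption for brevity)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def sequential_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Classic pre-norm block: h = x + attn(norm(x)); y = h + mlp(norm(h))."""
    h = [xi + ai for xi, ai in zip(x, attn(layer_norm(x)))]
    return [hi + mi for hi, mi in zip(h, mlp(layer_norm(h)))]


def parallel_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel block: y = x + attn(norm(x)) + mlp(norm(x))."""
    n = layer_norm(x)  # normalized once, shared by both sublayers
    return [xi + ai + mi for xi, ai, mi in zip(x, attn(n), mlp(n))]


# Hypothetical stand-in sublayers, just to make the sketch runnable.
def toy_attn(v: Vector) -> Vector:
    return [0.5 * t for t in v]


def toy_mlp(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


if __name__ == "__main__":
    x = [1.0, -1.0]
    print("sequential:", sequential_block(x, toy_attn, toy_mlp))
    print("parallel:  ", parallel_block(x, toy_attn, toy_mlp))
```

Since the two sublayers in the parallel version don't depend on each other, they can be computed side by side, which is the usual motivation for this layout.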
Also, they say it was developed specifically for agentic coding, not just coding. That is, the model is evaluated inside a multi-step workflow, not just on a single prompt-to-code-answer task.
For Terminal-Bench, the model has to use a terminal, inspect the environment, run commands, read outputs, etc.
For SWE-Bench the model works on real GitHub-style software issues where it has to understand the repository, find relevant files, make a patch, pass tests, etc.
SciCode and LiveCodeBench are more traditional because they mostly test whether the model can produce correct code for a specified problem. Sure, this still requires reasoning, but it's more like “Implement a numerical routine to compute a scientific quantity from given equations and inputs,” which doesn't require any interaction with the environment, existing files, tests, etc.
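To make the difference between the two evaluation styles concrete, here's a toy sketch. It is not how the Terminal-Bench or SWE-Bench harnesses actually work; `ToyEnv`, its four commands, and `scripted_policy` (a hard-coded stand-in for the model) are all made up for illustration. The single-shot evaluator checks one generated answer, while the agentic evaluator lets the policy issue commands, read the outputs, and only then checks whether the tests pass.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (command, output) pairs seen so far


def add_is_correct(source: str) -> bool:
    """Toy test suite: the code must define add(a, b) with add(2, 3) == 5."""
    scope: Dict[str, object] = {}
    exec(source, scope)
    return scope["add"](2, 3) == 5


def single_shot_eval(generate: Callable[[str], str], prompt: str) -> bool:
    """Traditional style: one prompt in, one code answer out, then check it."""
    return add_is_correct(generate(prompt))


class ToyEnv:
    """Hypothetical mini 'repository' with a buggy file and a tiny command set."""

    def __init__(self) -> None:
        self.files = {"calc.py": "def add(a, b): return a - b"}

    def run(self, command: str) -> str:
        parts = command.split(" ", 2)
        if parts[0] == "ls":
            return "\n".join(sorted(self.files))
        if parts[0] == "cat" and len(parts) >= 2:
            return self.files.get(parts[1], "no such file")
        if parts[0] == "write" and len(parts) == 3:
            self.files[parts[1]] = parts[2]
            return "ok"
        if parts[0] == "test":
            return "PASS" if add_is_correct(self.files["calc.py"]) else "FAIL"
        return "unknown command"


def scripted_policy(history: History) -> str:
    """Stand-in for the model: picks the next command from what it has seen."""
    if not history:
        return "ls"  # inspect the environment first
    last_cmd, last_out = history[-1]
    if last_cmd == "ls":
        return "cat " + last_out.splitlines()[0]  # read the first file
    if last_cmd.startswith("cat "):
        return "write calc.py " + last_out.replace("a - b", "a + b")  # patch
    if last_cmd.startswith("write "):
        return "test"  # verify the patch
    return "submit"


def agentic_eval(policy: Callable[[History], str], env: ToyEnv,
                 max_steps: int = 10) -> Tuple[bool, History]:
    """Agentic style: loop of act -> observe, scored on the final repo state."""
    history: History = []
    for _ in range(max_steps):
        command = policy(history)
        if command == "submit":
            break
        history.append((command, env.run(command)))
    return add_is_correct(env.files["calc.py"]), history


if __name__ == "__main__":
    solved, trace = agentic_eval(scripted_policy, ToyEnv())
    for command, output in trace:
        print(f"$ {command}\n{output}")
    print("solved:", solved)
```

The point of the sketch is that the agentic score depends on a whole trajectory (which files get inspected, whether the patch gets tested), not just on whether one generated snippet happens to be right.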
The focus on the agentic code benchmarks is probably why it's far ahead of Gemma 4 on those.
Overall, it's pretty competitive, although not quite at Qwen3.6-level performance.