STEPFUN DROPPED STEP 3.7 FLASH, A FLASH MODEL THAT COMPLETES THE FULL AGENT LOOP
a lot of Flash-tier models stop at "cheaper and faster"
in my run, this one actually finished the task: plan, write code, run it, read the output, ship
[ the specs ]:
- 198B total / ~11B active (Sparse MoE)
- 256K context window
- up to 400 tokens/s
- 56.3 on SWE-Bench Pro
- two #1 rankings on multimodal benchmarks
- Apache 2.0, runs locally
[ what I tested ]:
gave it one task and let it run end to end:
"build a working CSV analytics tool – generate the data, write the analyzer, run it, ship a chart"
it planned the steps, wrote the code, executed it, read the real output, and produced a working script and a revenue chart
no hand-holding. it drove the whole loop itself
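for a sense of scale, here's a minimal sketch of the kind of pipeline that task describes: generate data, analyze it, chart it. this is not the code the model wrote in my run. every name here (generate_rows, write_csv, revenue_by_month, text_chart), the column layout, and the text-bar chart are assumptions for illustration, stdlib only:

```python
import csv
import io
import random
from collections import defaultdict


def generate_rows(n=100, seed=0):
    """Make synthetic sales rows (hypothetical schema: month, product, revenue)."""
    rng = random.Random(seed)
    months = ["2024-01", "2024-02", "2024-03"]
    products = ["basic", "pro"]
    return [
        {
            "month": rng.choice(months),
            "product": rng.choice(products),
            "revenue": round(rng.uniform(10, 500), 2),
        }
        for _ in range(n)
    ]


def write_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["month", "product", "revenue"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def revenue_by_month(csv_text):
    """Parse the CSV text and sum revenue per month, sorted by month."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["month"]] += float(row["revenue"])
    return dict(sorted(totals.items()))


def text_chart(totals, width=40):
    """Render one '#' bar per month, scaled so the largest month fills `width`."""
    peak = max(totals.values())
    lines = []
    for month, total in totals.items():
        bar = "#" * max(1, round(width * total / peak))
        lines.append(f"{month} {bar} {total:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    csv_text = write_csv(generate_rows())
    print(text_chart(revenue_by_month(csv_text)))
```

the point of the test is that the model had to get from the one-line prompt to something like this, run it, and check the output on its own.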
[ the numbers from my run ]:
- full task completed in 26.1s
- 3 tool calls, 4 reasoning steps
- 3 files shipped: data, analyzer, chart
- zero manual steps, no errors
[ why it matters ]:
multi-step runs are where agent setups often drift, drop tool calls, or stop early
in my test, Step 3.7 Flash held the full plan → execute → observe → iterate loop together
works out of the box with Claude Code, OpenClaw, Hermes, MCP, and Skills
if you build agents or coding workflows, this is worth testing
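if the plan → execute → observe → iterate loop is new to you, here's a rough sketch of its shape. this is a generic illustration, not StepFun's implementation or any real harness API: run_agent, the decide callback, and the toy tools are all hypothetical, and a scripted function stands in for the model:

```python
def run_agent(decide, tools, max_steps=8):
    """Drive a plan -> execute -> observe -> iterate loop.

    decide: hypothetical stand-in for the model. It receives the observations
            so far and returns (tool_name, argument), or None when done.
    tools:  dict mapping tool names to plain Python callables.
    """
    observations = []
    for _ in range(max_steps):
        action = decide(observations)  # plan: choose the next step
        if action is None:  # the "model" says the task is finished
            break
        name, arg = action
        try:
            result = tools[name](arg)  # execute the tool call
        except Exception as exc:
            # Feed failures back as observations instead of crashing the loop.
            result = f"error: {exc}"
        observations.append((name, result))  # observe, then iterate
    return observations


def scripted_policy(observations):
    """Toy policy: walk a fixed list of steps, one per loop iteration."""
    steps = [("upper", "abc"), ("length", "abcd")]
    if len(observations) < len(steps):
        return steps[len(observations)]
    return None


if __name__ == "__main__":
    toy_tools = {"upper": str.upper, "length": len}
    for name, result in run_agent(scripted_policy, toy_tools):
        print(name, result)
```

the failure modes above map onto this loop directly: a dropped tool call is a step that never runs, and stopping early is decide returning None before the work is done.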