Chinese multimodal powerhouse StepFun just flipped the global AI table. Step 3.7 Flash delivers 400 TPS inference, matching 97% of Claude Opus 4.6 at 1/9 the cost. This isn't a simple update—it's a paradigm shift for the Agent era.
What should an Agent-era model look like? Give it a cockpit screenshot, old models just describe dials. Step 3.7 Flash selects the area, identifies instruments, plans steps, and guides a mouse through takeoff. From understanding to executing—that's the divide.
The competition has changed. As Agents enter production, latency and token costs explode. Step 3.7 Flash uses sparse MoE (196B total, 11B active) for 400 TPS. Speed is capability—faster means more iterations, better results in multi-step tasks.
Real tests: 12 messy receipts auto-extracted into Excel; Blender screenshot teaches you to delete a cube; Jianying interface locates export button, warns about 1080i and full-track export pitfalls. The model is on-site, working as your hands.
Even more impressive: emergent behavior. After writing frontend code, it switches to GUI to test rendering, then revises its own code. No one taught it this loop—it figured it out. That's what an Agent should be.
The Advisor strategy is the killer feature: small model executes, consults a larger model only at critical junctures. Result: $0.19 per task vs $1.76 for Claude Opus 4.6, with 97% coding capability. Two dimes vs two dollars—any team burning tokens gets it.
Step 3.7 Flash isn't a benchmark king, but it dominates the cost-efficiency frontier. Multimodal extreme efficiency near-zero cost makes it the foundation for production-grade Agents. Flash is no longer a budget option—it's the pragmatic force taking over the world.