Whenever I try models from anyone other than OpenAI or Anthropic, I'm disappointed. I have a smoke test for a coding agent harness. GPT 5.4 mini, GPT 5.5, Haiku, Sonnet, and Opus each passed all 6 tests. All 11 other current-version models I tested (DeepSeek v4, MiniMax M3, etc.) failed catastrophically.