This "Horizon Beta" model is sort of GOATed. I gave it a prompt to figure out a non-negative loss function for a gamma distribution head, to prevent gradient interference, and it nailed it.
Sonnet, Grok, Qwen3 275B, and Kimi K2 had all fallen flat on their faces on the same task.
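
For anyone wondering what "non-negative loss for a gamma head" even means: the plain gamma negative log-likelihood can dip below zero, because a density can exceed 1. One textbook way around that is the gamma unit deviance, which is the NLL minus its best possible value at a fixed shape (up to a constant factor), so it bottoms out at exactly 0 when the predicted mean equals the target. To be clear, this is not what Horizon Beta produced and not my actual setup; it's a rough sketch, and the function names, the softplus link, and the epsilon are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus; maps any real number to a positive one."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gamma_unit_deviance(y: float, mu: float) -> float:
    """Gamma unit deviance 2 * (y/mu - 1 - log(y/mu)).

    Non-negative for all y, mu > 0 because r - 1 - log(r) >= 0 for r > 0,
    with equality only at r == 1 (i.e. mu == y).
    """
    if y <= 0.0 or mu <= 0.0:
        raise ValueError("y and mu must be strictly positive")
    ratio = y / mu
    return 2.0 * (ratio - 1.0 - math.log(ratio))


def mean_gamma_deviance(targets, raw_outputs, eps: float = 1e-8) -> float:
    """Average deviance over a batch.

    raw_outputs are hypothetical unconstrained head outputs; softplus plus a
    small eps (an assumption, not from the post) keeps the predicted mean positive.
    """
    total = 0.0
    for y, raw in zip(targets, raw_outputs):
        mu = softplus(raw) + eps
        total += gamma_unit_deviance(y, mu)
    return total / len(targets)


if __name__ == "__main__":
    print(gamma_unit_deviance(2.0, 2.0))  # perfect prediction
    print(mean_gamma_deviance([1.0, 3.0], [0.0, -2.0]))
```

Since the loss has a floor of zero, it can't go arbitrarily negative and swamp other terms in a summed multi-head objective, which is one plausible reading of the gradient interference issue. It only covers the mean of the gamma, though; a head that also predicts the shape would need something more.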