State of foundation models according to Joe bench:
* Gemini 3 Pro is benchmark-maxed - it often can’t answer basic questions.
* GPT-5's templated responses and incompleteness let it down.
* Claude Opus/Sonnet 4.5 are the GOAT across every category - coding, finance, law, fitness, EQ…