Groq is an engine that can run inference extremely quickly for any model you give it. I’ll never use GPT or o1 in production APIs because the output is too slow, but if I could run GPT or o1 on Groq chips I’d strongly consider using them in critical workflows.
Groq doesn’t train and ship models, just like Anthropic doesn’t build their own microprocessors (they train and ship models). And I used Anthropic in my analogy for a reason: just this week they realized that running Claude on fast Nvidia chips (unoptimized for inference) doesn’t make sense, so they partnered with Annapurna Labs, Amazon’s in-house chip design lab, to build their own Groq-like chips that run inference quickly.
So a model-building company (Anthropic) realized that it would be a better customer experience if users could access their model (Claude) faster, and that the best way to do that is to run Claude on custom chips that prioritize inference.
Which is exactly what Groq does, and why they tweeted what they did. It’s not a wrapper; it’s an engine you put into your car to make it faster and better. And a lot of car manufacturers don’t engineer and build their own engines; they source them from expert engine builders :)