how we use llms in production:
- llama 4 on groq → classification, filtering, routing, summarization (fast, cheap)
- gpt-5 → all user-facing text generation (cheaper than claude)
- claude → anything involving code
- gpt-image-1/flux → image gen (changes frequently)
for coding: mostly claude, with some gemini/gpt-5. smaller models only for function-level changes.
general rule: use the cheapest model that works for the task, except for user-facing text
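a minimal sketch of the mapping above as a lookup table, in python. the task labels, the model id strings, and `pick_model` are all made up for illustration (not our actual code or the providers' real model ids), so swap in whatever your client expects:

```python
# hypothetical task labels -> hypothetical model identifiers,
# mirroring the list above. none of these strings are real API model ids.
ROUTES = {
    "classification": "llama-4-groq",
    "filtering": "llama-4-groq",
    "routing": "llama-4-groq",
    "summarization": "llama-4-groq",
    "user_text": "gpt-5",      # all user-facing text generation
    "code": "claude",          # anything involving code
    "image": "gpt-image-1",    # or flux; this one changes frequently
}


def pick_model(task: str) -> str:
    """Return the configured model for a task label, or fail loudly."""
    try:
        return ROUTES[task]
    except KeyError:
        # an unknown task should be an explicit error, not a silent
        # fallback to an expensive (or too-weak) default
        raise ValueError(f"no model configured for task {task!r}") from None
```

usage is just `pick_model("summarization")`. keeping the table in one place makes it easy to change a route when a cheaper model starts working for a task.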
examples: