The best production model is the one trained for the job.
Gravity Ads replaced a 70B model on Cerebras with a specialized 1B model trained for their actual workload.
Same quality, much faster and cheaper inference:
- p50: 152ms
- p99: 5.7x lower
- cost: ~10x lower
- model: 70x smaller
Great working with @trygravityai on this.
Case study:
inference.net/case-study/gra…