People ask this question all the time and then don’t accept valid answers, likely because there are many overlapping reasons to make your own chip.
Different chips have different perf/$ on different workloads. With a custom chip you can:
- save money on a given workload
- improve performance on a given workload (the same thing, seen from the other side of perf/$)
- have a more predictable supply chain
- gain control over software and IP
- learn a lot
- potentially sell it to other companies
The reasons that different companies make custom chips are specific to them.
TPU and Trainium are competitive on perf/$ and as a result are largely sold out. But depending on who you are, perf and $ are different. For a lab building a 10k-chip cluster, stacking Broadcom/Marvell margins on top of GCP/AWS margins is not always better than NVIDIA through a neocloud. But if you’re GDM, it almost always is, because there is no GCP margin to pay. And for the outside lab, the NVIDIA software stack is the default, so why take the risk of switching?
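A toy sketch of the margin-stacking point, showing why the same chip lands at a different perf/$ depending on who is buying it. Every number here is a made-up placeholder, not a real margin, cost, or benchmark. The function names are hypothetical, and modeling each party's margin as a simple markup on cost is my simplification.

```python
def stacked_price(unit_cost: float, markups: list[float]) -> float:
    """Price after each party in the chain applies its markup in turn."""
    price = unit_cost
    for markup in markups:
        price *= 1.0 + markup
    return price


def perf_per_dollar(perf: float, price: float) -> float:
    """Normalized performance divided by what the buyer actually pays."""
    return perf / price


# Hypothetical placeholder inputs (NOT real figures).
UNIT_COST = 1.0        # normalized manufacturing cost of one chip
CHIP_PERF = 1.0        # normalized performance on the workload
DESIGN_MARKUP = 0.5    # placeholder for the design partner's markup
CLOUD_MARKUP = 0.5     # placeholder for the cloud provider's markup

# The chip's owner pays only the design partner's markup.
owner_price = stacked_price(UNIT_COST, [DESIGN_MARKUP])
# An outside lab renting the same chip pays both markups, stacked.
outside_price = stacked_price(UNIT_COST, [DESIGN_MARKUP, CLOUD_MARKUP])

owner_ppd = perf_per_dollar(CHIP_PERF, owner_price)
outside_ppd = perf_per_dollar(CHIP_PERF, outside_price)

print(f"owner perf/$:   {owner_ppd:.3f}")
print(f"outside perf/$: {outside_ppd:.3f}")
```

Same silicon, same perf, but the outside buyer's perf/$ is lower by exactly the cloud markup. Whether that still beats the NVIDIA alternative depends on numbers this sketch does not know.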
For MTIA or MAIA or AMD GPUs or Groq or Cerebras or whatever else, the perf/$ on the most important workloads is just not competitive today. So MTIA is used for recsys, and the others are used for single-node, small-model inference only. And then they aren’t produced in enough volume for it to matter anyway.
For OpenAI and Anthropic: if you’re spending $100B on compute and a chip program costs $1B for a tapeout, why not give it a go? Pick a workload you care about, make some customizations for it (which are all just tradeoffs anyway), establish a new supply chain, and learn some things and make some friends along the way.
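Back-of-the-envelope version of that bet, using only the two figures above ($100B compute spend, $1B chip program). The function name is hypothetical, and treating the program as a one-time cost that must be recovered purely through compute savings is my assumption, not a claim about how either company budgets.

```python
def break_even_savings_fraction(compute_spend: float, program_cost: float) -> float:
    """Fraction of compute spend the custom chip must save to pay for itself."""
    if compute_spend <= 0:
        raise ValueError("compute_spend must be positive")
    return program_cost / compute_spend


compute_spend = 100e9  # $100B on compute, from the post
program_cost = 1e9     # $1B chip program, from the post

fraction = break_even_savings_fraction(compute_spend, program_cost)
print(f"Break-even savings: {fraction:.1%} of compute spend")
```

So the chip only has to shave about one percent off the compute bill to cover the program, before counting the supply-chain and learning benefits.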
Can someone give me a clear answer to why everyone wants their own custom silicon? I kinda get wanting to be less dependent on Nvidia, but then aren’t you just moving your dependency one layer up to TSMC? Or are there genuinely performance gains to be had with model-chip alignment?