1. TPU is the easiest: you just run it and it works, and it doesn't even feel like multinode. The catch is that it's only available on GCP.
2. Nvidia: people still struggle with multinode, and after a few days of running you might hit NCCL issues or something like that, but a restart solves the problem.
3. AMD: pure pain.
Andrej Karpathy talks about:
- the scarcity of talent in distributed computing with GPUs
- the opportunities in computer architecture, because the von Neumann architecture may not be optimal
- progress in precision
- sparsity in neural networks