Hear me out! If you're looking at ways to run local LLMs that don't involve paying with body parts, there are few options that fit.
Let's say you have 1k Euro to spend on the inference part alone and you want more than 24 GB of VRAM. You are really in for a world of trouble.
Your choices come down to:
- Nvidia Tesla V100 - 32 GB VRAM for around 800 Euro.
- 2x AMD BC-250 - good little boards, but you have to tinker with them a lot, not for the faint of heart.
- MiniPC with Ryzen APUs (some 6000 series, 7000 series, 8000 series; HX370/470 just barely fit the budget, if not a bit over). The problem with these is that they are slow: maybe 20-25 TPS on a good day for the good ones (with MoE models).
- 2x Nvidia Tesla P40 - old and not something I can recommend, but you do get 48 GB of VRAM for not a lot of money.
- AMD MI50/V620 - 32 GB VRAM for good money, but you accept weak software support.
- not much else comes to mind.
Yes, you read the last option correctly. Don't tell me about the Intel B70; the market price is nowhere near MSRP (1350 Euro, for example). 2x 3090s would blow past 1k. The AMD R9700 is close to 2k.
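If you want to sanity-check the shortlist yourself, here is a rough sketch of the filter I'm applying: budget of 1k Euro, more than 24 GB of VRAM, then sort by Euro per GB. Only the prices quoted above come from this post. The 32 GB figures for the B70 and R9700 are my assumption, and the `Option` class and `fits` helper are made up for illustration. Swap in whatever prices you actually see on your local market.

```python
from dataclasses import dataclass

BUDGET_EUR = 1000   # spending cap from the post
MIN_VRAM_GB = 24    # the post wants strictly more than this


@dataclass
class Option:
    name: str
    price_eur: float
    vram_gb: int

    @property
    def eur_per_gb(self) -> float:
        # Cost of each GB of VRAM, used only for ranking
        return self.price_eur / self.vram_gb


def fits(opt: Option) -> bool:
    # Within budget AND more than 24 GB of VRAM
    return opt.price_eur <= BUDGET_EUR and opt.vram_gb > MIN_VRAM_GB


options = [
    Option("Nvidia Tesla V100", 800, 32),  # price and VRAM as quoted in the post
    Option("Intel B70", 1350, 32),         # price from the post; 32 GB is an assumption
    Option("AMD R9700", 2000, 32),         # "close to 2k"; 32 GB is an assumption
]

# Keep only what fits, cheapest per GB first
shortlist = sorted((o for o in options if fits(o)), key=lambda o: o.eur_per_gb)

for o in shortlist:
    print(f"{o.name}: {o.price_eur:.0f} EUR, {o.vram_gb} GB, {o.eur_per_gb:.1f} EUR/GB")
```

With these numbers only the V100 survives the filter. Add the P40s, MI50s, or a mini PC with the prices you find and see where they land.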
Given what you just read, I would highly advise you to start thinking about getting some hardware sooner rather than later, because I have a feeling prices are going even higher and everybody's going to be scrambling to get hardware.
If you have any other ideas, I'm open to hearing them. In the meantime, Nvidia V100 it is then.