You are making the right decision. GPUs are the correct way to run inference, unless you're budget-limited and willing to fall back to running very large models very slowly. That's acceptable if it's the limit of your budget, but even then you could probably do better with GPUs.
I really like the RTX PRO 6000 Blackwell cards. It's a good idea to make sure you have at least as much system RAM as total VRAM. Get a good power supply; your cards will live longer that way. You can also power-limit them to extend their lifespan (if they are regular gaming GPUs not intended for 24/7 use, such as the 3090 or 5090).
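If you want to sanity-check the RAM vs. VRAM rule and see what a power-limit command looks like, here is a minimal Python sketch. It assumes a Linux box with the NVIDIA driver installed so that `nvidia-smi` is on the PATH. The function names are made up for illustration, and the 280 W figure is an arbitrary example, not a recommendation for any particular card.

```python
import subprocess


def total_vram_gib(nvidia_smi_output: str) -> float:
    """Sum per-GPU memory (MiB, one value per line) and convert to GiB."""
    mib = [float(line) for line in nvidia_smi_output.splitlines() if line.strip()]
    return sum(mib) / 1024


def system_ram_gib(meminfo_text: str) -> float:
    """Read MemTotal (reported in kB) from /proc/meminfo text, as GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found")


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi call that caps one GPU's power draw (needs root)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


if __name__ == "__main__":
    # Ask the driver for each GPU's total memory in MiB, one number per line.
    smi = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open("/proc/meminfo") as f:
        ram = system_ram_gib(f.read())
    vram = total_vram_gib(smi)
    print(f"RAM {ram:.1f} GiB, VRAM {vram:.1f} GiB, RAM >= VRAM: {ram >= vram}")
    # Example value only; check your card's supported range with `nvidia-smi -q -d POWER`.
    print("To cap GPU 0:", " ".join(["sudo"] + power_limit_command(0, 280)))
```

The script only prints the power-limit command rather than running it, since changing the limit needs root and the right wattage depends on your card.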
Since you're thinking about going headless, Linux is a good choice for this. I like Ubuntu Server because it's super minimalist and easy. Snapper and Btrfs are your friends.
If you want to use the PC with a monitor and keyboard for practical purposes or gaming, CachyOS is a great Linux option and has great performance optimizations built in.
Now that we have agents in the terminal, and even free ones in apps like OpenCode, there has never been a better time to learn Linux.
Once you set up your box (probably with a monitor and keyboard at first), install Tailscale on it and on your laptop, and you can SSH in like a pro from anywhere.
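Once both machines are on your tailnet, a short entry in `~/.ssh/config` on the laptop saves you from typing the full address every time. Here is a rough Python sketch that renders such an entry. The alias `gpubox`, the user `me`, and the hostname are all hypothetical placeholders; use the name or IP that `tailscale status` or `tailscale ip -4` shows for your box.

```python
def ssh_config_entry(alias: str, tailnet_host: str, user: str) -> str:
    """Render an OpenSSH client config stanza for one host."""
    return (
        f"Host {alias}\n"
        f"    HostName {tailnet_host}\n"
        f"    User {user}\n"
    )


if __name__ == "__main__":
    # Placeholder values: swap in your own machine name (or Tailscale IP) and user.
    print(ssh_config_entry("gpubox", "gpubox.example-tailnet.ts.net", "me"))
```

Paste the printed stanza into `~/.ssh/config` on the laptop and `ssh gpubox` should be all you need from then on.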
What I love about the MacBook Air -> SSH -> Linux box setup is having the best of both worlds: insane compute power plugged into the wall in my house, and insane battery life and display wherever I happen to be at any given moment.
Throw a dictation app like Pindrop on your MacBook and your life will never be the same. Mine hasn't been. Now you're literally able to talk with your Linux box from anywhere.