in local ai, fast and worth it are two completely different numbers.
last post i showed you the fast one. this one is the number that actually decides what you should buy, and it does not crown the same winner.
quick catch up if you missed it. i have two 128gb boxes on my desk, the nvidia dgx spark and the amd strix halo, and i ran the exact same model on both, byte for byte the same file, same everything, both idle.
nvidia won on raw speed. that was the whole post.
but raw speed is what the spec sheet wants you to stare at. the number that actually matters when the money leaves your account is this one, how much ai you get per dollar you spend. so i took each box's token generation speed and divided it by what the box costs.
so here is tokens per dollar, the token-gen speed each box gives you for every $1,000 it costs:
>nvidia dgx spark, 128gb, $4,699 → 12.5
>amd strix halo, 128gb, the one i benched, $3,449 → 15.5
>amd strix halo, same chip in a 64gb box, $1,959 → 27.3
all three are tok/s for every $1,000 you spend, higher means more ai for your money.
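if you want to check the division yourself, here is a minimal sketch. the prices and token-gen speeds are the ones from these two posts, and the 64gb box is assumed (not measured) to match the 128gb strix speed. the dictionary layout and function name are just mine for illustration.

```python
# prices and token-generation speeds as quoted in the posts.
# assumption: the 64gb strix runs at the same tok/s as the benched 128gb one.
boxes = {
    "nvidia dgx spark 128gb": {"price_usd": 4699, "tok_per_s": 58.6},
    "amd strix halo 128gb": {"price_usd": 3449, "tok_per_s": 53.5},
    "amd strix halo 64gb": {"price_usd": 1959, "tok_per_s": 53.5},
}


def tok_per_s_per_1000_usd(price_usd, tok_per_s):
    """token-gen speed bought by each $1,000 of box price."""
    return tok_per_s / (price_usd / 1000)


value = {
    name: round(tok_per_s_per_1000_usd(**spec), 1)
    for name, spec in boxes.items()
}

for name, v in value.items():
    print(f"{name}: {v} tok/s per $1,000")
```

swap in your own prices and the ranking updates itself, which matters because street prices on these boxes move around.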
now look at the bottom line. the same amd chip in the cheaper 64gb box gives you more than double the inference per dollar of the spark. i did not bench that box myself, but it should run this exact model at the same speed, because on these chips speed comes from memory bandwidth not capacity, and the bandwidth is identical.
that is not a rounding error, that is the whole buying decision sitting right there.
here is why it happens, because this is the part that makes it real instead of a price whine. the speed you actually feel, the model typing its answer back to you, is decided by memory bandwidth, not raw compute.
the chip has to pull the active weights out of memory again for every token it writes. both boxes have nearly the same bandwidth, about 256 gigabytes a second on the strix against 273 on the spark, so they write at nearly the same speed.
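a rough sanity check on that, assuming the 256 figure is the strix and the 273 figure is the spark, and using the token-gen speeds from the benchmark post. this is a back-of-envelope comparison, not a profiler trace, and the variable names are mine.

```python
# assumption: 256 GB/s is the strix halo, 273 GB/s is the dgx spark.
bandwidth_gb_s = {"spark": 273, "strix": 256}
# measured token generation from the benchmark post (tok/s).
gen_tok_s = {"spark": 58.6, "strix": 53.5}

# how far ahead the spark is on each axis, as a fraction.
bandwidth_edge = bandwidth_gb_s["spark"] / bandwidth_gb_s["strix"] - 1
gen_edge = gen_tok_s["spark"] / gen_tok_s["strix"] - 1

# if decoding were purely bandwidth bound at the rated figure, this is the
# most data each box could have read from memory per generated token.
gb_per_token_ceiling = {
    box: bandwidth_gb_s[box] / gen_tok_s[box] for box in bandwidth_gb_s
}

print(f"spark bandwidth edge:  {bandwidth_edge:.1%}")
print(f"spark generation edge: {gen_edge:.1%}")
for box, gb in gb_per_token_ceiling.items():
    print(f"{box}: at most {gb:.2f} GB read per token")
```

both edges land in the single digits, and the per-token ceiling comes out nearly the same on both boxes, which is what you would expect if bandwidth is the thing setting the pace.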
so what does nvidia's higher price buy you, about 1.4x the 128gb strix and 2.4x the 64gb one? compute. the blackwell chip has a lot more raw math, which is exactly why it was 2x faster at reading your input in the last post. and that is real.
but reading your input happens once. writing the answer happens for every token, all day long, and that is the part bandwidth owns, and the bandwidth is basically tied.
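to put rough numbers on that split, here is a simplified timing sketch using the measured rates from the benchmark post. it assumes both rates stay constant no matter how long the context gets, which flatters both boxes, and the two workloads are hypothetical examples, not things i measured.

```python
# measured rates from the benchmark post (tok/s).
RATES = {
    "spark": {"prefill": 1957, "gen": 58.6},
    "strix": {"prefill": 956, "gen": 53.5},
}


def response_seconds(box, prompt_tokens, output_tokens):
    """simplified model: read the prompt once, then write each output token."""
    r = RATES[box]
    return prompt_tokens / r["prefill"] + output_tokens / r["gen"]


# hypothetical workloads as (prompt tokens, output tokens).
workloads = {
    "chat turn": (2_000, 500),
    "big document": (100_000, 500),
}

slowdown = {}
for label, (prompt_tokens, output_tokens) in workloads.items():
    spark = response_seconds("spark", prompt_tokens, output_tokens)
    strix = response_seconds("strix", prompt_tokens, output_tokens)
    slowdown[label] = strix / spark
    print(f"{label}: spark {spark:.1f}s, strix {strix:.1f}s, "
          f"strix takes {slowdown[label]:.2f}x as long")
```

under these assumptions the strix takes roughly 1.2x as long on the chat turn and roughly 1.9x as long on the big document. short prompts are dominated by writing, long prompts by reading.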
to be fair to the expensive box, because the silicon decides this, not my wallet.
if your work is huge context and heavy document crunching, that 2x prefill speed genuinely earns its keep. cuda is also years more mature than rocm, which the price tag never shows you but you feel the first time something breaks. and the spark has high-speed networking built in to link two of them into one bigger machine, the strix has no such ports at all, so if your plan is to chain boxes together the spark is made for it and the amd box simply is not.
for most people running a chat or an agent loop on a single box though, you are paying up to 2.4x for muscle you will almost never flex.
one honest caveat so nobody can swing on it, the spark's price includes a 4tb drive against the strix's 1tb, so part of that gap is storage, not silicon. it tightens the math a little. it does not close it.
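one way to size that caveat is to ask how much of the spark's price would have to be storage for it to merely tie the 128gb strix on tok/s per dollar. a minimal sketch using only the prices and speeds from these posts. what the extra 3tb actually costs is not something i am putting a number on here.

```python
spark_price, spark_tok_s = 4699, 58.6
strix_price, strix_tok_s = 3449, 53.5  # the 128gb box that was benched

# price at which the spark would match the strix on tok/s per dollar.
spark_break_even_price = spark_tok_s / (strix_tok_s / strix_price)

# how much of the spark's price would have to be written off as
# "just the bigger drive" to reach that break-even point.
storage_premium_needed = spark_price - spark_break_even_price

print(f"break-even spark price: ${spark_break_even_price:,.0f}")
print(f"storage premium needed to tie: ${storage_premium_needed:,.0f}")
```

that works out to roughly $900 of the spark's price needing to be storage just to tie the 128gb strix, and the gap to the 64gb box is far wider than that.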
the spec sheet leads with speed because speed sounds expensive and impressive. the buyer math is quieter, and it points the other way.
the accessible tier of local ai is further along than the timeline thinks, and it costs a lot less than they keep telling you.
the results are in. two 128gb boxes on my desk, the nvidia dgx spark and the amd strix halo.
everyone argues which one is faster for local ai off spec sheets and vibes, so i stopped guessing and ran them head to head on the exact same model. here is what i actually found.
the setup, because it only counts if it is fair. the identical model file, the same Qwen3.6-35B-A3B at Q8, byte for byte the same gguf on both boxes. same llama.cpp commit. same flags. both boxes fully idle, nothing else touching the gpu. no thumb on the scale either way.
the two boxes:
>nvidia dgx spark, GB10, 128gb unified, 4tb samsung nvme, $4,699
>amd strix halo, ryzen ai max+ 395, 128gb unified, 1tb wd black, mine is the framework desktop at $3,449
prompt processing, how fast it reads your input:
>spark 1957 tok/s
>strix 956 tok/s
the spark is a clean 2x faster here. this is nvidia's compute muscle showing, long context and big documents go down fast.
token generation, how fast it writes the answer back, the speed you actually feel:
>spark 58.6 tok/s
>strix 53.5 tok/s
spark still wins, but by about 10 percent. side by side you would barely clock the difference while it types.
so on raw speed nvidia takes it, decisively on prompt processing, narrowly on generation. no spin, the spark is the faster box.
but speed is only half the question. the other half is what you paid to get it, and that one does not go the way this one did. coming next.