One pattern we keep seeing with customers serving LLMs at scale:
Prefill-decode (PD) disaggregation is often treated like a magic wand. The reality is more nuanced.
So we wrote down the core insights on when PD helps and when it does not, and validated them with vLLM on AMD, where the PD path has been much less paved. 🧵