NVIDIA inference team:
"Training was never the hard part. Serving a model too big for a single GPU in production, at scale, without killing latency is."
So how do you actually run LLM inference across hundreds of GPUs without your latency (or your cloud bill) blowing up?
In one technical session, NVIDIA, Gcore & Orange break down exactly that using NVIDIA Dynamo, the open-source framework for distributed multi-node inference.
What's inside:
•Dynamo's architecture for scaling inference to hundreds of GPUs
•Real production deployments (Gcore & Orange)
•Cutting latency and cost at the same time
•Why agentic workloads & open models (DeepSeek, Kimi, GLM, Mistral) are eating inference
Worth more than a $1,000 MLOps course.
Watch the webinar 👇