CyberSecureIPS

CyberSecureIPS

Minhjason ₍ᐢ-ₓ-ᐢ₎

D'Pope🌹

Fouad

Vishranth Suresh

Daniel Montgomery, PhD

Stephanie

Eugene Ng

Eugene Ng

@EugeneNg

23 May 2024

1 | Super strong quarter driven by Data Center, all customer types, led by enterprise, CSIPs, large cloud providers Q1 was another record quarter…Strong sequential Data Center growth was driven by all customer types, led by enterprise and consumer Internet companies. Large cloud providers continue to drive strong growth as they deploy and ramp NVIDIA AI infrastructure at scale and represented the mid-40s as a percentage of our Data Center revenue. 2 | Driven by Hopper GPU and H100s From a product perspective, the vast majority of compute revenue was driven by our Hopper GPU architecture. Demand for Hopper during the quarter continues to increase. Thanks to CUDA algorithm innovations, we've been able to accelerate LLM inference on H100 by up to 3x, which can translate to a 3x cost reduction for serving popular models like Llama 3. 3 | H200 started sampling in Q1, currently in production and shipments in Q2, first H200 delivered to OpenAI last week We started sampling the H200 in Q1 and are currently in production with shipments on track for Q2. The first H200 system was delivered by Jensen to Sam Altman and the team at OpenAI and powered their amazing GPT-4o demos last week. ¸4 | Currently cloud providers can earn $5 in GPU instant hosting revenue above 4 years Training and inferencing AI on NVIDIA CUDA is driving meaningful acceleration in cloud rental revenue growth, delivering an immediate and strong return on cloud providers' investment. For every $1 spent on NVIDIA AI infrastructure, cloud providers have an opportunity to earn $5 in GPU instant hosting revenue over 4 years. 5 | H200 is 2X the inference of H100, API providers can generate $7 in revenue over 4 years for spent H200 nearly doubles the inference performance of H100, delivering significant value for production deployments. For example, using Llama 3 with 700 billion parameters, a single NVIDIA HGX H200 server can deliver 24,000 tokens per second, supporting more than 2,400 users at the same time. That means for every $1 spent on NVIDIA HGX H200 servers at current prices per token, an API provider serving Llama 3 tokens can generate $7 in revenue over 4 years. 6 | Supply for H100 grew, still constrained on H200, Blackwell in full production. Expect demand to exceed supply well into next year While supply for H100 grew, we are still constrained on H200. At the same time, Blackwell is in full production. We are working to bring up our system and cloud partners for global availability later this year. Demand for H200 and Blackwell is well ahead of supply, and we expect demand may exceed supply well into next year. 7 | Blackwell launched during GTC March, 4X faster training and 30X faster inference than H100 (i.e. 15X better than H200) At GTC in March, we launched our next-generation AI factory platform, Blackwell. The Blackwell GPU architecture delivers up to 4x faster training and 30x faster inference than the H100 and enables real-time generative AI on trillion-parameter large language models. Blackwell is a giant leap with up to 25x lower TCO and energy consumption than Hopper. The Blackwell platform includes the fifth-generation NVLink with a multi-GPU spine and new InfiniBand and Ethernet switches, the X800 series designed for a trillion-parameter scale AI. Blackwell is designed to support data centers universally, from hyperscale to enterprise, training to inference, x86 to Grace CPUs, Ethernet to InfiniBand networking, and air cooling to liquid cooling. 8 | Blackwell available in over 100 OEMs and ODM systems at launch allowing for fast and broad adoption Blackwell will be available in over 100 OEM and ODM systems at launch, more than double the number of Hoppers launched and representing every major computer maker in the world. This will support fast and broad adoption across the customer types, workloads, and data center environments in the first year shipments. Blackwell time-to-market customers include Amazon, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and XAi. 9 | Blackwell will be available via 100 different system configurations through ecosystem partners and this is off the charts, which means each is custom, and each is not exactly commoditized Yes. I appreciate that. In fact, the way we sell GB200 is the same. We disaggregate all of the components that make sense and we integrate it into computer makers. We have 100 different computer system configurations that are coming this year for Blackwell. And that is off the charts. Hopper, frankly, had only half, but that's at its peak. It started out with way less than that even. And so you're going to see liquid-cooled version, air-cooled version, x86 versions, Grace versions, so on and so forth. There's a whole bunch of systems that are being designed. And they're offered from all of our ecosystem of great partners. Nothing has really changed. 10 | Customers will easily between H100, H200 and B100 Blackwells, designed to be backwards compatible They will easily transition from H100 to H200 to B100. And so Blackwell systems have been designed to be backwards compatible, if you will, electrically, mechanically. And of course, the software stack that runs on Hopper will run fantastically on Blackwell. 11 | Blackwell shipping in Q2 onwards and ramp in Q3 and see a lot of revenue in 2024 We will be shipping -- well, we've been in production for a little bit of time. But our production shipments will start in Q2 and ramp in Q3, and customers should have data centers stood up in Q4…We will see a lot of Blackwell revenue this year. 12 | NVIDIA has been preparing the ecosystem for liquid cooling for some time We also have been priming the pump, if you will, with the entire ecosystem, getting them ready for liquid cooling. We've been talking to the ecosystem about Blackwell for quite some time. And the CSPs, the data centers, the ODMs, the system makers, our supply chain; beyond them, the cooling supply chain base, liquid cooling supply chain base, data center supply chain base, no one is going to be surprised with Blackwell coming and the capabilities that we would like to deliver with Grace Blackwell 200. GB200 is going to be exceptional. 13 | NVIDIA offers the best time-to train, lower cost to train and inference LLMs For cloud rental customers, NVIDIA GPUs offer the best time-to-train models, the lowest cost to train models and the lowest cost to inference large language models. 14 | That’s why many leading LLMs are all building on NVIDIA AI Leading LLM companies such as OpenAI, Adept, Anthropic, Character.ai, Cohere, Databricks, DeepMind, Meta, Mistral, XAi, and many others are building on NVIDIA AI in the cloud. 15 | NVIDIA’s GPUs are all about versatility, not general purpose, its great for running things where they are a lot of commonalities, if wasn’t too specific, might as well build an FGPA or ASIC NVIDIA's accelerated computing is versatile but I wouldn't call it general purpose. Like, for example, we wouldn't be very good at running the spreadsheet. That was really designed for general-purpose computing. And so the control loop of an operating system code probably isn't fantastic for general-purpose computing, not for accelerated computing. And so I would say that we're versatile, and that's usually the way I describe it. There's a rich domain of applications that we're able to accelerate over the years, but they all have a lot of commonalities, maybe some deep differences, but commonalities. They're all things that I can run in parallel, they're all heavily threaded. 5% of the code represents 99% of the run-time, And so the versatility of our platform is really quite key. And if you're too brittle and too specific, you might as well just build an FPGA or you build an ASIC or something like that, but that's hardly a computer. 16 | NVIDIA H100 GPUs paved the way on the hardware side for Tesla’s breakthrough performance in FSD 12 Enterprises drove strong sequential growth in Data Center this quarter. We supported Tesla's expansion of their training AI cluster to 35,000 H100 GPUs. Their use of NVIDIA AI infrastructure paved the way for the breakthrough performance of FSD version 12, their latest autonomous driving software based on Vision. 17 | Expecting automotive to be largest enterprise vertical within data center this year NVIDIA Transformers, while consuming significantly more computing, are enabling dramatically better autonomous driving capabilities and propelling significant growth for NVIDIA AI infrastructure across the automotive industry. We expect automotive to be our largest enterprise vertical within Data Center this year, driving a multibillion revenue opportunity across on-prem and cloud consumption. 18 | Customers now expanding to Sovereign AI, examples like Japan’s KDDI, Sakura Internet, Softbank, France’s Scaleway, Italy’s Swisscom, Singapore’s NSC and Singtel From a geographic perspective, Data Center revenue continues to diversify as countries around the world invest in sovereign AI. Sovereign AI refers to a nation's capabilities to produce artificial intelligence using its own infrastructure, data, workforce, and business networks. For example, Japan plans to invest more than $740 million in key digital infrastructure providers, including KDDI, Sakura Internet, and SoftBank to build out the nation's sovereign AI infrastructure. France-based Scaleway, a subsidiary of the Iliad Group, is building Europe's most powerful cloud native AI supercomputer. In Italy, Swisscom Group will build the nation's first and most powerful NVIDIA DGX-powered supercomputer to develop the first LLM natively trained in the Italian language. And in Singapore, the National Supercomputer Centre is getting upgraded with NVIDIA Hopper GPUs, while Singtel is building NVIDIA's accelerated AI factories across Southeast Asia. 19 | Industry is going through a major change with the next Industrial Revolution to accelerated computing and building the new type of data center The industry is going through a major change. Before we start Q&A, let me give you some perspective on the importance of the transformation. The next industrial revolution has begun. Companies and countries are partnering with NVIDIA to shift the trillion-dollar installed base of traditional data centers to accelerated computing and build a new type of data center, AI factories, to produce a new commodity, artificial intelligence. AI will bring significant productivity gains to nearly every industry and help companies be more cost and energy efficient while expanding revenue opportunities. 20 | Models are becoming multimodal, moving from information retrieval to one of answers and skills generation from general purpose CPU to GPU accelerated computing Training continues to scale as models learn to be multimodal, understanding text, speech, images, video, and 3D and learn to reason and plan. Our inference workloads are growing incredibly. With generative AI, inference, which is now about fast token generation at massive scale, has become incredibly complex. Generative AI is driving a from-foundation-up full stack computing platform shift that will transform every computer interaction. From today's information retrieval model, we are shifting to an answers and skills generation model of computing. AI will understand context and our intentions, be knowledgeable, reason, plan and perform tasks. We are fundamentally changing how computing works and what computers can do, from general purpose CPU to GPU accelerated computing, from instruction driven software to intention-understanding models, from retrieving information to performing skills and, at the industrial level, from producing software to generating tokens, manufacturing digital intelligence. 21 | It is no longer instruction driven, but intention understanding, no longer retrieving pre-recorded files but generating contextually relevant intelligent answers Longer term, we're completely redesigning how computers work. And this is a platform shift. Of course, it's been compared to other platform shifts in the past. But time will clearly tell that this is much, much more profound than previous platform shifts. And the reason for that is because the computer is no longer an instruction-driven only computer. It's an intention-understanding computer. And it understands, of course, the way we interact with it, but it also understands our meaning, what we intend that we asked it to do. And it has the ability to reason, inference iteratively to process a plan and come back with a solution. And so every aspect of the computer is changing in such a way that instead of retrieving prerecorded files, it is now generating contextually relevant intelligent answers. And so that's going to change computing stacks all over the world. 22 | Generation has changed…not longer trying to detect the cat, but generating every pixel of a cat …inference has really fundamentally changed, it's now generation. It's not trying to just detect the cat, which was plenty hard in itself, but it has to generate every pixel of a cat. And so the generation process is a fundamentally different processing architecture. 23 | Expanded from CSPs to CIPs, Enterprise, Sovereign AI, Auto and Healthcare customers CSPs were the first generative AI movers. With NVIDIA, CSPs accelerated workloads to save money and power. The tokens generated by NVIDIA Hopper drive revenues for their AI services. And NVIDIA cloud instances attract rental customers from our rich ecosystem of developers. Beyond cloud service providers, generative AI has expanded to consumer Internet companies and enterprise, sovereign AI, automotive, and health care customers, creating multiple multibilliondollar vertical markets. Strong and accelerating demand for generative AI training and inference on the Hopper platform propels our Data Center growth. Training continues to scale as models learn to be multimodal, understanding text, speech, images, video, and 3D and learn to reason and plan. Our inference workloads are growing incredibly. With generative AI, inference, which is now about fast token generation at massive scale, has become incredibly complex. Generative AI is driving a from-foundation-up full stack computing platform shift that will transform every computer interaction. From today's information retrieval model, we are shifting to an answers and skills generation model of computing. AI will understand context and our intentions, be knowledgeable, reason, plan and perform tasks. We are fundamentally changing how computing works and what computers can do, from general purpose CPU to GPU accelerated computing, from instruction 24 | Inference vs Training, 40% vs 60% of data Center revenues In our trailing 4 quarters, we estimate that inference drove about 40% of our Data Center revenue. Both training and inference are growing significantly. 25 | Have ramped new products designed for China that don’t require export control license We ramped new products designed specifically for China that don't require export control license. Our Data Center revenue in China is down significantly from the level prior to the imposition of the new export control restrictions in October. We expect the market in China to remain very competitive going forward. 26 | NVIDIA is aware of all the bottlenecks, not guessing it We know where all the bottlenecks are. We're not guessing about it. But we know how it's going to perform and we know where the bottlenecks are. We know where we need to optimize with them, and we know where we have to help them improve their infrastructure to achieve the most performance. This deep intimate knowledge at the entire data center scale is fundamentally what sets us apart today. We build every single chip from the ground up. We know exactly how processing is done across the entire system. And so we understand exactly how it's going to perform and how to get the most out of it with every single generation. 27 | Memory is becoming important because it allows for LLMs to carry on a conversation and understand the content (more interesting if this shifts from the cloud to our personal devices) now all of a sudden, look at this, large language models with memory because the large language model needs to have memory so they can carry on a conversation with you, understand the context. All of a sudden, the versatility of the Grace memory became super important. And so each one of these advances in generative AI and the advancement of AI really begs for not having a widget that's designed for one model but to have something that is really good for this entire domain, properties of this entire domain, but obeys the first principles of software: that software is going to continue to evolve, that software is going to keep getting better and bigger. We believe in the scaling of these models. 28 | The edge is that NVIDIA can create their own architecture with the entire system now, memory system between Grace and Hopper is fast and saves power Grace allows us to do something that isn't possible with the configuration, the system configuration today. The memory system between Grace and Hopper are coherent and connected. The interconnect between the 2 chips -- calling it 2 chips is almost weird because it's like a superchip. The two of them are connected with this interface that's like at terabytes per second. It's off the charts. And the memory that's used by Grace is LPDDR. It's the first data center-grade low-power memory. And so we save a lot of power on every single node. And then finally, because of the architecture, because we can create our own architecture with the entire system now, we could create something that has a really large NVLink domain, which is vitally important to the next-generation large language models for inferencing. that GB200 has a 72-node NVLink domain. That's like 72 Blackwells connected together into 1 giant GPU. And so we needed Grace Blackwells to be able to do that. And so there are architectural reasons, there are software programming reasons and then there are system reasons that are essential for us to build them that way. 29 | Strong networking driven by InfiniBand Strong networking YoY growth was driven by InfiniBand. We experienced a modest sequential decline, which was largely due to the timing of supply, with demand well ahead of what we were able to ship. We expect networking to return to sequential growth in Q2. 30 | Announced NVIDIA Inference Microservices (NIM) containers as part of the NVIDIA AI enterprise software platform We announced a new software product with the introduction of NVIDIA Inference Microservices, or NIM. NIM provides secure and performance-optimized containers powered by NVIDIA CUDA acceleration in network computing and inference software, including Triton and PrintServer and TensorRT-LLM with industry-standard APIs for a broad range of use cases, including large language models for text, speech, imaging, vision, robotics, genomics, and digital biology. They enable developers to quickly build and deploy generative AI applications using leading models from NVIDIA, AI21, Adept, Cohere, Getty Images, and Shutterstock, and open models from Google, Hugging Face, Meta, Microsoft, Mistral AI, Snowflake and Stability AI. NIMs will be offered as part of our NVIDIA AI enterprise software platform for production deployment in the cloud or on-prem. 31 | On utilization and demand for GPUs, NVIDIA continues to race every day…demand far outstrips supply The demand for GPUs in all the data centers is incredible. We're racing every single day. We're racing actually. Customers are putting a lot of pressure on us to deliver the systems and stand those up as quickly as possible. And of course, I haven't even mentioned all of the sovereign AIs who would like to train all of their regional natural resource of their country, which is their data, to train their regional models. And there's a lot of pressure to stand those systems up. So anyhow, the demand, I think, is really, really high and it outstrips our supply. 32 | Transition from H100 to H200 then to Blackwell B100 We see increasing demand of Hopper through this quarter. And we expect demand to outstrip supply for some time as we now transition to H200, as we transition to Blackwell. Everybody is anxious to get their infrastructure online. 33 | Being able to invest and going for the best, allows the strongest to have the highest performance and the lowest costs with economies of scale And this is the reason why today, performance matters in everything. This is at a time when the highest performance is also the lowest cost because the infrastructure cost of carrying all of these chips cost a lot of money. And it takes a lot of money to fund the data center, to operate the data center, the people that goes along with it, the power that goes along with it, the real estate that goes along with it, and all of it adds up. And so the highest performance is also the lowest TCO. 34 | That’s why the first-mover-advantage and being able to extract value and continually invest allows the top to stay at the top, that’s why tech leadership Let me give you an example of time being really valuable, why this idea of standing up a data center instantaneously is so valuable and getting this thing called time-to-train is so valuable. The reason for that is because the next company who reaches the next major plateau gets to announce a groundbreaking AI. And the second one after that gets to announce something that's 0.3% better. And so the question is, do you want to be repeatedly the company delivering groundbreaking AI or the company delivering 0.3% better? And that's the reason why this race, as in all technology races, the race is so important. And you're seeing this race across multiple companies because this is so vital to have technology leadership, for companies to trust the leadership that want to build on your platform and know that the platform that they're building on is going to get better and better. And so leadership matters a great deal. Time-to-train matters a great deal. 35 | While others are playing catch up to NVIDIA, NVIDIA is already planning 1Y ahead with a new NVIDIA GPU chips, other Blackwells…and AI infrastructure buildout is going to be multi-year Well, I can announce that after Blackwell, there's another chip. And we are on a 1-year rhythm…New CPUs, new GPUs, new networking NICs, new switches, a mound of chips that are coming. And then after Blackwell, as you mentioned, we have other Blackwells coming. And then there's a short -- we're in a 1-year rhythm as we've explained to the world. And we want our customers to see our road map for as far as they like, but they're early in their build-out anyways and so they had to just keep on building, okay? And so there's going to be a whole bunch of chips coming at them, and they just got to keep on building and just, if you will, performance-average your way into it. Token generation will drive a multiyear build-out of AI factories. 36 | New networking technology is coming, Spectrum-X for Ethernet, taking ALL 3 NVLINK, InfiniBand and Ethernet networking fabric all at a fast clip ou can also count on us having new networking technology on a very fast rhythm. We're announcing Spectrum-X for Ethernet. But we're all in on Ethernet, and we have a really exciting road map coming for Ethernet. We have a rich ecosystem of partners. Dell announced that they're taking Spectrum-X to market. We have a rich ecosystem of customers and partners who are going to announce taking our entire AI factory architecture to market. And so for companies that want the ultimate performance, we have InfiniBand computing fabric. InfiniBand is a computing fabric, Ethernet to network. And InfiniBand, over the years, started out as a computing fabric, became a better and better network. Ethernet is a network and with Spectrum-X, we're going to make it a much better computing fabric. And we're committed, fully committed, to all 3 links, NVLink computing fabric for single computing domain, to InfiniBand computing fabric, to Ethernet networking computing fabric. And so we're going to take all 3 of them forward at a very fast clip. And so you're going to see new switches coming, new NICs coming, new capability, new software stacks that run on all 3 of them.

1,309