TheValueist

@TheValueist

22 Nov 2025

$NVDA $AMZN $MSFT $GOOGL

Below is a detailed analysis of the service offerings of the hyperscalers and neoclouds at each level of the stack. The offerings are heterogeneous, which tells a different story than the homogeneous “GPU compute” that Wall Street and MSM promote.

⸻

• Traditional hyperscalers (Azure, AWS, Google Cloud, Oracle, IBM, Alibaba) span almost the entire hierarchy: from GPU‑backed SaaS copilots (L1) and model APIs (L2) down through managed training platforms (L3–L4), managed GPU clusters (L5), GPU VMs (L6) and, in some cases, bare‑metal GPU instances (L7).
• Neocloud giants (CoreWeave, Lambda, Crusoe, Nebius) concentrate in the middle and lower layers: strong managed clusters and orchestration (L4–L5) and high‑quality VMs / bare metal (L6–L7), with a growing but more selective presence in training platforms and inference services (L2–L3).
• Platform‑centric neoclouds (Together, RunPod, Verda/DataCrunch, Gcore, Hyperstack, Firmus, GMO GPU Cloud, TensorWave and a few others) extend the stack upward into managed training, inference, and lightweight MLOps (L2–L5) while still selling GPU VMs and clusters (L6–L7).
• Infrastructure‑first neoclouds (Voltage Park, Cirrascale, Scaleway, Vultr, many Bronze‑tier clouds) are tightly focused on high‑performance clusters and bare metal (L5–L7), with minimal higher‑level platform features.
• Marketplaces and brokers (Vast.ai, Prime Intellect, TensorDock within Voltage Park, etc.) operate as an overlay on the bottom of the stack, aggregating GPU VMs and bare‑metal servers from many providers rather than running their own full cloud. They sit at L6–L7 commercially but do not own much of the technical stack.

To anchor terminology, the hierarchy is:

L1 – Fully managed AI applications (copilots, vertical SaaS).
L2 – Managed foundation models and inference APIs (model‑as‑a‑service).
L3 – Managed training / fine‑tuning platforms (“bring data, get a model”).
L4 – ML / MLOps platforms with integrated GPU orchestration (notebooks, pipelines, feature stores).
L5 – Managed GPU clusters (Kubernetes/Ray/Slurm as‑a‑service).
L6 – GPU‑accelerated VM instances.
L7 – Bare‑metal GPU‑as‑a‑service (dedicated servers).

Below L7 sit colocation and on‑premises deployments, which are important for end users but usually not branded as “GPU cloud providers.”

⸻
1. Traditional hyperscalers
⸻

SemiAnalysis’ ClusterMAX 2.0 classifies Microsoft Azure, Oracle, AWS and Google Cloud as “Traditional Hyperscalers,” distinct from neoclouds and marketplaces. All four operate across essentially the full stack.

Microsoft Azure
• L1 – Fully managed AI apps: Microsoft 365 Copilot, GitHub Copilot and industry copilots are SaaS products that abstract GPUs entirely.
• L2 – Model APIs: Azure OpenAI Service and Azure AI model catalog expose GPT‑class and other models via API with token‑based billing.
• L3 – Training / fine‑tuning: Azure AI Studio and Azure Machine Learning offer managed training, fine‑tuning and evaluation workflows over GPUs.
• L4 – ML/MLOps PaaS: Azure ML provides experiment tracking, pipelines, registries, and deployment, tightly integrated with Azure Kubernetes Service.
• L5 – Managed GPU clusters: AKS with GPU node pools and Azure’s HPC / AI offerings support large distributed training jobs on H100, H200, GB200 and GB300 clusters (for example, the GB300 NVL72 “supercomputer‑scale” deployment).
• L6 – GPU VMs: NC/ND/NV series instances expose single‑ and multi‑GPU VMs based on Nvidia GPUs.
• L7 – Bare metal: GPU bare metal is limited; Azure’s focus is on VM‑based HPC. Oracle positions itself as the only major cloud with fully exposed GPU bare metal, suggesting Azure’s GPU bare‑metal SKUs are niche or absent.

Amazon Web Services
• L1 – AI applications: Amazon Q, CodeWhisperer, and other AI‑backed SaaS features embed GPUs but are sold as enterprise productivity tools.
• L2 – Model APIs: Amazon Bedrock exposes FMs from Anthropic, Meta, Mistral, Cohere and others via one API.
• L3 – Training / fine‑tuning: Bedrock model customization and Amazon SageMaker training flows provide managed training and fine‑tuning on GPUs including A100, H100 and H200.
• L4 – ML/MLOps PaaS: SageMaker is a full ML platform (experiments, pipelines, feature stores, deployment) built over AWS GPU infrastructure.
• L5 – Managed GPU clusters: EKS, AWS Batch, ParallelCluster and UltraClusters over P4/P5/P6 instances provide managed Slurm/Kubernetes environments for thousands of GPUs.
• L6 – GPU VMs: EC2 P‑ and G‑series instances deliver virtualized GPU capacity.
• L7 – Bare metal: Bare‑metal EC2 instance types (“.metal”) and Nitro‑based GPU instances provide near‑metal access for workloads needing direct hardware control.

Google Cloud
• L1 – AI apps: Workspace AI features and Gemini‑powered assistants are AI SaaS offerings on top of TPUs/GPUs.
• L2 – Model APIs: Vertex AI and Gemini APIs expose Google and partner models via managed endpoints.
• L3 – Training / fine‑tuning: Vertex AI supports managed training/fine‑tuning on TPUs and GPUs with pre‑built pipelines.
• L4 – ML/MLOps PaaS: Vertex AI is a unified platform with pipelines, registries, and MLOps features.
• L5 – Managed GPU clusters: GKE with GPU node pools and “AI Hypercomputer” pods combine TPUs, GPUs, fast storage and advanced fabric; large training workloads run here.
• L6 – GPU VMs: Compute Engine exposes A100, H100, H200, L4, T4 and others as attachable GPUs to VMs.
• L7 – Bare metal: Google offers “Bare Metal Solution” for Oracle and some HPC workloads, but GPU bare metal is niche relative to VM‑based GPUs. The mainstream GPU product is at L6.

Oracle Cloud Infrastructure (OCI)
• L1 – AI applications: Oracle has vertical SaaS applications, but most GPU‑relevant offerings start at L2.
• L2 – Model APIs: OCI Generative AI service exposes managed FMs via API.
• L3 – Training / fine‑tuning: OCI Generative AI supports hosting and fine‑tuning models on “dedicated AI clusters.”
• L4 – ML/MLOps PaaS: OCI AI services offer some PaaS‑like workflows, though less broad than SageMaker or Vertex.
• L5 – Managed clusters: Dedicated AI clusters provide managed environments running on H100/H200/B200 fleets, integrated with OCI networking and storage.
• L6 – GPU VMs: Standard OCI GPU VMs.
• L7 – Bare metal: OCI explicitly markets bare‑metal GPU instances (H100/H200, MI300X etc.) as a differentiator versus other hyperscalers.

IBM Cloud and other “second‑tier” hyperscalers
• IBM Cloud offers GPU‑equipped bare‑metal servers and GPU VMs, positioning primarily at L6–L7 with some PaaS around Watson and MLOps.
• DigitalOcean, Vultr and similar providers sit mostly at L6 (GPU VMs) with little or no managed clusters or model APIs, though some offer simple PaaS integrations (e.g., DigitalOcean GPU Droplets integrated with its platform).

In summary, hyperscalers extend from L1 down to at least L6 and, for OCI and parts of AWS, to L7. They internalize much of the stack, using GPU IaaS mainly as a cost base for higher‑margin L1–L4 services.

⸻
2. Neocloud giants (ClusterMAX “Neocloud Giants”)
⸻

ClusterMAX identifies four “Neocloud Giants”: CoreWeave, Lambda Labs, Crusoe and Nebius. These firms are the closest neocloud analogues to hyperscalers, but most of their revenue and differentiation is concentrated in L4–L7.

CoreWeave (Platinum‑tier in ClusterMAX)
• L4 – PaaS: CoreWeave Kubernetes Service is a managed K8s platform with Slurm‑on‑Kubernetes, observability, and AI‑optimized configuration (networking, storage, drivers).
• L5 – Managed GPU clusters: CKS clusters run on bare‑metal GPU nodes with NVLink and InfiniBand, oriented to training and inference.
• L6 – GPU VMs: CoreWeave exposes GPU‑accelerated VMs for various workloads.
• L7 – Bare metal: Purpose‑built bare‑metal infrastructure for inference and training underpins the platform.
• Limited L2–L3: CoreWeave focuses on infrastructure and orchestration rather than its own generic model APIs; higher‑level services are mostly partner‑led (e.g., Meta, OpenAI).

Lambda Labs (Neocloud giant, Silver medallion)
• L3 – Training / fine‑tuning: Lambda Cloud is used explicitly for training and fine‑tuning; workspaces and templates push into L3 territory.
• L4 – PaaS: Jupyter‑style workspaces and preconfigured ML environments provide a thin ML platform.
• L5 – Managed clusters: Lambda supports multi‑GPU clusters and 1‑click clusters across B200/H100/A100/GH200.
• L6 – GPU VMs: Core on‑demand product is GPU VMs with Lambda Stack preinstalled.
• L7 – Bare metal: Historically offered bare‑metal DGX and HGX rentals; still present for some dedicated customers.

Crusoe
• L4 – PaaS: Crusoe positions itself as an “AI platform,” with managed Kubernetes (Crusoe Managed Kubernetes) and orchestration (Run:ai integration) to simplify MLOps.
• L5 – Managed clusters: Crusoe AutoClusters automate Slurm/Kubernetes cluster creation over large GPU fleets.
• L6 – GPU VMs: Crusoe Cloud offers GPU VMs on Nvidia (A100, H100, H200) and increasingly AMD MI300X.
• L7 – Bare metal: Bare‑metal nodes in data centers near stranded energy sources and, more recently, experimental space‑based H100 deployments.

Nebius
• L4 – PaaS: Nebius AI Cloud provides managed Kubernetes and support for AI frameworks and schedulers.
• L5 – Managed GPU clusters: Thousand‑GPU clusters on InfiniBand fabric with Slurm/Kubernetes orchestration.
• L6 – GPU VMs: Standard GPU virtual machines for smaller workloads.
• L7 – Bare metal: “Bare‑metal performance” HGX/GB systems on InfiniBand underlying the cloud.
• Early L3 activity: Nebius markets itself as a platform for building, tuning and running AI models, but does not heavily promote generic model APIs.

These giants are infrastructure‑heavy.
Their primary differentiation is at L4–L5 (managed orchestration and training‑grade clusters) and L6–L7 (pricing, topology, availability). They have limited or no presence at L1 and only selective presence at L2–L3, usually via partner solutions.

⸻
3. Platform‑centric neoclouds (ClusterMAX Gold/Silver)
⸻

This group overlays higher‑level services (L2–L4) on top of infrastructure. ClusterMAX’s Gold and Silver tiers include Together, Voltage Park, Gcore, Firmus/SMC, TensorWave, GMO GPU Cloud, RunPod, Verda (DataCrunch), Scaleway, Cirrascale, Vultr and others.

Together AI
• L2 – Model APIs: Together Inference offers serverless inference for 200 open‑source models with pay‑per‑token pricing and “dedicated endpoints.”
• L3 – Training / fine‑tuning: Managed fine‑tuning and full training services on Together’s GPU clusters.
• L4 – PaaS: An integrated platform to train, fine‑tune, and deploy models; workflow orchestration is baked in.
• L5 – Managed clusters: Slurm and Kubernetes‑based GPU clusters for large‑scale training.
• L6–L7 – GPU infrastructure: Instant GPU clusters and reserved GB200/B200/H200/H100 clusters, effectively exposing bare‑metal performance with orchestration.

RunPod
• L2 – Inference endpoints: Serverless GPU endpoints for generative AI models.
• L3 – Training support: Pods and templates for training/fine‑tuning with persistent storage.
• L4 – Lightweight PaaS: “End‑to‑end AI cloud” branding, with prebuilt runtimes and deployment automation.
• L5 – Managed clusters: Less emphasis on formal cluster services, but multi‑pod orchestrations can approximate L5.
• L6–L7 – GPU pods: Both community cloud (marketplace‑like) and secure cloud deliver GPU VMs and some dedicated nodes.

Verda (formerly DataCrunch)
• L2 – Serverless inference: Autoscaling model hosting with per‑usage pricing.
• L3 – Training support: Guidance and tooling; not as deep as SageMaker but moving beyond pure IaaS.
• L4 – PaaS: Model hosting, monitoring and autoscaling containers reflect PaaS characteristics.
• L5 – GPU clusters: Custom clusters with Verda‑managed software stacks.
• L6–L7 – GPU instances: On‑demand and long‑term GPU instances up to 8× B200 HGX.

Gcore
• L3–L4: Preconfigured environments and containerized workloads with Docker/Kubernetes for AI and ML.
• L5: Cluster‑level infrastructure for AI training, built on combinations of bare metal and VMs.
• L6–L7: GPU VMs and bare‑metal servers interconnected via fast networks.

Hyperstack
• L2–L3: “Hyperstack AI Studio” markets an end‑to‑end environment to build, deploy and monitor AI, implying training and inference PaaS.
• L4: No‑infra‑needed positioning indicates significant platform services.
• L5–L7: Underlying GPU cloud with on‑demand GPUs.

Firmus / Sustainable Metal Cloud (SMC)
• L4–L5: Firmus AI Cloud emphasizes hybrid/multi‑cloud automation, observability and encrypted InfiniBand networking.
• L6–L7: SMC is primarily a bare‑metal GPU cloud (H100, A100, L40S) with large, energy‑efficient clusters.
• Some L3 elements: Marketing references “AI Cloud Apps,” indicating some platform‑level services on top of clusters.

GMO GPU Cloud
• L4–L5: Uses DDN storage and NVIDIA AI Enterprise to provide an “all‑in‑one AI development platform” on top of H200 and other GPUs.
• L6–L7: High‑performance multi‑node clusters that also appear in TOP500 rankings.

TensorWave
• L3–L4: Offers managed inference and PaaS features on top of AMD Instinct MI300X/MI325X clusters.
• L5–L7: Focuses on high‑performance bare‑metal AMD clusters for training, with large MI325X installations.

These providers occupy L2–L7 in varying depth. Their differentiation versus hyperscalers is price, topology (e.g., large, tightly‑coupled H100/B200 or MI300X clusters), and agility; versus infrastructure‑first neoclouds, their differentiation is upward integration into inference and training platforms.
⸻
4. Infrastructure‑first neoclouds
⸻

This group is defined by strong hardware and topology and relatively thin higher‑level services.

Voltage Park
• L5 – Managed clusters: Dedicated HGX H100 clusters (64–4,064 GPUs) with 3.2 Tbps InfiniBand are configured for large training jobs.
• L6–L7 – Bare metal and VMs: On‑demand access with simple provisioning, little platform abstraction beyond cluster orchestration.
• Limited L2–L4: Historically hardware‑centric, with some movement toward “AI factory” software but far from full ML PaaS. The acquisition of TensorDock adds a marketplace component without significantly lifting the stack.

Cirrascale
• L5–L7: Bare‑metal, dedicated multi‑GPU compute servers with options for fully managed clusters; focus is on physical servers and HPC‑style inference/training.
• Some L3–L4 elements: Managed inference offerings exist but are not as integrated as hyperscaler MLOps platforms.

Scaleway
• L6 – GPU instances: VM‑based GPU instances (H100, L40S, L4, GH200, P100) for training and rendering.
• L5: “Supercomputer” configurations suggest some cluster‑level services, but the primary product is IaaS.
• Minimal L2–L4: Some integrations, but no full managed model APIs or training platforms at hyperscaler depth.

Vultr
• L6 – GPU VMs: Cloud GPU instances with Nvidia and AMD GPUs for AI/ML and graphics workloads.
• Limited L4: Preconfigured images and integrations, but not a complete ML platform.
• No material L2–L3 at present.

Many Bronze‑tier providers in ClusterMAX (e.g., STN, GMI Cloud, Hot Aisle, Atlas Cloud, Buzz HPC, Qubrid) fit this archetype: they offer GPU instances and sometimes basic managed clusters, but limited higher‑layer software. These firms compete primarily on $/GPU‑hour, availability and topology, and are highly exposed to Nvidia pricing and utilization risk.
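To make the utilization exposure of $/GPU‑hour providers concrete, here is a minimal sketch of per‑GPU unit economics. Every input (capex per GPU, depreciation life, operating cost, rental price) is a hypothetical placeholder chosen for round arithmetic, not a figure from ClusterMAX or any provider, and operating cost is assumed to be incurred whether or not the GPU is rented.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass(frozen=True)
class GpuUnitEconomics:
    """Per-GPU economics for a rental provider. All inputs are hypothetical."""
    capex_per_gpu: float       # all-in system cost per GPU, USD
    useful_life_years: float   # straight-line depreciation period
    opex_per_gpu_hour: float   # power, cooling, staff; assumed fixed per hour
    price_per_gpu_hour: float  # rental price charged for a utilized hour

    def cost_per_hour(self) -> float:
        """Hourly cost carried regardless of whether the GPU is rented."""
        depreciation = self.capex_per_gpu / (self.useful_life_years * HOURS_PER_YEAR)
        return depreciation + self.opex_per_gpu_hour

    def margin_per_hour(self, utilization: float) -> float:
        """Average hourly margin when the GPU is rented `utilization` of the time."""
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be between 0 and 1")
        return self.price_per_gpu_hour * utilization - self.cost_per_hour()

    def breakeven_utilization(self) -> float:
        """Share of hours that must be sold for revenue to cover cost."""
        return self.cost_per_hour() / self.price_per_gpu_hour


# Same hypothetical cost base, before and after a price cut.
for price in (2.00, 1.60):
    gpu = GpuUnitEconomics(35_040, 4, 0.40, price)
    print(f"${price:.2f}/GPU-hour -> break-even at {gpu.breakeven_utilization():.1%} utilization")
```

With these placeholder inputs, cutting the price from $2.00 to $1.60 per GPU‑hour moves break‑even utilization from 70% to 87.5%, which is the sense in which providers at this layer are exposed to both pricing and utilization at once.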
⸻
5. Marketplaces, brokers, and “Craigslist for GPUs”
⸻

ClusterMAX separates marketplaces and brokers into their own column (Vast.ai, Prime Intellect, Shadeform, Mithril, etc.), and further identifies “Craigslist for GPUs” (gpulist.ai, gpucompare.com).

Vast.ai and similar platforms:
• Commercially operate at L6–L7, aggregating GPU VMs and bare‑metal servers from hosts and data centers.
• Provide a control plane, pricing/auction logic and some templates, but little in the way of true ML PaaS or model APIs.
• Function more as a meta‑layer over infrastructure‑first neoclouds and independent hosts than as full clouds.

Voltage Park’s TensorDock acquisition is an example of convergence between a neocloud and a marketplace; the marketplace piece still sits at the bottom of the stack, while Voltage Park’s own clusters remain L5–L7.

⸻
6. Mapping back to the hierarchy and investment relevance
⸻

Several structural points emerge once providers are mapped to the hierarchy.

First, hyperscalers are vertically integrated across almost all layers (L1–L6, sometimes L7). AI SaaS and model APIs (L1–L2) are the primary value capture points, with GPU infrastructure treated as a cost base. Their differentiation is in proprietary models, data, and platform coverage; GPU pricing is a tactical lever. This supports margin resilience even as they cut GPU‑hour prices to compete with neoclouds.

Second, neocloud giants and leading platform‑centric neoclouds cluster in L3–L7, with their brand and differentiation anchored in L4–L5 (orchestration quality, training cluster performance) and L6–L7 (pricing, topology). They are closest to the “training factory” narrative. Their exposure to Nvidia product cadence, interconnect costs and data‑center power economics is high; their ability to move up into L2–L3 (model APIs, managed training products) is central to achieving software‑like margins rather than commodity infrastructure returns.
Third, infrastructure‑first neoclouds and marketplaces are concentrated at L5–L7 with thin software layers. Their economics depend almost entirely on maintaining a spread between GPU‑hour revenue and hardware plus power costs, at high utilization. As ClusterMAX and other research show, this segment is already experiencing strong price competition and tightening returns.

Fourth, the taxonomy highlights how Nvidia’s and AMD’s product cycles propagate through the ecosystem. New architectures (H200, B200/GB200, MI325X/MI350X) are surfaced first at infrastructure layers (L6–L7) by hyperscalers and neoclouds, then pulled up into managed clusters (L5) and training platforms (L3–L4), and only later appear indirectly in SaaS and APIs (L1–L2). Providers concentrated at the bottom of the stack bear the full brunt of depreciation and price erosion between generations; those at the top abstract this away behind price per token or per seat.

Finally, mapping providers to this hierarchy allows clearer differentiation within neoclouds. CoreWeave, Nebius, Crusoe, Lambda and Together operate much closer to hyperscaler‑like PaaS and cluster services than many emerging neoclouds, while Bronze‑tier and marketplace providers are essentially commodity GPU wholesalers. The former group can plausibly build durable franchises if they secure long‑term offtake agreements and continue moving up the stack; the latter are structurally exposed to consolidation and margin compression as GPU markets normalize.

Understanding which layers each provider actually occupies, as opposed to what is claimed in marketing, is therefore critical to assessing durability of economics, the degree of competitive insulation from hyperscalers, and the sensitivity of each business to Nvidia’s pricing and product roadmap.
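As a rough illustration of the mapping exercise, the archetype‑level layer spans from the summary at the top of this post can be written down as data and queried. This is a simplification: each archetype gets a single contiguous core span, so the neocloud giants' selective L2–L3 presence is left out and bare metal is counted for all hyperscalers. The names `LAYERS`, `ARCHETYPES` and `archetypes_at` are illustrative only.

```python
# L1-L7 labels as defined in the post.
LAYERS = {
    1: "Fully managed AI applications",
    2: "Managed foundation models and inference APIs",
    3: "Managed training / fine-tuning platforms",
    4: "ML / MLOps platforms with integrated GPU orchestration",
    5: "Managed GPU clusters",
    6: "GPU-accelerated VM instances",
    7: "Bare-metal GPU-as-a-service",
}

# Core span (topmost layer, bottommost layer) per archetype.
# Simplified: selective or partner-led presence outside the span is ignored.
ARCHETYPES = {
    "Traditional hyperscalers": (1, 7),
    "Neocloud giants": (4, 7),
    "Platform-centric neoclouds": (2, 7),
    "Infrastructure-first neoclouds": (5, 7),
    "Marketplaces and brokers": (6, 7),
}


def archetypes_at(layer: int) -> list[str]:
    """Return the archetypes whose core span includes the given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: L{layer}")
    return [name for name, (top, bottom) in ARCHETYPES.items() if top <= layer <= bottom]


# Count how many archetypes compete at each layer.
for layer, label in LAYERS.items():
    print(f"L{layer} {label}: {len(archetypes_at(layer))} archetype(s)")
```

Under this simplification only one archetype is present at L1 while all five overlap at L6–L7, which matches the argument that competition is most crowded at the bottom of the stack.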


TheValueist

@TheValueist

22 Nov 2025

$NVDA

Understanding the hierarchy of GPU service offerings is critical because profit pools, competitive dynamics, and risk profiles differ sharply at each layer of the stack. At the top, fully managed AI applications and model APIs capture value through software, data, and distribution, with GPUs as a largely invisible input. Margins are high, pricing power is strongest, and sensitivity to raw GPU-hour pricing is indirect. At the bottom, bare-metal GPU services, GPU-accelerated VMs, and colocation sit closest to Nvidia hardware economics. These businesses are capital intensive, highly exposed to GPU ASPs and utilization, and face intense price competition as more capacity comes online. A clear taxonomy makes it possible to map each company’s true position in this stack, separate narrative from reality, and identify where economic rents are likely to accrue.

• Fully managed AI applications (SaaS) that hide GPUs entirely
• Managed foundation models and inference APIs (model-as-a-service)
• Managed training / fine-tuning platforms (“train a model, no infra”)
• Full ML / MLOps platforms with integrated GPU orchestration (PaaS)
• Managed GPU clusters (Kubernetes/Ray/Slurm as-a-service)
• GPU-accelerated VM instances (IaaS VMs with attached GPUs)
• Bare-metal GPU-as-a-service (dedicated GPU servers delivered via API)
• GPU colocation / hosted racks (customer-owned GPUs in provider facilities)
• On-premise self-managed GPU infrastructure (customer buys and runs everything)

The hierarchy above is ordered from the most abstracted, feature-rich, and “application-level” offerings at the top toward the most primitive, infrastructure-level offerings at the bottom. As one moves down the list, the provider delivers fewer higher-level services and the customer assumes more responsibility for software, orchestration, and operations. The underlying GPU product (H100, H200, B200, etc.) can appear at multiple layers; what changes is how much of the stack the provider bundles.
1. Fully managed AI applications (SaaS) that hide GPUs entirely

This category includes end-user products that embed GPUs but present themselves as business applications: copilots inside productivity suites, AI assistants in CRM, code copilots, AI search, AI-native office suites, and vertical AI tools (design, drug discovery, industrial simulation) that expose a pure SaaS interface. The customer never sees GPUs, instances, or clusters; pricing is per seat, per document, per project, or per usage metric (tokens, queries, tasks) but packaged as an application.

From a GPU standpoint, this layer is the furthest from the hardware. Providers hedge GPU capacity commitments against user growth, drive utilization via multi-tenancy, and optimize model architectures and inference stacks internally. Economically, this tier carries the highest margins and the greatest pricing power, since value is anchored in business outcomes rather than compute units. For infrastructure investors, GPU pricing enters only indirectly, through its effect on gross margin and the provider’s ability to subsidize AI features to drive user growth.

2. Managed foundation models and inference APIs (model-as-a-service)

The next layer is model APIs that expose generic or specialized models via HTTP or gRPC endpoints: text LLMs, vision models, multi-modal, embedding APIs, and sometimes domain-specific models. Customers pay per token, per 1k images, or per unit of model usage. The provider manages model hosting, scaling, versioning, A/B testing, safety, and often some monitoring and logging.

At this tier, GPUs are abstracted into a “token” or “inference” unit. Providers still actively manage GPU fleets, but customers think in terms of model latency, throughput, and cost per token. GPU choice (H100 vs L40S vs B200) is a provider decision, often invisible or only exposed as coarse tiers (standard vs “turbo” / “premium”).
Economics are more sensitive to GPU pricing than at the SaaS layer but still benefit from aggregation: high utilization, batching, and kernel-level optimization can produce meaningful arbitrage between GPU-hour input costs and token-level output pricing. Competitive pressure in inference APIs is compressing margins, but differentiated models and ecosystem lock-in still offer room for attractive returns.

3. Managed training / fine-tuning platforms (“train a model, no infra”)

This layer offers “bring your data, get a model” services. Customers upload datasets and configuration; the platform orchestrates the entire training pipeline: data preprocessing, sharding, distributed training, checkpointing, evaluation, and sometimes deployment. Examples include managed fine-tuning products, AutoML services, and dedicated training PaaS that hide cluster-level complexity.

Conceptually, this tier is training-as-a-service (TaaS). The platform decides which GPU type, cluster size, parallelism strategy, and fabric to use, possibly across multiple cloud providers. Customers pay per training job, per GPU-hour with automated scaling, or per project. The provider must understand Nvidia’s product roadmaps (A100 vs H100 vs H200 vs B200), topology (SXM vs PCIe, NVLink vs Ethernet/InfiniBand), and optimizer/parallelism choices to minimize training cost while meeting SLAs.

This tier is operationally complex and capital intensive because it must handle long-duration jobs, complex failure modes, and heavy state (checkpoints). It sits closer to raw GPU economics than inference APIs because training jobs often fully occupy clusters for days or weeks. The ability to arbitrage between cloud GPU pricing options, exploit spot capacity, and route workloads to neoclouds or hyperscalers efficiently is a key differentiator. Providers at this tier effectively “compile” customer workloads to underlying GPU markets.
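The routing idea can be sketched as a small selection problem: given a job and a set of GPU offers, pick the cheapest offer that still meets the deadline. This is a toy model under strong assumptions (perfect linear scaling across GPUs, a single relative-speed factor per GPU type, the job uses all GPUs in an offer), and the provider labels, prices and speeds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuOffer:
    provider: str              # hypothetical label, not a real price sheet
    gpus_available: int
    price_per_gpu_hour: float  # USD
    relative_speed: float      # throughput vs. a baseline GPU (1.0 = baseline)


@dataclass(frozen=True)
class TrainingJob:
    baseline_gpu_hours: float  # total work, measured in baseline-GPU hours
    deadline_hours: float      # maximum acceptable wall-clock time


@dataclass(frozen=True)
class Plan:
    provider: str
    wall_clock_hours: float
    cost: float


def plan_on(job: TrainingJob, offer: GpuOffer) -> Optional[Plan]:
    """Cost the job on one offer; return None if the deadline would be missed."""
    gpu_hours = job.baseline_gpu_hours / offer.relative_speed
    wall_clock = gpu_hours / offer.gpus_available  # assumes perfect scaling
    if wall_clock > job.deadline_hours:
        return None
    return Plan(offer.provider, wall_clock, gpu_hours * offer.price_per_gpu_hour)


def cheapest_plan(job: TrainingJob, offers: list[GpuOffer]) -> Optional[Plan]:
    """Return the lowest-cost feasible plan, or None if no offer meets the deadline."""
    plans = [p for p in (plan_on(job, o) for o in offers) if p is not None]
    return min(plans, key=lambda p: p.cost, default=None)


offers = [
    GpuOffer("cloud-a", 64, 4.00, 1.0),
    GpuOffer("cloud-b", 128, 3.00, 1.0),
    GpuOffer("cloud-c", 256, 5.00, 2.0),
]
job = TrainingJob(baseline_gpu_hours=10_000, deadline_hours=100)
print(cheapest_plan(job, offers))
```

In this example the offer with the highest price per GPU-hour wins, because its GPUs are assumed to be twice as fast and the job therefore needs half as many GPU-hours. A real platform would also have to model scaling efficiency, spot interruption and data transfer, none of which appear here.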
4. Full ML / MLOps platforms with integrated GPU orchestration (PaaS)

These are general-purpose ML platforms (e.g., managed notebooks, pipelines, experiment tracking, feature stores) that tightly integrate GPU scheduling and orchestration. The core product is not “API access to a model” but a managed development and deployment environment. GPUs are a resource managed by the platform’s scheduler, exposed via abstractions such as “GPU pool,” “compute profile,” or “accelerator type.”

Customers typically pay for a mix of platform fees and underlying compute/storage usage. The platform may support multiple workloads: data preprocessing, training, hyperparameter tuning, batch inference, and interactive experimentation. The provider’s responsibilities include user management, security, observability, and lifecycle management, in addition to GPU provisioning. This tier straddles PaaS and IaaS; it commonly runs on top of hyperscaler GPU instances, neocloud GPU clusters, or a hybrid of both.

From a GPU-market standpoint, this layer is where many enterprises “terminate” their abstraction. They want to select GPU families and sizes, but not manage Kubernetes, Slurm, or NCCL tuning. The platform provider’s value comes from masking heterogeneity in GPU hardware and pricing across clouds and exposing a stable developer experience. Economically, margins depend on the degree of lock-in and differentiated workflow tools. GPU cost is a major line item but often passed through with markup.

5. Managed GPU clusters (Kubernetes/Ray/Slurm as-a-service)

This layer exposes GPU clusters more directly but with managed control planes and orchestration frameworks: managed Kubernetes with GPU node pools, managed Ray clusters, Slurm-as-a-service, or specialized orchestration for distributed training. Customers see nodes, pods, and jobs, and they deploy their own containers or training code. The provider operates the cluster, control plane, autoscaling, and sometimes the storage and network fabric.
At this tier, customers are responsible for their own software stack (frameworks, libraries, distributed training logic) and performance tuning but can offload cluster provisioning, lifecycle management, upgrades, and failure handling. GPU choice, topology, and interconnect details may be selectable (e.g., “H100 NVLink InfiniBand cluster”), but the platform still abstracts away low-level host management.

Economically, this layer tends to be lower-margin than model APIs or training PaaS because it is closer to raw infrastructure. Differentiation comes from quality of implementation (latency, NCCL efficiency, topology guarantees), reliability, and ease of integration with customer CI/CD. From a pricing perspective, this is usually billed per GPU-hour plus possible control-plane or support fees, with customers closer to seeing the “true” price of H100/H200/B200.

6. GPU-accelerated VM instances (IaaS VMs with attached GPUs)

Here, the main product is virtual machines with attached GPUs. The provider offers SKUs like “8× H100 80 GB” or “1× L4,” with documented vCPU, memory, and network bandwidth. Customers manage the OS, drivers, frameworks, and application stack. This is the classic hyperscaler model (EC2, Compute Engine, Azure VMs) and is also exposed by many neoclouds.

The abstraction is relatively thin: the provider virtualizes hardware and networking, offers images and basic monitoring, and enforces quotas and security isolation. Customers choose GPU type, count, and region; they manage scaling via autoscaling groups, scripts, or external orchestrators. Pricing is nearly directly indexed to GPU-hour cost, plus a premium for VM overhead and network/storage. This tier is the baseline reference for H100/H200/B200/A100 pricing comparisons and is where price competition is most visible.

From a responsibility perspective, customers are responsible for cluster-wide concerns: placement, data locality, job scheduling, checkpointing, and resilience.
The provider’s main levers are price, availability, and performance guarantees. Margins depend heavily on utilization and on the provider’s ability to secure favorable GPU procurement terms.

7. Bare-metal GPU-as-a-service (dedicated GPU servers via API)

Bare-metal GPU-as-a-service offers physical servers with GPUs and no virtualization. Customers receive full control over the node (BIOS-level in some cases), installing their own OS or bringing custom images. Some providers add an API layer for provisioning and basic lifecycle actions (power on/off, PXE boot, image deploy), but there is no hypervisor.

Bare metal is attractive for high-performance workloads, customers wanting custom kernels, or those needing very predictable latency and performance. It also allows providers to avoid hypervisor overhead and simplify some aspects of capacity management. However, customers must handle everything above the bare hardware: OS hardening, clustering, storage, networking topology awareness, and distributed job management. Interconnect (NVLink, InfiniBand vs Ethernet) and node configuration (HGX vs PCIe) are often key differentiators.

Pricing is typically per GPU-hour or per-node-hour and often lower than equivalent VM offerings per unit of GPU, because less virtualization overhead is present and the service is more commoditized. For neoclouds, this tier is capital intensive and exposes them most directly to Nvidia GPU ASPs and utilization risk. Unit economics are sensitive to GPU generation pricing, system capex, and rack-level power and cooling costs.

8. GPU colocation / hosted racks (customer-owned GPUs in provider facilities)

At this level, the customer owns the GPU servers and the provider supplies data-center services: power, cooling, physical security, network cross-connects, and sometimes remote-hands operations.
The provider may offer cage space, managed PDUs, and optional services like monitoring and hardware replacement, but the hardware capex and much of the operational burden reside with the customer.

This model is analogous to traditional colocation but focused on GPU-dense racks. It is attractive for entities with large, stable workloads and access to capital, who wish to avoid cloud markups and retain full control over hardware, while leveraging provider expertise in power and cooling. Sovereign deployments and AI majors increasingly use such models for part of their fleet, sometimes operated by colocation specialists or repurposed HPC data centers.

In this arrangement, Nvidia and the hardware OEMs capture most of the hardware margin; the colocation provider earns relatively stable, utility-like returns on space and power. The “service” layer around GPUs is minimal. The economics hinge on long-term contracts and high rack occupancy, not on per-GPU-hour arbitrage.

9. On-premise self-managed GPU infrastructure (customer buys and runs everything)

At the bottom of the hierarchy is fully self-managed infrastructure. The customer purchases GPUs and servers, builds or repurposes facilities, manages power and cooling, implements networking and storage, and runs their own orchestration stacks (Kubernetes, Slurm, Ray, proprietary frameworks). This includes both traditional on-premise data centers and private cloud environments.

Here, GPUs are pure capital assets on the customer balance sheet. There is no external “GPU service provider” per se; the customer is both owner and operator. The upside is maximal control over performance, security, and long-term cost of compute, particularly for predictable, high-volume workloads. The downside is high upfront capex, operational complexity, and exposure to technology obsolescence (e.g., transitions from A100 to H100 to B200 and beyond).
From a market-structure perspective, this tier competes with all higher layers on a lifecycle TCO basis. For large, steady workloads, internal GPU fleets can be cost-advantaged over public cloud or neocloud offerings at current GPU-hour pricing, but they lack elasticity and require strong internal capabilities. For small or spiky workloads, higher-level services are generally more economical.

Taken together, this hierarchy illustrates how the same Nvidia GPU products underpin multiple economic layers. At the top of the stack, GPUs are an invisible input into SaaS and model APIs, where value is captured through application outcomes and proprietary models. In the middle, GPUs are mediated by platforms and managed clusters, where provider differentiation relies on orchestration, tooling, and integration. At the bottom, GPUs are exposed directly via VMs, bare metal, or colocation, where competition tends toward price-per-GPU-hour and utilization management.

Investment analysis of any player in this ecosystem requires identifying where in this hierarchy it operates, how much of the stack it controls, and how exposed it is to shifts in GPU pricing, utilization, and Nvidia’s product cadence.
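The steady-versus-spiky point about owning versus renting can be illustrated with a back-of-the-envelope comparison. The figures below (annual owned cost per GPU, rental price, utilization levels) are hypothetical placeholders, and the model ignores financing, staffing ramp-up and hardware resale value; it only shows why an owned fleet sized for peak demand is penalized when average utilization is low.

```python
HOURS_PER_YEAR = 8760


def annual_rent_cost(gpu_hours_used: float, price_per_gpu_hour: float) -> float:
    """Renting: pay only for the GPU-hours actually consumed."""
    return gpu_hours_used * price_per_gpu_hour


def annual_own_cost(peak_gpus: int, owned_cost_per_gpu_year: float) -> float:
    """Owning: pay for a fleet sized to peak demand, used or not."""
    return peak_gpus * owned_cost_per_gpu_year


def cheaper_option(
    peak_gpus: int,
    avg_utilization: float,          # average share of the peak fleet in use
    price_per_gpu_hour: float,       # hypothetical rental price, USD
    owned_cost_per_gpu_year: float,  # hypothetical depreciation + power + ops, USD
) -> str:
    """Return 'own' or 'rent', whichever has the lower annual cost."""
    gpu_hours_used = peak_gpus * avg_utilization * HOURS_PER_YEAR
    rent = annual_rent_cost(gpu_hours_used, price_per_gpu_hour)
    own = annual_own_cost(peak_gpus, owned_cost_per_gpu_year)
    return "own" if own < rent else "rent"


# Same peak fleet and prices; only the utilization profile differs.
print("steady:", cheaper_option(100, 0.90, 2.50, 12_000))
print("spiky: ", cheaper_option(100, 0.10, 2.50, 12_000))
```

With these placeholder inputs the steady workload favors owning and the spiky one favors renting; the crossover sits at the utilization where rental spend equals the fixed cost of the owned fleet.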

