OPEN SIGNAL
Signal Map: The AI Infrastructure Stack — From Silicon to Application

A layer-by-layer map of the full AI infrastructure stack, from chips and systems to clouds, platforms, models, and applications.

The Stack at a Glance

Every AI application — from a chatbot answering customer questions to a model generating protein structures — depends on an infrastructure stack that spans six distinct layers. Each layer has its own competitive dynamics, cost structures, and bottlenecks. Understanding this stack is essential for anyone making infrastructure decisions, evaluating AI companies, or trying to identify where value will accrue as the market matures.

The layers, from bottom to top: silicon (chips), systems (servers and networking), cloud (compute providers), platform (training and serving frameworks), model (the AI models themselves), and application (end-user products). Value creation and capture differ significantly at each level, and the companies that dominate one layer rarely dominate adjacent ones.

Stack Overview

| Layer | Function | Key Players | Primary Bottleneck | Value Capture |
| --- | --- | --- | --- | --- |
| Silicon | Raw compute and memory | NVIDIA, AMD, Intel, Google, AWS, Broadcom | Manufacturing capacity, HBM supply | Very high margins (NVIDIA ~75% gross) |
| Systems | Servers, networking, cooling | Dell, HPE, Super Micro, NVIDIA (DGX) | Power delivery, thermal management | Moderate margins, high volume |
| Cloud | On-demand GPU/TPU access | AWS, Azure, GCP, Oracle, CoreWeave, Lambda | GPU availability, pricing | High revenue, margin pressure |
| Platform | Training and serving software | PyTorch, JAX, vLLM, NVIDIA NIM, Anyscale | Framework fragmentation, optimization complexity | Mostly open-source; value in managed services |
| Model | Pre-trained and fine-tuned models | OpenAI, Anthropic, Google, Meta, Mistral | Training cost, data quality | High for frontier; commoditizing for smaller models |
| Application | End-user AI products | Thousands of startups, enterprise SaaS | Distribution, retention, workflow integration | Varies widely; highest potential at scale |

Layer 1: Silicon

The chip layer is where the fundamental compute constraints originate, and it is the most concentrated layer in the stack. NVIDIA’s dominance here cascades through every layer above it.

Key Players

| Company | Key Products | Focus | Architecture | Market Role |
| --- | --- | --- | --- | --- |
| NVIDIA | H100, H200, B100/B200 (Blackwell), GB200 | Training + Inference | GPU (CUDA) | Dominant; ~80%+ data center AI GPU share |
| AMD | MI300X, MI300A, MI350 (planned) | Training + Inference | GPU (ROCm) | Credible challenger; gaining cloud traction |
| Intel | Gaudi 3 (Habana Labs) | Training + Inference | Purpose-built accelerator | Struggling for relevance |
| Google | TPU v5p, TPU v5e, Trillium (v6) | Training + Inference | Custom ASIC | Internal use + Google Cloud |
| AWS | Trainium2, Inferentia2 | Training / Inference | Custom ASIC | AWS ecosystem cost optimization |
| Broadcom | Custom AI accelerators (for Google, Meta) | Various | Custom ASIC design | Design partner for hyperscaler custom silicon |
| Groq | LPU | Inference | Deterministic synchronous | Ultra-low latency inference specialist |
| Cerebras | Wafer Scale Engine 3 | Training + Inference | Wafer-scale chip | Niche high-performance applications |

The silicon layer’s economics are defined by two scarcities: leading-edge fabrication capacity (concentrated at TSMC) and High Bandwidth Memory supply (concentrated at SK Hynix, Samsung, and Micron). These supply constraints have kept GPU prices elevated and delivery timelines extended, directly shaping the cost structure of every AI workload.

NVIDIA’s competitive moat at this layer is not purely hardware; it is also the CUDA software ecosystem. CUDA represents nearly two decades of accumulated libraries, optimizations, tooling, and developer familiarity. Challengers have to match both the hardware and this software inheritance at the same time.

Layer 2: Systems

The systems layer transforms individual chips into deployable compute infrastructure. This layer has become significantly more complex as AI workloads scale, driven by power consumption, thermal management, and high-speed interconnect requirements.

Key Players

| Company | Key Products | Specialization |
| --- | --- | --- |
| NVIDIA | DGX B200, DGX SuperPOD | Turnkey AI systems, NVLink interconnect |
| Dell Technologies | PowerEdge XE series | Enterprise AI server integration |
| Super Micro | GPU-optimized servers | High density, liquid cooling, fast time-to-market |
| HPE | Cray EX series | Supercomputer-scale AI systems |
| Cisco | Nexus networking | Data center fabric for AI clusters |
| Arista Networks | AI spine networking | High-bandwidth, low-latency networking |
| InfiniBand (NVIDIA) | ConnectX-7, Quantum switches | GPU-to-GPU interconnect fabric |

The critical engineering challenge at this layer is interconnect bandwidth between GPUs. Training large models requires thousands of GPUs to communicate continuously, and the speed of this communication directly determines training efficiency. NVIDIA’s NVLink and InfiniBand technologies provide the highest-bandwidth interconnects available, creating another layer of competitive advantage that extends beyond the GPU itself.
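To make the link between interconnect bandwidth and training efficiency concrete, here is a back-of-envelope sketch using the standard bandwidth-only cost model for a ring all-reduce, in which each GPU moves roughly 2(N-1)/N times the gradient payload per synchronization. The function name, the 7-billion-parameter model, and the three bandwidth figures are illustrative assumptions rather than measurements of any product, and the model ignores latency and any overlap of communication with compute.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int,
                           link_gbytes_per_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce across num_gpus.

    Each GPU sends and receives about 2 * (N - 1) / N times the payload.
    Latency terms and compute/communication overlap are ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single device
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_bytes / (link_gbytes_per_s * 1e9)


# Hypothetical workload: 7e9 parameters with 2-byte gradients.
gradient_bytes = 7e9 * 2

# Hypothetical per-GPU link bandwidths in GB/s (assumptions, not vendor specs).
for bandwidth in (50, 400, 900):
    seconds = ring_allreduce_seconds(gradient_bytes, num_gpus=1024,
                                     link_gbytes_per_s=bandwidth)
    print(f"{bandwidth:>4} GB/s per GPU -> {seconds:.3f} s per gradient sync")
```

Under this simplified model, synchronization time scales inversely with link bandwidth, so time spent waiting on the network comes straight out of the time available for useful computation.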

Power consumption has emerged as a first-order constraint. A single server with eight NVIDIA B200 GPUs can draw over 10 kilowatts, and a rack holding several such servers draws a multiple of that. Clusters of thousands of GPUs require megawatts of power delivery and corresponding cooling capacity. This has made power availability and data center thermal design critical factors in AI infrastructure planning, spawning a secondary market in AI-optimized data centers and liquid cooling solutions.
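The jump from kilowatts to megawatts is simple arithmetic. The sketch below scales the roughly 10 kW figure for an eight-GPU system up to cluster size. The PUE value (power usage effectiveness, the ratio of total facility power to IT power) of 1.3 and the cluster sizes are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def cluster_power_mw(num_gpus: int, gpus_per_server: int = 8,
                     kw_per_server: float = 10.0, pue: float = 1.3) -> float:
    """Estimate total facility power in megawatts for a GPU cluster.

    kw_per_server is the IT load of one GPU server; pue scales it up to
    account for cooling and power-delivery overhead (assumed value).
    """
    servers = math.ceil(num_gpus / gpus_per_server)
    it_load_kw = servers * kw_per_server
    return it_load_kw * pue / 1000.0


# Hypothetical cluster sizes.
for gpus in (1_000, 10_000, 100_000):
    print(f"{gpus:>7} GPUs -> about {cluster_power_mw(gpus):.1f} MW")
```

Because the estimate is linear in GPU count, the per-server draw and the PUE are the only levers; this is why cooling efficiency shows up directly in the facility power budget.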

Layer 3: Cloud

The cloud layer provides on-demand access to AI compute, abstracting away the complexity of hardware procurement, data center operations, and systems management. This layer is bifurcating into hyperscalers (who offer AI compute alongside their full cloud platform) and GPU-specialized clouds (who focus exclusively on AI workloads).

Key Players

| Provider | AI Compute Offerings | Differentiation | Primary Customer |
| --- | --- | --- | --- |
| AWS | NVIDIA GPUs, Trainium, Inferentia | Broadest service portfolio, Bedrock model marketplace | Enterprise, startups |
| Microsoft Azure | NVIDIA GPUs, OpenAI exclusive partnership | Preferred for OpenAI model access, enterprise integration | Enterprise (M365 ecosystem) |
| Google Cloud | NVIDIA GPUs, TPU v5, Trillium | TPU price-performance, Vertex AI platform | AI-native companies, researchers |
| Oracle Cloud | NVIDIA GPUs, bare-metal GPU clusters | Aggressive pricing, large contiguous clusters | AI training workloads |
| CoreWeave | NVIDIA GPUs (H100, B200) | GPU-native cloud, high-performance networking | AI startups, model training |
| Lambda Labs | NVIDIA GPUs | Developer-friendly GPU cloud | ML researchers, small teams |
| Together AI | NVIDIA GPUs, open-source model serving | Inference platform + compute | Developers using open-source models |
| Crusoe Energy | NVIDIA GPUs, clean energy powered | Carbon-neutral AI compute | Climate-conscious enterprises |

The hyperscalers’ AI strategy extends well beyond raw compute provision. AWS Bedrock, Azure AI Studio, and Google Cloud Vertex AI are each building managed platforms that handle model hosting, fine-tuning, evaluation, and orchestration. These platforms create switching costs by integrating model serving with the rest of each cloud’s service portfolio — storage, databases, networking, security, and identity.

The GPU-specialized clouds compete primarily on price, availability, and simplicity. CoreWeave, in particular, has built a multi-billion-dollar business by offering large, contiguous GPU clusters optimized for AI training — a workload pattern that hyperscalers have sometimes struggled to serve at competitive prices due to the overhead of their general-purpose infrastructure.

Layer 4: Platform

The platform layer provides the software frameworks and tools that AI engineers use to train, fine-tune, optimize, and serve models. This layer is predominantly open-source, with value captured through managed services and commercial wrappers.

Training Frameworks

| Framework | Developer | Primary Use | Ecosystem Role |
| --- | --- | --- | --- |
| PyTorch | Meta (now Linux Foundation) | Model training and research | De facto standard for AI research and production |
| JAX | Google DeepMind | Model training (esp. TPU workloads) | Preferred for Google ecosystem and research |
| DeepSpeed | Microsoft | Distributed training optimization | Enables training of very large models on GPU clusters |
| Megatron-LM | NVIDIA | Large-scale model training | Reference implementation for distributed LLM training |
| Ray / Anyscale | Anyscale | Distributed compute orchestration | Horizontal scaling for training and serving |

Serving and Inference

| Framework | Developer | Primary Use | Key Feature |
| --- | --- | --- | --- |
| vLLM | UC Berkeley (open-source) | LLM inference serving | PagedAttention, high-throughput serving |
| TensorRT-LLM | NVIDIA | Optimized inference on NVIDIA GPUs | Maximum GPU utilization for inference |
| NVIDIA NIM | NVIDIA | Containerized model deployment | Pre-optimized inference microservices |
| Triton Inference Server | NVIDIA | Multi-framework model serving | Framework-agnostic inference serving |
| Ollama | Open-source | Local model serving | Developer-friendly local LLM deployment |
| llama.cpp | Open-source (Georgi Gerganov) | CPU/GPU inference | Efficient quantized inference, broad hardware support |

Orchestration and Tooling

| Tool | Developer | Primary Use |
| --- | --- | --- |
| LangChain / LangGraph | LangChain Inc. | LLM application orchestration, agent frameworks |
| Weights & Biases | W&B | Experiment tracking, model monitoring |
| MLflow | Databricks | ML lifecycle management |
| Hugging Face | Hugging Face | Model distribution, datasets, collaboration |
| Modal | Modal Labs | Serverless GPU compute for ML workloads |

PyTorch’s dominance in training is the software equivalent of NVIDIA’s hardware dominance: pervasive and self-reinforcing. Most major models released in the past three years were trained using PyTorch or PyTorch-based libraries, with Google’s JAX-trained Gemini models the most prominent exception. This creates a gravitational pull for tooling, optimization work, and developer education that is extremely difficult for alternatives to overcome. JAX maintains a significant niche, particularly within Google’s ecosystem and for research requiring advanced automatic differentiation, but PyTorch’s community mass is its defining advantage.

On the inference side, the landscape is more fragmented and evolving rapidly. vLLM has emerged as the leading open-source inference engine, with its PagedAttention algorithm enabling substantially higher throughput for LLM serving. NVIDIA’s TensorRT-LLM and NIM provide maximum performance on NVIDIA hardware but sacrifice portability. This tension between performance optimization and hardware portability is a recurring theme at the platform layer.
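The core idea behind PagedAttention is to store each sequence's KV cache in fixed-size blocks that are allocated on demand and tracked through a per-sequence block table, instead of reserving one contiguous region sized for the maximum possible length. The toy sketch below illustrates only that bookkeeping. The class and method names are hypothetical and are not vLLM's API, and no attention computation is performed.

```python
class ToyPagedKVCache:
    """Toy block allocator illustrating paged KV-cache bookkeeping."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token; return its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        # A new block is needed only when the last one is full (or absent).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1]

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = ToyPagedKVCache(num_blocks=4, block_size=4)
for _ in range(5):
    cache.append_token("request-a")  # 5 tokens span 2 blocks
for _ in range(3):
    cache.append_token("request-b")  # 3 tokens fit in 1 block
print(cache.block_tables, "free:", len(cache.free_blocks))
cache.free("request-a")
print("free after release:", len(cache.free_blocks))
```

Because memory is claimed one block at a time, a short sequence never holds space reserved for a long one, which is the property that lets a server pack more concurrent requests into the same GPU memory.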

Layer 5: Model

The model layer sits at the intersection of research and infrastructure. Pre-trained foundation models represent enormous fixed costs (tens to hundreds of millions of dollars in compute for frontier training runs) that are amortized over inference volume. This cost structure favors scale and creates natural oligopoly dynamics at the frontier.
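The amortization logic can be sketched in a few lines: a fixed training cost spread over lifetime inference volume adds a per-token overhead that shrinks as volume grows. The $100M training cost and the token volumes below are hypothetical round numbers, not figures for any real model, and the function name is illustrative.

```python
def amortized_training_usd_per_million_tokens(training_cost_usd: float,
                                              lifetime_tokens: float) -> float:
    """Fixed training cost spread evenly over lifetime inference volume."""
    if lifetime_tokens <= 0:
        raise ValueError("lifetime_tokens must be positive")
    return training_cost_usd / lifetime_tokens * 1e6


training_cost = 100e6  # hypothetical $100M training run

# Hypothetical lifetime inference volumes, in tokens served.
for tokens in (1e12, 1e13, 1e14):
    overhead = amortized_training_usd_per_million_tokens(training_cost, tokens)
    print(f"{tokens:.0e} tokens served -> ${overhead:,.2f} per million tokens")
```

Each tenfold increase in volume cuts the amortized overhead by a factor of ten, which is why the cost structure rewards providers that can attract the most inference traffic.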

| Tier | Examples | Characteristics | Training Cost (Estimated) |
| --- | --- | --- | --- |
| Frontier Closed | GPT-4o, Claude 3.5, Gemini Ultra, o1/o3 | Highest capability, proprietary, API access | $100M - $500M+ |
| Frontier Open | Llama 3.1 405B, DeepSeek-V3 | Near-frontier capability, downloadable weights | $50M - $200M+ |
| Mid-tier Open | Llama 3.1 70B, Qwen2.5 72B, Mixtral 8x22B | Strong capability, practical to self-host | $10M - $50M |
| Efficient Open | Llama 3.1 8B, Mistral 7B, Phi-3, Gemma 2 | Good capability at small scale, edge-deployable | $1M - $10M |
| Specialized / Fine-tuned | CodeLlama, Med-PaLM, BloombergGPT | Domain-optimized performance | Varies (fine-tuning: $10K - $1M) |

The model layer is commoditizing rapidly at every tier below the frontier. The performance gap between the best open model and the best closed model at a given capability tier has compressed to the point where, for many production applications, the choice between open and closed is driven by deployment preferences and cost rather than capability differences.
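When capability is roughly comparable, the open-versus-closed decision often reduces to a break-even calculation between a metered API and self-hosted open weights. The sketch below assumes a fixed monthly self-hosting cost plus a marginal cost per token. All prices are hypothetical placeholders and the function name is illustrative.

```python
def break_even_tokens_per_month(fixed_monthly_usd: float,
                                api_usd_per_million: float,
                                selfhost_usd_per_million: float) -> float:
    """Monthly token volume above which self-hosting beats a metered API.

    Returns infinity when self-hosting has no per-token advantage.
    """
    saving_per_million = api_usd_per_million - selfhost_usd_per_million
    if saving_per_million <= 0:
        return float("inf")
    return fixed_monthly_usd / saving_per_million * 1e6


# Hypothetical prices: $20k/month of fixed infrastructure and staffing,
# $3.00 per million API tokens, $1.00 per million marginal self-host cost.
volume = break_even_tokens_per_month(20_000, 3.00, 1.00)
print(f"break-even at about {volume:.2e} tokens per month")
```

Below the break-even volume the API is cheaper; above it, self-hosting wins on cost, leaving deployment preferences such as data residency or latency as the remaining deciding factors.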

This commoditization is shifting value capture away from the model itself and toward the layers above (applications, workflows) and below (efficient serving infrastructure). Model providers that do not control either the application layer or the infrastructure layer risk becoming interchangeable commodity suppliers.

Layer 6: Application

The application layer is where AI capabilities become end-user products. This layer is the most fragmented, the most dynamic, and — for many investors — the most uncertain in terms of where durable value will accrue.

Application Categories

| Category | Examples | Business Model | AI Integration Pattern |
| --- | --- | --- | --- |
| AI Assistants | ChatGPT, Claude.ai, Gemini, Perplexity | Subscription + API | Model as the product |
| Coding Tools | GitHub Copilot, Cursor, Replit, Codeium | Subscription (seat-based) | Model embedded in IDE |
| Enterprise Search | Glean, Coveo AI, Elastic AI | Enterprise SaaS | RAG over enterprise data |
| Content Generation | Jasper, Copy.ai, Writer | Subscription | Model as content engine |
| Vertical AI | Harvey (legal), Abridge (healthcare), Ramp (finance) | Vertical SaaS | Domain-specific fine-tuning + workflow |
| AI Agents | Adept, Cognition (Devin), MultiOn | Usage-based (emerging) | Autonomous task execution |
| Image/Video Generation | Midjourney, Runway, Pika, Stability AI | Subscription + credits | Generative media pipeline |
| Voice/Audio AI | ElevenLabs, Descript, AssemblyAI | Usage-based | Specialized audio models |

The application layer’s fundamental challenge is defensibility. When the underlying models are improving rapidly and available from multiple providers, application-layer companies must build moats through distribution, workflow integration, proprietary data, and user experience rather than model capability alone. The applications that have gained the most traction — GitHub Copilot (distribution through Microsoft), Perplexity (novel UX for search), Harvey (deep legal domain expertise) — each combine model capabilities with at least one additional source of competitive advantage.

Cross-Layer Dependencies

The stack is not a set of independent layers — critical dependencies run vertically across it.

NVIDIA’s vertical reach extends from silicon (GPU design) through systems (DGX, NVLink, InfiniBand) to platform (CUDA, TensorRT, NIM, Triton) and increasingly into cloud (DGX Cloud partnerships). This vertical integration is NVIDIA’s deepest competitive advantage: each layer reinforces the others, making it progressively more difficult for customers to substitute at any single point.

Hyperscaler vertical integration follows a similar logic. Google controls TPU silicon, GCP cloud infrastructure, JAX/TensorFlow frameworks, Gemini models, and application integration across Search, Workspace, and Android. AWS spans Trainium/Inferentia chips, EC2/ECS infrastructure, SageMaker/Bedrock platforms, and application services. This vertical integration enables optimization across layer boundaries that third-party stacks cannot match.

Open-source horizontal layers (PyTorch, vLLM, Llama, Hugging Face) provide a counterweight to vertical integration by creating shared infrastructure that works across multiple hardware and cloud providers. These horizontal layers reduce switching costs and prevent any single vertically integrated player from capturing the entire stack.

What to Watch

The inference cost curve. The cost per token of LLM inference is falling rapidly — driven by hardware improvements, software optimization (quantization, speculative decoding, batching), and competition. This cost curve is the single most important variable for the application layer: when inference is cheap enough, entire categories of applications become economically viable that are currently marginal.
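A rough way to track this curve is to divide the hourly price of a GPU by the tokens it serves in an hour. The sketch below does that under stated assumptions: the $2.00 hourly price, the throughput figures, and the 60% utilization are hypothetical, and the function name is illustrative. The second scenario simply doubles throughput to stand in for a software gain such as better batching.

```python
def usd_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float,
                           utilization: float = 1.0) -> float:
    """Serving cost per million tokens for one GPU at a given throughput.

    utilization is the fraction of each hour spent serving real traffic.
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1e6


# Hypothetical scenarios: same hardware price, different software efficiency.
baseline = usd_per_million_tokens(2.00, tokens_per_second=1_000, utilization=0.6)
optimized = usd_per_million_tokens(2.00, tokens_per_second=2_000, utilization=0.6)
print(f"baseline:  ${baseline:.3f} per million tokens")
print(f"optimized: ${optimized:.3f} per million tokens")
```

Hardware price, throughput, and utilization enter the formula multiplicatively, so gains from cheaper chips, better software, and fuller batches compound rather than merely add.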

Custom silicon proliferation. Every major cloud provider is investing in custom AI chips. If these efforts succeed in offering price-performance competitive with NVIDIA at scale, the silicon layer’s concentration — and NVIDIA’s pricing power — will erode. Watch Trainium2 adoption at AWS and TPU v6 (Trillium) performance benchmarks at Google Cloud as leading indicators.

Framework consolidation or fragmentation. The platform layer could consolidate around PyTorch and a small number of inference engines, or it could fragment further as different hardware targets demand different software stacks. Hardware-agnostic compilation layers (OpenAI Triton, Apache TVM, MLIR) could reduce fragmentation by providing portable performance across chip architectures.

Application layer shakeout. The AI application ecosystem has been fueled by venture funding and low barriers to building model-wrapper applications. As funding conditions tighten and incumbents integrate AI into existing products, many application-layer startups will face existential pressure. The survivors will be those with genuine workflow integration, proprietary data advantages, or distribution moats that make them difficult to replicate.

Energy constraints. AI infrastructure’s power demand is growing faster than new data center and grid capacity can be brought online. Microsoft, Google, Amazon, and Meta have collectively committed to gigawatts of new power capacity, including investments in nuclear and renewable energy. If power becomes the binding constraint on AI scaling, it will reshape every layer of the stack, favoring energy-efficient architectures, edge inference, and model compression techniques.

The Bigger Picture

The AI infrastructure stack in 2026 exhibits a pattern common to maturing technology ecosystems: concentrated value at the bottom (NVIDIA’s silicon dominance), open and commoditized middle layers (open-source frameworks and models), and fragmented competition at the top (applications). This structure rewards vertical integration at scale (NVIDIA, Google) and horizontal platform plays (Hugging Face, PyTorch) while squeezing companies that operate at a single layer without differentiation.

The most consequential shifts over the next two to three years will likely occur at the boundaries between layers: custom silicon eroding GPU dominance, managed platforms absorbing framework complexity, model commoditization shifting value to applications, and application-layer competition determining which AI capabilities become infrastructure versus features. Understanding these layer dynamics — not just individual company strategies — is essential for navigating the AI market’s next phase.
