Signal Map: The AI Infrastructure Stack — From Silicon to Application

The Stack at a Glance

Every AI application — from a chatbot answering customer questions to a model generating protein structures — depends on an infrastructure stack that spans six distinct layers. Each layer has its own competitive dynamics, cost structures, and bottlenecks. Understanding this stack is essential for anyone making infrastructure decisions, evaluating AI companies, or trying to identify where value will accrue as the market matures.

The layers, from bottom to top: silicon (chips), systems (servers and networking), cloud (compute providers), platform (training and serving frameworks), model (the AI models themselves), and application (end-user products). Value creation and capture differ significantly at each level, and the companies that dominate one layer rarely dominate adjacent ones.

Stack Overview

Layer	Function	Key Players	Primary Bottleneck	Value Capture
Silicon	Raw compute and memory	NVIDIA, AMD, Intel, Google, AWS, Broadcom	Manufacturing capacity, HBM supply	Very high margins (NVIDIA ~75% gross)
Systems	Servers, networking, cooling	Dell, HPE, Super Micro, NVIDIA (DGX)	Power delivery, thermal management	Moderate margins, high volume
Cloud	On-demand GPU/TPU access	AWS, Azure, GCP, Oracle, CoreWeave, Lambda	GPU availability, pricing	High revenue, margin pressure
Platform	Training and serving software	PyTorch, JAX, vLLM, NVIDIA NIM, Anyscale	Framework fragmentation, optimization complexity	Mostly open-source; value in managed services
Model	Pre-trained and fine-tuned models	OpenAI, Anthropic, Google, Meta, Mistral	Training cost, data quality	High for frontier; commoditizing for smaller models
Application	End-user AI products	Thousands of startups, enterprise SaaS	Distribution, retention, workflow integration	Varies widely; highest potential at scale

Layer 1: Silicon

The chip layer is where the fundamental compute constraints originate, and it is the most concentrated layer in the stack. NVIDIA’s dominance here cascades through every layer above it.

Key Players

Company	Key Products	Focus	Architecture	Market Role
NVIDIA	H100, H200, B100/B200 (Blackwell), GB200	Training + Inference	GPU (CUDA)	Dominant; ~80%+ data center AI GPU share
AMD	MI300X, MI300A, MI350 (planned)	Training + Inference	GPU (ROCm)	Credible challenger; gaining cloud traction
Intel	Gaudi 3 (Habana Labs)	Training + Inference	Purpose-built accelerator	Struggling for relevance
Google	TPU v5p, TPU v5e, Trillium (v6)	Training + Inference	Custom ASIC	Internal use + Google Cloud
AWS	Trainium2, Inferentia2	Training / Inference	Custom ASIC	AWS ecosystem cost optimization
Broadcom	Custom AI accelerators (for Google, Meta)	Various	Custom ASIC design	Design partner for hyperscaler custom silicon
Groq	LPU	Inference	Deterministic synchronous	Ultra-low latency inference specialist
Cerebras	Wafer Scale Engine 3	Training + Inference	Wafer-scale chip	Niche high-performance applications

The silicon layer’s economics are defined by two scarcities: leading-edge fabrication capacity (concentrated at TSMC) and High Bandwidth Memory supply (concentrated at SK Hynix and Samsung). These supply constraints have kept GPU prices elevated and delivery timelines extended, directly shaping the cost structure of every AI workload.

NVIDIA’s competitive moat at this layer is not purely hardware — it is the CUDA software ecosystem. CUDA represents nearly two decades of accumulated libraries, optimizations, tooling, and developer familiarity. Competitors must compete against both the hardware and this software inheritance simultaneously.

Layer 2: Systems

The systems layer transforms individual chips into deployable compute infrastructure. This layer has become significantly more complex as AI workloads scale, driven by power consumption, thermal management, and high-speed interconnect requirements.

Key Players

Company	Key Products	Specialization
NVIDIA	DGX B200, DGX SuperPOD	Turnkey AI systems, NVLink interconnect
Dell Technologies	PowerEdge XE series	Enterprise AI server integration
Super Micro	GPU-optimized servers	High density, liquid cooling, fast time-to-market
HPE	Cray EX series	Supercomputer-scale AI systems
Cisco	Nexus networking	Data center fabric for AI clusters
Arista Networks	AI spine networking	High-bandwidth, low-latency networking
NVIDIA (InfiniBand)	ConnectX-7, Quantum switches	GPU-to-GPU interconnect fabric

The critical engineering challenge at this layer is interconnect bandwidth between GPUs. Training large models requires thousands of GPUs to exchange gradients and activations continuously, and the speed of this communication directly determines training efficiency. NVIDIA’s NVLink (within a server or rack) and InfiniBand (between servers) are among the highest-bandwidth interconnects available, creating another layer of competitive advantage that extends beyond the GPU itself.
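To make the link between interconnect bandwidth and training efficiency concrete, here is a minimal sketch of the standard bandwidth-only estimate for a ring all-reduce, in which each GPU moves roughly 2(N-1)/N times the payload. The model size, GPU count, and link speeds below are illustrative assumptions, not vendor specifications, and the estimate ignores latency and any overlap with compute.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only time estimate for one ring all-reduce.

    In a ring all-reduce each GPU sends and receives about
    2 * (N - 1) / N of the payload. Latency is ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_bytes / link_bytes_per_s


# Hypothetical workload: 70 billion parameters with 2-byte gradients.
grad_bytes = 70e9 * 2

# Hypothetical per-GPU link speeds (bytes per second), for comparison only.
slow_sync = ring_allreduce_seconds(grad_bytes, 1024, 50e9)
fast_sync = ring_allreduce_seconds(grad_bytes, 1024, 400e9)

print(f"slow link: {slow_sync:.2f} s per gradient sync")
print(f"fast link: {fast_sync:.2f} s per gradient sync")
```

Because the estimate is inversely proportional to link bandwidth, an eightfold faster link cuts the synchronization time by the same factor, which is time the GPUs would otherwise spend waiting.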

Power consumption has emerged as a first-order constraint. A single eight-GPU NVIDIA B200 server can draw more than 10 kilowatts, and clusters of thousands of GPUs require megawatts of power delivery and corresponding cooling capacity. This has made power availability and data center thermal design critical factors in AI infrastructure planning, spawning a secondary market in AI-optimized data centers and liquid cooling solutions.
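The power arithmetic above can be sketched as a back-of-the-envelope calculation. The 10 kW figure for an eight-GPU unit comes from the text; the cluster size and the power usage effectiveness (PUE) of 1.3 are assumptions chosen only for illustration.

```python
import math


def cluster_power_mw(num_gpus: int, gpus_per_server: int = 8,
                     kw_per_server: float = 10.0, pue: float = 1.3) -> float:
    """Estimate total facility power in megawatts for a GPU cluster.

    pue (power usage effectiveness) scales IT load up to cover cooling
    and power-delivery overhead; 1.3 is an assumed value.
    """
    servers = math.ceil(num_gpus / gpus_per_server)
    it_load_kw = servers * kw_per_server
    return it_load_kw * pue / 1000.0


# Hypothetical 16,384-GPU cluster: 2,048 servers, about 20.5 MW of IT load.
estimate_mw = cluster_power_mw(16_384)
print(f"{estimate_mw:.1f} MW")  # about 26.6 MW including overhead
```

Even under these conservative inputs, a cluster of this size lands in the tens of megawatts, which is why site power is planned alongside the hardware itself.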

Layer 3: Cloud

The cloud layer provides on-demand access to AI compute, abstracting away the complexity of hardware procurement, data center operations, and systems management. This layer is bifurcating into hyperscalers (who offer AI compute alongside their full cloud platform) and GPU-specialized clouds (who focus exclusively on AI workloads).

Key Players

Provider	AI Compute Offerings	Differentiation	Primary Customer
AWS	NVIDIA GPUs, Trainium, Inferentia	Broadest service portfolio, Bedrock model marketplace	Enterprise, startups
Microsoft Azure	NVIDIA GPUs, OpenAI exclusive partnership	Preferred for OpenAI model access, enterprise integration	Enterprise (M365 ecosystem)
Google Cloud	NVIDIA GPUs, TPU v5, Trillium	TPU price-performance, Vertex AI platform	AI-native companies, researchers
Oracle Cloud	NVIDIA GPUs, bare-metal GPU clusters	Aggressive pricing, large contiguous clusters	AI training workloads
CoreWeave	NVIDIA GPUs (H100, B200)	GPU-native cloud, high-performance networking	AI startups, model training
Lambda Labs	NVIDIA GPUs	Developer-friendly GPU cloud	ML researchers, small teams
Together AI	NVIDIA GPUs, open-source model serving	Inference platform + compute	Developers using open-source models
Crusoe Energy	NVIDIA GPUs, clean energy powered	Carbon-neutral AI compute	Climate-conscious enterprises

The hyperscalers’ AI strategy extends well beyond raw compute provision. AWS Bedrock, Azure AI Studio, and Google Cloud Vertex AI are each building managed platforms that handle model hosting, fine-tuning, evaluation, and orchestration. These platforms create switching costs by integrating model serving with the rest of each cloud’s service portfolio — storage, databases, networking, security, and identity.

The GPU-specialized clouds compete primarily on price, availability, and simplicity. CoreWeave, in particular, has built a multi-billion-dollar business by offering large, contiguous GPU clusters optimized for AI training — a workload pattern that hyperscalers have sometimes struggled to serve at competitive prices due to the overhead of their general-purpose infrastructure.

Layer 4: Platform

The platform layer provides the software frameworks and tools that AI engineers use to train, fine-tune, optimize, and serve models. This layer is predominantly open-source, with value captured through managed services and commercial wrappers.

Training Frameworks

Framework	Developer	Primary Use	Ecosystem Role
PyTorch	Meta (now Linux Foundation)	Model training and research	De facto standard for AI research and production
JAX	Google DeepMind	Model training (esp. TPU workloads)	Preferred for Google ecosystem and research
DeepSpeed	Microsoft	Distributed training optimization	Enables training of very large models on GPU clusters
Megatron-LM	NVIDIA	Large-scale model training	Reference implementation for distributed LLM training
Ray / Anyscale	Anyscale	Distributed compute orchestration	Horizontal scaling for training and serving

Serving and Inference

Framework	Developer	Primary Use	Key Feature
vLLM	UC Berkeley (open-source)	LLM inference serving	PagedAttention, high-throughput serving
TensorRT-LLM	NVIDIA	Optimized inference on NVIDIA GPUs	Maximum GPU utilization for inference
NVIDIA NIM	NVIDIA	Containerized model deployment	Pre-optimized inference microservices
Triton Inference Server	NVIDIA	Multi-framework model serving	Framework-agnostic inference serving
Ollama	Open-source	Local model serving	Developer-friendly local LLM deployment
llama.cpp	Open-source (Georgi Gerganov)	CPU/GPU inference	Efficient quantized inference, broad hardware support

Orchestration and Tooling

Tool	Developer	Primary Use
LangChain / LangGraph	LangChain Inc.	LLM application orchestration, agent frameworks
Weights & Biases	W&B	Experiment tracking, model monitoring
MLflow	Databricks	ML lifecycle management
Hugging Face	Hugging Face	Model distribution, datasets, collaboration
Modal	Modal Labs	Serverless GPU compute for ML workloads

PyTorch’s dominance at the training layer is the software equivalent of NVIDIA’s hardware dominance: pervasive and self-reinforcing. Most major open-weight models released in the past three years were trained using PyTorch or PyTorch-based frameworks such as Megatron-LM and DeepSpeed. This creates a gravitational pull for tooling, optimization work, and developer education that is extremely difficult for alternatives to overcome. JAX maintains a significant niche, particularly within Google’s ecosystem (including TPU-based training) and for research requiring advanced automatic differentiation, but PyTorch’s community mass is its defining advantage.

On the inference side, the landscape is more fragmented and evolving rapidly. vLLM has emerged as the leading open-source inference engine, with its PagedAttention algorithm enabling substantially higher throughput for LLM serving. NVIDIA’s TensorRT-LLM and NIM provide maximum performance on NVIDIA hardware but sacrifice portability. This tension between performance optimization and hardware portability is a recurring theme at the platform layer.
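The core idea behind PagedAttention is to store each sequence's key-value cache in fixed-size blocks that need not be contiguous, tracked through a per-sequence block table, much like virtual memory pages. The toy allocator below illustrates only that bookkeeping. It is not vLLM's implementation, and the class name, method names, and block sizes are hypothetical.

```python
class PagedKVCache:
    """Toy block allocator illustrating paged KV-cache bookkeeping."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.seq_lens = {}      # sequence id -> number of cached tokens

    def append_token(self, seq_id: str) -> None:
        """Reserve cache space for one more token of a sequence."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # Allocate a new block only when the last one is full (or none exists).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(5):
    cache.append_token("a")  # 5 tokens need 2 blocks of size 4
for _ in range(3):
    cache.append_token("b")  # 3 tokens need 1 block

blocks_a = len(cache.block_tables["a"])
blocks_b = len(cache.block_tables["b"])
free_before = len(cache.free_blocks)
cache.free("a")
free_after = len(cache.free_blocks)
```

Because memory is claimed one small block at a time instead of being reserved up front for the maximum sequence length, more concurrent sequences fit in the same GPU memory, which is where the throughput gain comes from.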

Layer 5: Model

The model layer sits at the intersection of research and infrastructure. Pre-trained foundation models represent enormous fixed costs (tens to hundreds of millions of dollars in compute for frontier training runs) that are amortized over inference volume. This cost structure favors scale and creates natural oligopoly dynamics at the frontier.
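The amortization logic can be illustrated with simple arithmetic: a fixed training cost spread over lifetime inference volume, plus the marginal cost of serving. All dollar figures and token volumes below are hypothetical and chosen only to show how strongly scale affects the per-token figure.

```python
def all_in_cost_per_million_tokens(training_cost_usd: float,
                                   lifetime_tokens: float,
                                   serving_cost_per_million_usd: float) -> float:
    """Amortized training cost plus marginal serving cost, per million tokens."""
    amortized = training_cost_usd / (lifetime_tokens / 1e6)
    return amortized + serving_cost_per_million_usd


# Hypothetical: a $100M training run and $0.50 per million tokens to serve.
small_scale = all_in_cost_per_million_tokens(100e6, 1e13, 0.50)  # 10 trillion tokens
large_scale = all_in_cost_per_million_tokens(100e6, 1e14, 0.50)  # 100 trillion tokens

print(f"10T tokens served:  ${small_scale:.2f} per million tokens")
print(f"100T tokens served: ${large_scale:.2f} per million tokens")
```

Under these assumptions, a tenfold increase in served volume cuts the all-in cost from about $10.50 to about $1.50 per million tokens, which is the mechanism that favors the largest providers.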

Tier	Examples	Characteristics	Training Cost (Estimated)
Frontier Closed	GPT-4o, Claude 3.5, Gemini Ultra, o1/o3	Highest capability, proprietary, API access	$100M - $500M+
Frontier Open	Llama 3.1 405B, DeepSeek-V3	Near-frontier capability, downloadable weights	$50M - $200M+
Mid-tier Open	Llama 3.1 70B, Qwen2.5 72B, Mixtral 8x22B	Strong capability, practical to self-host	$10M - $50M
Efficient Open	Llama 3.1 8B, Mistral 7B, Phi-3, Gemma 2	Good capability at small scale, edge-deployable	$1M - $10M
Specialized / Fine-tuned	CodeLlama, Med-PaLM, BloombergGPT	Domain-optimized performance	Varies (fine-tuning: $10K - $1M)

The model layer is undergoing rapid commoditization at every tier below the absolute frontier. The performance gap between the best open models and comparable closed models has compressed to the point where, for many production applications, the choice between open and closed is driven by deployment preferences and cost rather than capability differences.

This commoditization is shifting value capture away from the model itself and toward the layers above (applications, workflows) and below (efficient serving infrastructure). Model providers that do not control either the application layer or the infrastructure layer risk becoming interchangeable commodity suppliers.

Layer 6: Application

The application layer is where AI capabilities become end-user products. This layer is the most fragmented, the most dynamic, and — for many investors — the most uncertain in terms of where durable value will accrue.

Application Categories

Category	Examples	Business Model	AI Integration Pattern
AI Assistants	ChatGPT, Claude.ai, Gemini, Perplexity	Subscription + API	Model as the product
Coding Tools	GitHub Copilot, Cursor, Replit, Codeium	Subscription (seat-based)	Model embedded in IDE
Enterprise Search	Glean, Coveo AI, Elastic AI	Enterprise SaaS	RAG over enterprise data
Content Generation	Jasper, Copy.ai, Writer	Subscription	Model as content engine
Vertical AI	Harvey (legal), Abridge (healthcare), Ramp (finance)	Vertical SaaS	Domain-specific fine-tuning + workflow
AI Agents	Adept, Cognition (Devin), MultiOn	Usage-based (emerging)	Autonomous task execution
Image/Video Generation	Midjourney, Runway, Pika, Stability AI	Subscription + credits	Generative media pipeline
Voice/Audio AI	ElevenLabs, Descript, AssemblyAI	Usage-based	Specialized audio models

The application layer’s fundamental challenge is defensibility. When the underlying models are improving rapidly and available from multiple providers, application-layer companies must build moats through distribution, workflow integration, proprietary data, and user experience rather than model capability alone. The applications that have gained the most traction — GitHub Copilot (distribution through Microsoft), Perplexity (novel UX for search), Harvey (deep legal domain expertise) — each combine model capabilities with at least one additional source of competitive advantage.

Cross-Layer Dependencies

The stack is not a set of independent layers — critical dependencies run vertically across it.

NVIDIA’s vertical reach extends from silicon (GPU design) through systems (DGX, NVLink, InfiniBand) to platform (CUDA, TensorRT, NIM, Triton) and increasingly into cloud (DGX Cloud partnerships). This vertical integration is NVIDIA’s deepest competitive advantage: each layer reinforces the others, making it progressively more difficult for customers to substitute at any single point.

Hyperscaler vertical integration follows a similar logic. Google controls TPU silicon, GCP cloud infrastructure, JAX/TensorFlow frameworks, Gemini models, and application integration across Search, Workspace, and Android. AWS spans Trainium/Inferentia chips, EC2/ECS infrastructure, SageMaker/Bedrock platforms, and application services. This vertical integration enables optimization across layer boundaries that third-party stacks cannot match.

Open-source horizontal layers (PyTorch, vLLM, Llama, Hugging Face) provide a counterweight to vertical integration by creating shared infrastructure that works across multiple hardware and cloud providers. These horizontal layers reduce switching costs and prevent any single vertically integrated player from capturing the entire stack.

What to Watch

The inference cost curve. The cost per token of LLM inference is falling rapidly — driven by hardware improvements, software optimization (quantization, speculative decoding, batching), and competition. This cost curve is the single most important variable for the application layer: when inference is cheap enough, entire categories of applications become economically viable that are currently marginal.
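A rough way to reason about this cost curve is to divide the hourly price of a GPU by the tokens it can generate in an hour. The sketch below uses hypothetical prices and throughputs, and it treats software optimizations such as batching or quantization as a simple throughput multiplier, which is a simplifying assumption.

```python
def serving_cost_per_million_tokens(gpu_hourly_usd: float,
                                    tokens_per_second: float) -> float:
    """Serving cost per million generated tokens on one fully utilized GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1e6


# Hypothetical baseline: $3.00 per GPU-hour at 500 tokens per second.
baseline = serving_cost_per_million_tokens(3.00, 500)

# Assumed 4x throughput gain from better batching and quantization.
optimized = serving_cost_per_million_tokens(3.00, 500 * 4)

print(f"baseline:  ${baseline:.2f} per million tokens")
print(f"optimized: ${optimized:.2f} per million tokens")
```

The relationship is linear in both inputs: halving the GPU price or doubling throughput each halve the cost per token, and the effects multiply when both happen at once.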

Custom silicon proliferation. Every major cloud provider is investing in custom AI chips. If these efforts succeed in offering price-performance competitive with NVIDIA at scale, the silicon layer’s concentration — and NVIDIA’s pricing power — will erode. Watch Trainium2 adoption at AWS and TPU v6 (Trillium) performance benchmarks at Google Cloud as leading indicators.

Framework consolidation or fragmentation. The platform layer could consolidate around PyTorch and a small number of inference engines, or it could fragment further as different hardware targets demand different software stacks. Hardware-agnostic compilation layers (OpenAI Triton, Apache TVM, MLIR) could reduce fragmentation by providing portable performance across chip architectures.

Application layer shakeout. The AI application ecosystem has been fueled by venture funding and low barriers to building model-wrapper applications. As funding conditions tighten and incumbents integrate AI into existing products, many application-layer startups will face existential pressure. The survivors will be those with genuine workflow integration, proprietary data advantages, or distribution moats that make them difficult to replicate.

Energy constraints. AI infrastructure’s power demand is growing faster than new data center and grid capacity can be brought online. Microsoft, Google, Amazon, and Meta have collectively committed to multiple gigawatts of new power capacity, including investments in nuclear and renewable energy. If power becomes the binding constraint on AI scaling, it will reshape every layer of the stack, favoring energy-efficient architectures, edge inference, and model compression techniques.

The Bigger Picture

The AI infrastructure stack in 2026 exhibits a pattern common to maturing technology ecosystems: concentrated value at the bottom (NVIDIA’s silicon dominance), open and commoditized middle layers (open-source frameworks and models), and fragmented competition at the top (applications). This structure rewards vertical integration at scale (NVIDIA, Google) and horizontal platform plays (Hugging Face, PyTorch) while squeezing companies that operate at a single layer without differentiation.

The most consequential shifts over the next two to three years will likely occur at the boundaries between layers: custom silicon eroding GPU dominance, managed platforms absorbing framework complexity, model commoditization shifting value to applications, and application-layer competition determining which AI capabilities become infrastructure versus features. Understanding these layer dynamics — not just individual company strategies — is essential for navigating the AI market’s next phase.
