Concept

AI COMPUTE

The hardware layer that powers training and inference — GPUs, TPUs, and the supply chains that constrain them.

AI compute refers to the specialized hardware — primarily GPUs and custom accelerators — required to train and run large AI models. Unlike general-purpose computing, AI compute is highly concentrated in a small number of chip designs, fabs, and supply chains, making access as much a geopolitical question as a commercial one.

Why AI needs specialized hardware

Training a large language model means repeating very large matrix multiplications an enormous number of times over models with billions, and at the frontier reportedly trillions, of parameters. CPUs have comparatively few cores, each tuned for fast sequential execution, and are poorly suited to the task. GPUs were originally designed for parallel graphics rendering, and the same massively parallel architecture that renders video game frames turns out to be highly efficient for matrix math. NVIDIA recognized this early and built CUDA, a software layer that made GPUs programmable for general scientific computing, long before AI demand existed at scale.
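To make the parallelism point concrete, here is a minimal Python sketch, not a performance benchmark. It counts the floating-point operations in one matrix multiplication and shows that each output cell is computed independently of the others. The function names and the 4096-sized layer shape are illustrative assumptions, not figures from any particular model.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """Floating-point operations for an (m x k) @ (k x n) product.

    Each of the m*n output cells needs k multiplies and k adds
    (counting k adds rather than k-1 is the usual convention).
    """
    return 2 * m * k * n


def naive_matmul(a, b):
    """Reference matrix multiply on lists of lists.

    Every output cell c[i][j] depends only on row i of `a` and column j
    of `b`, so all m*n cells could be computed at the same time. That
    independence is what parallel hardware exploits.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # Hypothetical layer shape: 4096 tokens through a 4096 x 4096 weight matrix.
    print(f"{matmul_flops(4096, 4096, 4096):,} FLOPs")
```

A single multiplication of this assumed size already costs roughly 137 billion operations, and a training run repeats work like this across many layers and many steps.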

The result is that NVIDIA holds an estimated 70–80% of the AI training chip market. AMD is the credible alternative on the hardware side; Intel has struggled to compete. Google builds its own TPUs (Tensor Processing Units), which it uses internally and also rents out through Google Cloud. A handful of AI chip startups, including Cerebras, Groq, and Graphcore, have built specialized architectures with theoretical advantages in specific workloads but limited market penetration.

Training vs. inference

AI compute divides into two distinct workloads with different requirements. Training is the process of adjusting model weights across a massive dataset. It is computationally intensive, runs for days or weeks, and benefits from as many high-memory GPUs as can be clustered together. Inference is the process of running a trained model to generate outputs. Each request needs far less compute than a training run, but a deployed model must serve a large and continuous stream of requests and is extremely sensitive to latency.
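The gap between the two workloads can be roughed out with a widely used rule of thumb for dense transformer models: training costs about 6 FLOPs per parameter per training token, and generating one token at inference costs about 2 FLOPs per parameter. The sketch below applies that approximation. The model size, token count, cluster size, and sustained throughput are hypothetical inputs chosen for illustration, and real runs deviate from these estimates.

```python
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute (~6 FLOPs per parameter per token)."""
    return 6.0 * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute (~2 FLOPs per parameter per token)."""
    return 2.0 * params


def gpu_days(total_flops: float, sustained_flops_per_sec: float, num_gpus: int) -> float:
    """Wall-clock days for a job, assuming perfect scaling across GPUs."""
    seconds = total_flops / (sustained_flops_per_sec * num_gpus)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical model: 7 billion parameters trained on 1 trillion tokens.
    params, tokens = 7e9, 1e12
    total = training_flops(params, tokens)
    print(f"training: {total:.2e} FLOPs")
    print(f"inference: {inference_flops_per_token(params):.2e} FLOPs per token")
    # Assumed cluster: 1024 GPUs, each sustaining 4e14 FLOP/s on this job.
    print(f"wall clock: {gpu_days(total, 4e14, 1024):.2f} days")
```

Under these assumed inputs the training run takes a little over a day on the full cluster, while a single generated token costs about 14 billion operations. Training is one large batch job; inference is a very large number of small ones.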

The economics are different. Training clusters are optimized for throughput, tolerate high latency, and can run as scheduled batch jobs rather than in real time. Inference infrastructure is optimized for low latency, high availability, and cost per output token. As models mature and usage scales, the industry's center of gravity is shifting from training to inference, which has significant implications for what hardware gets purchased and where it gets deployed.
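Cost per output token is simple arithmetic once an hourly hardware price and a sustained token throughput are known. The sketch below works it through. The hourly price, the throughput, and the `utilization` parameter are hypothetical placeholders, not quoted market rates.

```python
def cost_per_million_tokens(
    gpu_hour_usd: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Serving cost in USD per one million output tokens.

    gpu_hour_usd      -- assumed hourly cost of the serving hardware
    tokens_per_second -- assumed sustained output throughput at full load
    utilization       -- fraction of each hour spent serving real traffic
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: $2.50 per GPU-hour, 1000 output tokens per second.
    print(f"${cost_per_million_tokens(2.50, 1000):.2f} per million tokens")
    print(f"${cost_per_million_tokens(2.50, 1000, utilization=0.5):.2f} at 50% utilization")
```

With these assumed numbers the cost is about $0.69 per million tokens at full load and doubles to about $1.39 at 50% utilization, which is why idle inference capacity is expensive.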

The supply chain constraint

NVIDIA GPUs are manufactured by TSMC in Taiwan on advanced process nodes (currently 4nm and 3nm). The supply chain from chip design to finished GPU involves TSMC's foundry capacity, high-bandwidth memory (HBM) bought from suppliers such as SK Hynix, Samsung, and Micron, advanced packaging technology, and global logistics. A disruption at any point, whether a Taiwan Strait conflict, TSMC fab capacity limits, or memory supply constraints, propagates through to AI infrastructure buildout timelines worldwide.

US export controls on advanced AI chips to China have added a geopolitical dimension. Chips at or above specified performance and performance-density thresholds are restricted from export. This has accelerated Chinese domestic chip development (Huawei's Ascend series being the primary example) and has bifurcated the global AI infrastructure market into two supply chains with limited interoperability.

What comes next

The GPU architecture that dominates today, the NVIDIA H100/H200/B200 lineage, will not be the final form of AI compute. Several architectural shifts are underway: more on-package memory capacity and bandwidth (HBM4 and beyond), tighter integration of memory and compute, optical interconnects for inter-chip communication, and specialized silicon for inference workloads. The companies that get the hardware-software co-design right for the next generation of models will have a significant cost and performance advantage.

For anyone building on AI infrastructure today, the key question is how dependent their strategy is on current hardware economics — and how it changes if those economics shift significantly, as they almost certainly will.
