Exploring heterogeneous compute in the data center: how custom ASIC designs help create XPUs to accelerate modern AI workloads.
Data Center Insights by Jerry Chang, Sr. Director, Head of Marketing, Americas and Europe.
The modern data center is awash in acronyms. Terms such as CPU, DSP, GPU, NPU, TPU, FPGA, XPU, and custom ASIC are discussed as if their roles were self-evident. In practice, however, the terminology is evolving as quickly as data center workloads themselves. What once described clear architectural boundaries now reflects a far more heterogeneous and system-driven reality.
To make sense of custom silicon in the data center, we need to rethink the vocabulary itself. This means understanding how traditional processor categories are converging at the system, workload, and integration levels, why the industry is increasingly speaking in terms of XPUs and heterogeneous integration, and when workload-optimized custom ASIC platforms deliver meaningful advantages in performance, efficiency, and total cost of ownership.
Understanding the Landscape of Modern Compute Architectures
Early Central Processing Units (CPUs) were fundamentally scalar machines designed for general-purpose computing. Over time, to accelerate media, signal processing, and graphics workloads, CPUs added vector instructions, often called SIMD (single instruction, multiple data), that allow a single instruction to operate on multiple data elements in parallel.
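To make the scalar-versus-vector distinction concrete, here is a minimal Python sketch that counts modeled instructions for the same addition done both ways. The four-lane width and the function names are assumptions chosen for illustration; real vector widths depend on the instruction set, and this toy model ignores everything except instruction count.

```python
from typing import List, Tuple

LANES = 4  # assumed vector width (elements per instruction); real widths vary


def scalar_add(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Add element by element: one modeled instruction per element."""
    out: List[float] = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def vector_add(a: List[float], b: List[float],
               lanes: int = LANES) -> Tuple[List[float], int]:
    """Add in chunks of `lanes` elements: one modeled instruction per chunk."""
    out: List[float] = []
    instructions = 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)
print(scalar_count, vector_count)  # 8 2
```

Both paths produce the same sums; the vector path simply issues fewer instructions for the same amount of data.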
For many people, especially those familiar with PC architectures, the next major category of processor is the Graphics Processing Unit (GPU). Originally developed to accelerate 3D graphics, GPUs are highly parallel processors composed of many lightweight compute cores, each paired with relatively small local memory resources. Early GPUs evolved from fixed-function graphics pipelines into programmable vector architectures optimized for rendering and data-parallel computation. Today's GPUs extend this design with dedicated matrix and tensor engines that accelerate AI workloads.
While GPUs evolved from graphics pipelines, another important class of processors emerged to address real-time signal-processing workloads. Digital Signal Processors (DSPs) were designed to handle computationally intensive tasks that early CPUs could not perform fast enough for real-time applications.
DSP architectures emphasize single-cycle multiply-accumulate (MAC) operations, specialized memory addressing modes, and efficient vector-style processing. These capabilities make them well-suited for workloads such as filtering, convolution, and transforms that manipulate audio, video, and other streaming data.
Many of the mathematical patterns used in modern AI workloads—especially large numbers of MAC operations—closely resemble the signal-processing algorithms DSPs were designed to accelerate.
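The resemblance can be sketched in a few lines of Python. The example below is only an illustration, with hypothetical function names and made-up values: a direct-form FIR filter (a classic DSP workload) and the weighted sum inside a single neural-network neuron are both built from the same multiply-accumulate step.

```python
from typing import List


def mac(acc: float, x: float, w: float) -> float:
    """One multiply-accumulate step: acc + x * w."""
    return acc + x * w


def fir_filter(signal: List[float], taps: List[float]) -> List[float]:
    """Direct-form FIR filter: each output sample is a chain of MACs
    over the most recent input samples."""
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = mac(acc, signal[n - k], tap)
        out.append(acc)
    return out


def neuron_weighted_sum(inputs: List[float], weights: List[float],
                        bias: float) -> float:
    """Weighted sum of one neuron (before any activation function):
    the same MAC chain, with weights in place of filter taps."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc = mac(acc, x, w)
    return acc


smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
weighted = neuron_weighted_sum([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.25)
print(smoothed)  # [0.5, 1.5, 2.5, 3.5]
print(weighted)  # 4.75
```

A matrix multiplication is many such weighted sums computed side by side, which is why hardware built around dense arrays of MAC units serves both domains.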
As machine learning workloads grew in scale and complexity, new processor architectures emerged that were specifically optimized for the matrix and tensor operations common in neural networks.
Neural Processing Units (NPUs) are designed to accelerate these workloads using highly parallel compute arrays, dataflow execution models, and energy-efficient MAC engines optimized for tensor operations.
Tensor Processing Units (TPUs), developed by Google for its data centers, are specialized AI accelerators designed to efficiently execute the large matrix operations used in both AI training and inference.
Although these architectures are often described as entirely new processor classes, they build upon ideas long present in both GPU and DSP designs—large numbers of MAC units, parallel dataflows, and specialized memory hierarchies optimized for mathematical throughput.
The term XPU is commonly used to mean “some kind of processing unit,” where the “X” acts as a placeholder for the specific type. In practice, it often serves as an umbrella term for processors such as GPUs, NPUs, TPUs, and other specialized accelerators used in heterogeneous compute platforms.
Chips, Chiplets, Custom ASICs, and Multi-Die Systems
Modern data center accelerators have evolved from single, monolithic chips into highly integrated, heterogeneous systems.
Early accelerators for AI were typically implemented as individually packaged, single‑die chips—often GPUs, FPGAs, or fixed‑function ASICs—each targeting specific classes of workloads. Over time, many of these capabilities have been integrated as IP blocks inside larger custom ASICs, which combine general‑purpose CPU cores with domain‑specific accelerators for AI, networking, and security.
Increasingly, these custom ASICs are no longer realized as a single piece of silicon. Instead, they are partitioned into smaller chiplets—individual silicon dies, each implementing a subset of the overall functionality (for example, compute tiles, I/O tiles, cache/memory tiles). Multiple chiplets are then co‑packaged into a single multi‑die system, improving yield, scalability, and mix‑and‑match flexibility across process nodes and vendors.
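The yield benefit can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a simple Poisson defect model (yield = exp(-area x defect density)), an illustrative defect density, and dies that are tested before packaging so only good ones are assembled. It ignores packaging yield and the extra area needed for die-to-die interfaces, so treat the numbers as directional rather than representative of any real process or product.

```python
import math


def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson model."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


def silicon_per_good_system(die_area_cm2: float, dies_per_system: int,
                            defect_density_per_cm2: float) -> float:
    """Average silicon area fabricated per working system, assuming dies are
    tested before packaging and only good dies are assembled."""
    good_fraction = poisson_yield(die_area_cm2, defect_density_per_cm2)
    return dies_per_system * die_area_cm2 / good_fraction


DEFECT_DENSITY = 0.1  # defects per cm^2; illustrative assumption only

# The same 8 cm^2 of logic, built as one die or as four 2 cm^2 chiplets.
monolithic_yield = poisson_yield(8.0, DEFECT_DENSITY)
chiplet_yield = poisson_yield(2.0, DEFECT_DENSITY)

monolithic_area = silicon_per_good_system(8.0, 1, DEFECT_DENSITY)
chiplet_area = silicon_per_good_system(2.0, 4, DEFECT_DENSITY)

print(f"die yield: monolithic {monolithic_yield:.2f}, chiplet {chiplet_yield:.2f}")
print(f"cm^2 fabricated per good system: {monolithic_area:.1f} vs {chiplet_area:.1f}")
```

Under these assumptions, per-die yield rises from roughly 0.45 to roughly 0.82, and the silicon fabricated per working system falls from about 17.8 cm^2 to about 9.8 cm^2. A defect scraps only one small chiplet instead of the whole large die.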
In this context, the term “XPU” is used less to describe a specific chip and more as an architectural umbrella for heterogeneous compute platforms that combine CPUs, GPUs, NPUs, DSPs, FPGAs, and other accelerators—often implemented as IP blocks on custom ASICs and/or as chiplets within a shared package. These XPU‑style systems are designed holistically at the platform level to deliver the right mix of performance, efficiency, memory bandwidth, and interconnect for modern AI and HPC data center workloads.
At MediaTek, we design custom ASICs for our customers to create best-in-class XPUs. These custom ASICs include the compute die (one or more AI accelerators), memory interfaces, I/O chiplets, and more. The resulting XPU then connects to CPUs, Data Processing Units (DPUs), and other components that make up the compute rack.
Conclusion
The diversity of AI workloads continues to drive heterogeneous combinations of different processing units. In this environment, custom ASICs are best understood not as competitors to any single processor type, or as “custom CPUs,” but as purpose-built, system-aware silicon platforms. By integrating the right mix of compute cores with optimized memory hierarchies, interconnect fabrics, power delivery, and packaging technologies, these platforms enable designers to balance workload density, efficiency, and scale across the entire data center system.
MediaTek’s role is to serve our customers in this process, helping them navigate architectural choices and realize solutions that are holistically optimized, from the silicon dies through interconnect and advanced packaging, to meet the performance needs of next-generation data centers.