The Great Divergence: When AI Outpaced Moore’s Law

      AI compute demand is now doubling every few months, far faster than traditional scaling strategies can support.

      By Jerry Chang, Sr. Director, Head of Marketing, Americas and Europe

       

      In his 1965 paper, “Cramming More Components onto Integrated Circuits,” Gordon Moore observed that the number of transistors on an integrated circuit was doubling roughly every year. In 1975, Moore revised the rate to “about every two years.”

      There’s a related, but different, idea that is often (mis)attributed to Moore. In 1975, to clarify the implications of Moore’s law, Intel executive David House noted that computational performance seemed to double about every 18 months. This figure combined transistor doubling (about every two years) with other improvements, such as processor architecture and clock speed.

      Over time, people blurred the two, turning a focused observation about transistor density into a much broader, widely used, and not entirely accurate claim about computing performance.

       

      And Then Came AI

      With the benefit of hindsight (the one exact science), we know that conventional computational capability—meaning the general-purpose, von Neumann-style computing used in everything from PCs and servers to embedded systems—has been doubling every 18 to 24 months since the 1960s. Furthermore, real-world demand for this form of computation has largely tracked the available supply.

      Though its intellectual roots stretch back further, the field of artificial intelligence is generally traced to the 1956 Dartmouth Workshop, where the term was coined and the discipline first took shape.

      Introduced in 1958-59, the Perceptron was one of the earliest artificial neural network models. From that time until around 2012, AI compute demand largely tracked conventional compute demand. That longstanding equilibrium was shattered in 2012 with the rise of “perceptive AI,” driven by AlexNet, which showed that deep neural networks, given sufficient data and compute, could deliver dramatic leaps in capability.

      With about 60 million parameters and requiring “only” on the order of a billion operations per inference, AlexNet was computationally modest by modern standards. By comparison, models like BERT (2018), with hundreds of millions of parameters, require tens of billions of operations per query. Today’s large language models (LLMs), such as GPT-4 (2023), with hundreds of billions of parameters (Figure 1), can demand orders of magnitude more compute during inference (Figure 2). Although exact figures for the latest frontier models such as GPT-5 are rarely disclosed, it is widely assumed that some are approaching, or may already have exceeded, the trillion-parameter scale. Adding to the challenge, many of these newer systems employ Mixture-of-Experts (MoE) architectures, which add complexity in scheduling, memory movement, and system design.
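
As a rough illustration of why parameter count drives inference cost, the sketch below applies a common rule of thumb for transformer models: a forward pass costs about 2 operations (one multiply, one add) per active parameter per token. This is an approximation rather than a figure from this article. It ignores the part of attention cost that grows with sequence length, and it does not apply to convolutional networks such as AlexNet, where weights are reused across the image. All model sizes are hypothetical placeholders, not disclosed figures for any real model.

```python
def forward_ops(active_params: float, tokens: int = 1) -> float:
    """Approximate forward-pass operations: ~2 ops per active parameter per token."""
    return 2.0 * active_params * tokens


def moe_active_params(shared_params: float, params_per_expert: float,
                      experts_per_token: int) -> float:
    """Parameters actually exercised per token in a Mixture-of-Experts model."""
    return shared_params + params_per_expert * experts_per_token


# Hypothetical model sizes, chosen only to show the scaling.
small_dense = forward_ops(3e8, tokens=128)  # 300M parameters, 128-token query
large_dense = forward_ops(5e11)             # 500B parameters, one token

# Hypothetical MoE: 200B shared parameters plus 8 experts of 100B each,
# with 2 experts selected per token.
moe_total = 2e11 + 8 * 1e11
moe_active = moe_active_params(2e11, 1e11, experts_per_token=2)
moe_ops = forward_ops(moe_active)

print(f"small dense model, 128 tokens: {small_dense:.2e} ops")
print(f"large dense model, per token:  {large_dense:.2e} ops")
print(f"MoE model, per token:          {moe_ops:.2e} ops "
      f"({moe_active / moe_total:.0%} of parameters active)")
```

Under these assumptions, the MoE model stores one trillion parameters but exercises only 400 billion of them per token. The arithmetic per token drops, while the full parameter set still has to be held in memory and routed to, which is one reason MoE shifts pressure toward scheduling and memory movement.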

      Figure 1: AI model complexity is growing exponentially.

      In short, AI compute demand has grown by many orders of magnitude. It has broken from the steady cadence that characterized conventional computing for decades and now doubles every four to five months.
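
To make the gap concrete, the following sketch compounds the doubling periods quoted in this article over a fixed window. The five-year horizon is an arbitrary choice for illustration, and the calculation assumes each doubling period stays constant, which real trends do not guarantee.

```python
def growth_factor(doubling_months: float, horizon_months: float) -> float:
    """Multiplicative growth over a horizon, given a constant doubling period."""
    return 2.0 ** (horizon_months / doubling_months)


HORIZON_MONTHS = 60  # five years; an illustrative assumption, not a sourced figure

# Conventional compute: doubling every 18 to 24 months.
supply_low = growth_factor(24, HORIZON_MONTHS)   # about 5.7x
supply_high = growth_factor(18, HORIZON_MONTHS)  # about 10.1x

# AI compute demand: doubling every four to five months.
demand_low = growth_factor(5, HORIZON_MONTHS)    # 2**12 = 4096x
demand_high = growth_factor(4, HORIZON_MONTHS)   # 2**15 = 32768x

print(f"conventional scaling over 5 years: {supply_low:.1f}x to {supply_high:.1f}x")
print(f"AI demand over 5 years:            {demand_low:.0f}x to {demand_high:.0f}x")
print(f"narrowest gap: {demand_low / supply_high:.0f}x")
print(f"widest gap:    {demand_high / supply_low:.0f}x")
```

Even pairing the slowest demand estimate with the fastest supply estimate, demand outgrows conventional scaling by a factor of several hundred over the window.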

      Figure 2: The classical vs. modern AI compute eras.

      It’s now common to divide AI compute demand into two phases: the Classical (pre-deep learning) era, which commenced around 1960 and ended around 2012, and the Modern (deep learning) era, which began in 2012 and continues to this day, with every indication that it will persist for the foreseeable future.

       

      Closing the Gap from Chips to Systems

      For decades, improvements in compute capability have been delivered largely at the silicon level. Smaller transistors, higher densities, and incremental architectural advances were enough to keep pace with demand. No longer.

      The divergence between AI compute demand and conventional scaling has exposed a fundamental limitation: Process technology alone will not close the gap. The challenge centers on building more efficient systems.

      As AI workloads scale, the constraints shift, and power delivery, thermal management, memory bandwidth, and interconnect efficiency all become first-order concerns. Often, these factors determine how much usable performance can be extracted from a given system.

      An important shift in the AI era is that the rack, not the chip, has become the true unit of compute consumption.

      AI workloads are inherently heterogeneous, spanning training, fine-tuning, and high-volume inference. These workloads require different types of processing elements (CPUs, DPUs, and XPUs) working together as a coordinated system. In this environment, raw TOPS (trillions of operations per second) is no longer the defining metric. What matters is how efficiently useful work can be delivered at scale, which is increasingly measured in performance per watt and performance per dollar of total cost of ownership (TCO).

      The implication is profound: Optimizing a single chip in isolation is insufficient. The entire system—compute, memory, interconnect, packaging, power delivery, and cooling—must be co-designed and co-optimized.

      As a result, a new class of design philosophy is emerging—one that looks beyond the XPU and treats the entire AI compute stack as a single, integrated system. At MediaTek, this means working across multiple dimensions simultaneously:

      • Compute: Custom ASICs and XPUs tailored to specific workloads. Design-technology co-optimization (DTCO) can deliver performance and efficiency gains equivalent to a “half-node” improvement without requiring a full process shrink.
      • Memory: Architecting hierarchies that balance bandwidth, capacity, and efficiency.
      • Interconnect: Scaling both within systems and across racks with high-bandwidth, energy-efficient fabrics.
      • Packaging: Leveraging advanced 2.5D and 3.5D integration to bring together heterogeneous components.
      • System Integration: Optimizing power delivery, thermals, and reliability at the rack level.

      The goal is to deliver deployable, production-scale AI infrastructure that maximizes tokens per watt and tokens per dollar. In this model, the rack becomes a strategic asset—one that can be tuned, optimized, and differentiated based on workload requirements and deployment constraints.
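
As a minimal sketch of how these rack-level metrics might be compared, the example below computes tokens per joule (tokens per second divided by watts) and tokens per dollar for two racks. The `RackProfile` class, the simplified TCO model (purchase cost plus energy only, at full utilization), and every number are hypothetical assumptions for illustration; a real TCO model would also cover facilities, networking, maintenance, and utilization.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class RackProfile:
    """Hypothetical rack description; all fields are illustrative assumptions."""
    name: str
    tokens_per_second: float  # sustained inference throughput
    power_watts: float        # total rack power draw
    capex_usd: float          # purchase and installation cost
    lifetime_years: float     # service life used for amortization
    usd_per_kwh: float        # electricity price

    def tokens_per_joule(self) -> float:
        # Tokens per second per watt is the same as tokens per joule.
        return self.tokens_per_second / self.power_watts

    def tco_usd(self) -> float:
        # Simplified TCO: capital cost plus lifetime energy cost.
        energy_kwh = self.power_watts / 1000 * HOURS_PER_YEAR * self.lifetime_years
        return self.capex_usd + energy_kwh * self.usd_per_kwh

    def tokens_per_dollar(self) -> float:
        lifetime_seconds = self.lifetime_years * HOURS_PER_YEAR * 3600
        return self.tokens_per_second * lifetime_seconds / self.tco_usd()


# Two made-up racks: one with higher peak throughput, one tuned for efficiency.
racks = [
    RackProfile("peak-throughput", 500_000, 120_000, 3_000_000, 4, 0.10),
    RackProfile("efficiency-tuned", 350_000, 70_000, 2_200_000, 4, 0.10),
]

for rack in racks:
    print(f"{rack.name}: {rack.tokens_per_joule():.2f} tokens/J, "
          f"{rack.tokens_per_dollar():,.0f} tokens/$")
```

With these made-up inputs, the rack with lower peak throughput comes out ahead on tokens per joule, which illustrates why raw throughput alone is a poor basis for comparison.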

       

      Conclusion

      The history of computing has largely been one of steady, predictable progress. Moore’s law provided a reliable cadence, and the industry built an ecosystem around it. The rise of AI is rewriting that story.

      Compute demand is accelerating so fast that traditional scaling cannot keep pace. The result is a widening gap that cannot be closed by advances in silicon alone. Closing this gap requires a fundamentally different approach. Innovation is needed across entire systems, spanning architecture, packaging, interconnect, power, and software.

      Ultimately, AI’s future will not hinge on any single component, but on how well the entire system is designed, integrated, and optimized. And that is where the next chapter of computing innovation will be written.