If AI Is Centralized Today, It Is Not a Law of Nature
Author(s): Jan Olsen

Originally published on Towards AI.

How the evolution of computing hardware is reopening the path toward decentralized intelligence — and why we must organize now

Muir Woods National Monument — photo by Matthew Dillon from Hollywood, CA, USA, 🅭 CC BY 2.0

1. If AI Is Centralized Today, It Is Not a Law of Nature
2. Centralization Is an Architectural Outcome, Not a Fundamental Rule
3. From Training Spectacle to Inference Reality
4. Embedded Intelligence Reshapes the Topology of Power
5. Synchronization Was the Real Bottleneck: Hardware Evolution Reopens the Path to Decentralized Intelligence
6. Hardware Breaks the Lock, Not Software
7. The Groq–NVIDIA Moment: Inference Becomes the Battlefield
8. The Real Risk Is Not Who Trains the Largest Model
9. This Window Will Not Remain Open Indefinitely
10. What Must Emerge Is a Real Network, Not Another Platform
11. Decentralization Will Not Be Proclaimed — It Will Emerge

1. If AI Is Centralized Today, It Is Not a Law of Nature

Since the very beginning of artificial intelligence as a computer science project, one belief has followed it like a shadow: intelligence, at scale, must be centralized. From early academic machines to modern industrial deployments, AI has almost always been conceived as something that lives inside large systems, owned and operated by powerful centralized entities with the resources required to build and sustain them. The idea that AI must be centralized is not new; what is new is that it is now presented as a given — almost immutable.

Today, that belief has hardened into something close to dogma. Vast hyperscale data centers, continent-scale power contracts — often colocated with, or built around, major energy generation infrastructure — and computing complexes owned and controlled by an ever-shrinking group of actors are no longer framed as pragmatic engineering decisions, but as a technical inevitability reserved for a select few.

This narrative is frequently presented as neutral, even scientific. In reality, it is neither neutral nor a matter of fate. What we are looking at here is a snapshot: a frozen image of a specific architectural moment, mistakenly taken for a definitive trajectory. That assumption is already beginning to crack — and we should celebrate it.

Meta's hyperscale data center in El Paso, Texas — the icon of centralized AI infrastructure, October 15, 2025. https://datacenters.atmeta.com/2025/10/hello-el-paso/

2. Centralization Is an Architectural Outcome, Not a Fundamental Rule

Centralization, in the context of AI, did not emerge because intelligence demands it. It emerged because a specific set of architectural decisions made it the most efficient option at a given point in time. Tight synchronization, ultra-low-latency interconnects, and dense compute clusters favor proximity. When every parameter update must converge immediately, distance becomes the enemy, and centralization becomes the obvious answer. For the current generation of large-scale training workloads, this logic is sound.
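To make that coupling concrete, here is a minimal sketch of one synchronous data-parallel training step, written in PyTorch purely for illustration; the framework choice and the model, loss_fn, optimizer, and local_batch placeholders are assumptions for the example, not details taken from any specific system described in this article.

```python
# Minimal sketch of one synchronous data-parallel training step.
# Assumes torch.distributed has already been initialized on every worker
# (e.g. dist.init_process_group("nccl")); model, loss_fn, optimizer and
# local_batch are illustrative placeholders.
import torch
import torch.distributed as dist

def synchronous_step(model, loss_fn, optimizer, local_batch):
    inputs, targets = local_batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # The synchronization barrier: every worker exchanges and averages its
    # gradients with every other worker before ANY of them may apply the
    # update. This barrier runs once per step, so interconnect latency is
    # paid on every single parameter update of the training run.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= dist.get_world_size()

    optimizer.step()
    optimizer.zero_grad()
```

The all_reduce call is the synchronization the paragraph above refers to: its cost is set by the slowest, most distant participant, which is exactly why dense, co-located clusters dominate this class of workload.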
Massive GPU clusters, specialized networking fabrics, and carefully engineered power and cooling environments reduce coordination costs and maximize throughput. No serious engineer disputes that this approach works — or why it was chosen.

But architecture is not nature. It reflects constraints, trade-offs, and optimization targets that evolve. What looks inevitable under one set of assumptions often dissolves once those assumptions shift. Centralization, in this case, is not a law imposed by physics or mathematics; it is the by-product of hardware designed for a narrow class of workloads, at a specific moment in the history of computing.

The mistake is not centralization itself. The mistake is treating an architectural solution as a permanent condition — as if the stack that optimized yesterday's problems must also define tomorrow's systems.

Architectural contrast between centralized AI training and distributed inference — synchronization versus asynchrony, density versus distribution. Diagram generated using a centralized AI image model (Google Gemini).

3. From Training Spectacle to Inference Reality

For the past decade, artificial intelligence has largely been framed as a training problem. Bigger models, larger datasets, more GPUs, longer runs. Training became the spectacle — the visible, expensive, headline-grabbing part of the pipeline. It is where benchmarks are set, papers are published, and record-breaking funding rounds — measured in billions, sometimes tens or hundreds of billions of dollars — are justified.

That focus was understandable, but it distorted our perception of where AI actually lives. In practice, AI does not live in training. It lives in inference. Once a model is trained, it is executed millions or billions of times, embedded into services, products, devices, and decision systems. Measured in deployed silicon — the actual chips in production — as well as cumulative energy consumption, and increasingly water usage for cooling, inference already outweighs training by a wide margin, and that margin continues to grow. This dimension is often overlooked, even as it becomes one of the most constraining factors for hyperscale infrastructure.

This shift is not ideological; it is structural. Training is episodic and centralized by necessity. Inference is continuous and distributed by usage. Training happens a few times, in a few places. Inference happens everywhere, all the time. The economic center of gravity moves accordingly — away from rare, spectacular events and toward persistent, operational workloads.

Hardware roadmaps reflect this reality more clearly than marketing narratives ever could. Accelerators are increasingly optimized for low-latency execution, energy efficiency, and predictable throughput rather than raw peak FLOPS. Techniques such as quantization, model distillation, and sparsity — the ability to activate only a subset of a model at runtime — are no longer research curiosities; they are operational requirements driven by inference at scale.

The consequence is subtle, but profound. As inference becomes the dominant form of computation, the architectural assumptions that once justified extreme centralization begin to weaken. Latency tolerance increases. Synchronization […]