I. Executive Summary
The 2024–2025 period marks a fundamental shift in digital infrastructure, away from traditional, multi-tenant data centers and toward dedicated, gigawatt-scale “AI Factories.” These facilities are designed for a singular purpose: training and serving next-generation artificial intelligence. This report analyzes the technical, financial, and environmental dimensions of this new era, anchored by a deep analysis of xAI’s “Colossus” supercomputer.
Key findings of this analysis include:
- The “AI Factory” Era: Traditional data centers, operating at 8–15 kW per rack, are being supplanted by AI-optimized facilities demanding 50 kW to over 140 kW per rack. This 10-fold increase in density has forced a complete architectural redesign of power, cooling, and networking.
- xAI Colossus as Archetype: xAI’s 200,000-GPU Colossus facility exemplifies the new model. It demonstrates unprecedented deployment speed (122 days) by leveraging a controversial environmental trade-off: the use of unpermitted, on-site fossil fuel power generation to bypass grid limitations.
- Convergent Technical Design: The extreme densities of AI accelerators have forced industry-wide convergence on two key technologies: direct-to-chip liquid cooling, which is now essential and offers up to 17% higher computational throughput, and AI-optimized Ethernet (e.g., NVIDIA Spectrum-X), which has proven its viability against InfiniBand at massive scale.
- A Trillion-Dollar Arms Race: The market is defined by a financial arms race, with hyperscalers committing tens of billions in annual CapEx (e.g., Meta $60–65 billion, Microsoft $80 billion). This spending is bifurcating into two strategies: total dependency on NVIDIA (xAI) versus a dual-track “NVIDIA + Custom Silicon” hedge (Google’s TPUs, AWS’s Trainium, Meta’s MTIA).
- The Power and Pollution Bottleneck: The primary constraint on AI growth is no longer just silicon, but power. The industry’s gigawatt-scale ambitions, such as Microsoft’s 5 GW “Project Stargate,” are outstripping grid capacity and forcing a direct confrontation with regulators and communities, as seen in the xAI facility’s environmental violations in Memphis.
II. Anchor Case Study: xAI’s Colossus (Phases 1 & 2)
The xAI Colossus supercomputer in Memphis, Tennessee, serves as the definitive archetype for the 2024–2025 AI factory. Its development reveals a new model where speed-to-market is the primary strategic driver, superseding all other considerations, including cost, efficiency, and environmental legality.
The “Built in 122 Days” Phenomenon: Project Timeline and Strategic Urgency
The Colossus project’s timeline is its most defining metric. xAI launched Phase 1 (“Colossus 1”), a 100,000-GPU cluster of NVIDIA H100s, in September 2024. This entire facility was constructed inside a repurposed 785,000-square-foot appliance factory in an unprecedented 122 days. For comparison, building a traditional data center of this scale typically takes years.
xAI immediately began an expansion, “Colossus 2,” which was announced in November 2024. This phase doubled the cluster’s size to 200,000 total GPUs by adding 100,000 of the next-generation NVIDIA H200 accelerators. This doubling was achieved in just 92 days. This velocity, backed by a $7 billion hardware estimate and a $6 billion funding round, demonstrates that capital is not the bottleneck; time-to-compute is the only metric that matters in the current AI arms race.
Deep Dive: The Supermicro Liquid-Cooled Rack Architecture
The core building block of Colossus is a custom-designed, liquid-cooled rack provided by Supermicro. This architecture is purpose-built for extreme density.
- Rack Composition: Each rack contains eight 4U Universal GPU servers.
- Server Composition: Each 4U server hosts eight NVIDIA H100 GPUs (on an NVIDIA HGX tray), resulting in a total of 64 GPUs per rack.
- Cooling Mechanism: The system is entirely liquid-cooled. Each rack integrates a 4U Coolant Distribution Unit (CDU) at its base, which contains redundant pumps. This CDU manages a closed-loop liquid system, feeding coolant through 1U manifolds to direct-to-chip (DTC) cold plates mounted on the GPUs and CPUs.
- Heat Rejection: The racks also employ rear-door heat exchangers, which use facility water loops to absorb waste heat, making the entire rack assembly “cooling neutral” to the surrounding data hall (a rack-level power sketch follows this list).
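Put together, the rack-level arithmetic explains why this design could never have been air-cooled. A minimal sketch: the ~700 W figure is the published TDP of an SXM-form-factor H100, while the per-server overhead for CPUs, NICs, memory, and fans is an assumed round number, not a report figure.

```python
# Rack-level power density for the Supermicro building block described above.
# GPU_TDP_W is the published H100 SXM thermal design power; the per-server
# overhead is an assumption for illustration, not a figure from the report.

SERVERS_PER_RACK = 8
GPUS_PER_SERVER = 8
GPU_TDP_W = 700
SERVER_OVERHEAD_W = 1_500  # assumed: CPUs, NICs, memory, fans per 4U server

gpus_per_rack = SERVERS_PER_RACK * GPUS_PER_SERVER            # 64
rack_kw = (gpus_per_rack * GPU_TDP_W
           + SERVERS_PER_RACK * SERVER_OVERHEAD_W) / 1_000

print(f"{gpus_per_rack} GPUs -> ~{rack_kw:.0f} kW per rack")  # ~57 kW
# Already past the ~50 kW air-cooling ceiling, before counting
# in-rack switches and the CDU's own pumps.
```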
Deep Dive: The NVIDIA Spectrum-X Network Fabric
To connect 100,000 (and now 200,000) GPUs as a single, coherent training cluster, xAI deployed NVIDIA’s Spectrum-X Ethernet platform. This makes Colossus a flagship validation case for AI-optimized Ethernet at unprecedented scale.
The architecture provides a dedicated 400GbE NIC (an NVIDIA BlueField-3 SuperNIC) for every single GPU, plus an additional 400GbE NIC for the server’s CPU. This results in an aggregated 3.6 Tb/s of network bandwidth per 8-GPU server.
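The per-server figure is simple arithmetic, and extrapolating it shows the scale the fabric must sustain. A quick sanity check; note the cluster-wide number is derived here for illustration, not quoted from the report.

```python
# Bandwidth implied by the NIC allocation described above: one 400GbE
# BlueField-3 SuperNIC per GPU, plus one 400GbE NIC for the host CPU.

NIC_GBPS = 400
GPUS_PER_SERVER = 8

server_tbps = (GPUS_PER_SERVER + 1) * NIC_GBPS / 1_000
print(f"Per 8-GPU server: {server_tbps} Tb/s")        # 3.6 Tb/s

# Derived: GPU-side injection bandwidth across the 100,000-GPU Phase 1 cluster.
cluster_pbps = 100_000 * NIC_GBPS / 1_000_000
print(f"Cluster GPU injection bandwidth: {cluster_pbps} Pb/s")  # 40 Pb/s
```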
This fabric links all GPUs in a 3-tier Layer 3 Clos topology. It utilizes RDMA over Converged Ethernet (RoCE) and advanced congestion control protocols to manage traffic. This design achieves performance previously thought possible only with more expensive, proprietary InfiniBand fabrics. NVIDIA reports that the Colossus fabric, at 100,000-GPU scale, sustains 95% data throughput with no packet loss and no application latency degradation from flow collisions.
III. The Financial Realities of AI Supercomputing
The AI factory model is not just a technical shift; it is a financial one, demanding capital expenditure on an industrial scale. The economics of AI data centers now more closely resemble semiconductor fabrication plants or gigafactories than traditional IT infrastructure.
Capital Expenditure (CapEx) Deconstructed: The $8M–$12M per MW Benchmark
The cost of building an AI data center is now measured by its power capacity.
- Facility “Shell” Cost: Industry analysis shows that the cost to build the physical “shell” — the land, building, electrical, and cooling infrastructure — for an AI-ready facility now averages $8 million to $12 million per MW.
- Benchmark 100 MW Facility: A 100 MW facility shell, before any computers are installed, requires an upfront CapEx of $800 million to $1.2 billion. This cost is dominated by specialized electrical systems (40–45% of the total) and industrial-scale mechanical cooling systems (15–20%).
- IT Hardware Cost: Crucially, this facility cost excludes the IT hardware. A 100 MW facility, designed to house approximately 100,000 GPUs, requires an additional $2 billion to $4 billion in servers, networking, and accelerators.
The total investment to bring a single 100 MW AI factory online is therefore in the range of $2.8 billion to $5.2 billion.
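The cost model composes mechanically from the figures above. A minimal sketch, using only the report's ranges:

```python
# CapEx for a benchmark 100 MW AI factory, composed from the report's
# ranges: shell at $8M-$12M per MW, plus $2B-$4B of IT hardware.

FACILITY_MW = 100
SHELL_PER_MW = (8e6, 12e6)        # low and high shell cost per MW
IT_HARDWARE = (2.0e9, 4.0e9)      # servers, networking, ~100k GPUs

shell = tuple(per_mw * FACILITY_MW for per_mw in SHELL_PER_MW)
total = tuple(s + hw for s, hw in zip(shell, IT_HARDWARE))

print(f"Shell: ${shell[0]/1e9:.1f}B to ${shell[1]/1e9:.1f}B")  # $0.8B to $1.2B
print(f"Total: ${total[0]/1e9:.1f}B to ${total[1]/1e9:.1f}B")  # $2.8B to $5.2B
```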
The Trillion-Dollar Horizon: Projecting AI Infrastructure Investment (2025–2030)
On a global scale, the total capital investment required to meet AI demand by 2030 is projected to be between $5.2 trillion and $7.9 trillion. This staggering sum is broken down into three main categories, with the $5.2 trillion base case allocating:
- $3.1 trillion (60%) to “technology developers” (chips, servers, hardware)
- $1.3 trillion (25%) to “energizers” (power generation, transmission, cooling)
- $0.8 trillion (15%) to “builders” (land, construction)
Case Study Financials: The $7B Hardware Cost of Colossus
xAI’s Colossus serves as a concrete example of this financial model. The total hardware cost (GPUs, servers, networking) for the 200,000-GPU Colossus cluster is estimated at $7 billion. This figure is supported by xAI’s $6 billion funding round, which analysts noted was roughly sufficient for a 100k-GPU cluster, where GPUs (at ~$30,000 each) account for approximately half the total hardware cost.
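The analyst arithmetic is easy to reproduce. A sketch using only the ~$30,000 per-GPU figure and the half-of-hardware ratio cited above:

```python
# Reproducing the analyst note: ~$30k per GPU, with GPUs accounting for
# roughly half of total hardware cost (servers, networking, etc.).

GPU_UNIT_COST_USD = 30_000
GPU_COUNT = 100_000

gpu_spend = GPU_UNIT_COST_USD * GPU_COUNT   # $3.0B on accelerators alone
total_hardware = gpu_spend * 2              # GPUs are ~half of hardware cost
print(f"Implied 100k-GPU hardware cost: ~${total_hardware/1e9:.0f}B")  # ~$6B
```

The result lands almost exactly on the $6 billion funding round, which is presumably why analysts flagged the match.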
Operational Expenditure (OpEx) Analysis: The Dominance of Power Costs
After the massive upfront CapEx, OpEx is relentlessly dominated by the cost of electricity, which accounts for 40–60% of all operational spending.
- A 1 GW (1,000 MW) AI data center, if run at full capacity, would consume 8.76 TWh of electricity annually, translating to a power bill of ~$350 million per year (at an average US rate of $40/MWh).
- For a 100k-GPU cluster like Colossus, which is estimated to draw ~150 MW, the annual electricity bill is projected at $120 million to $124 million (see the sketch below).
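Both estimates fall out of the same energy arithmetic. A minimal sketch; note that the Colossus figure implies an effective rate near $90/MWh, well above the $40/MWh wholesale average, presumably reflecting retail industrial tariffs and cooling overhead (an inference, not a source claim).

```python
# Annual electricity cost at full utilization: MW * 8,760 h * $/MWh.
# The $90/MWh rate below is back-solved from the report's ~$120M Colossus
# estimate; it is an inference, not a figure the report states.

HOURS_PER_YEAR = 8760

def annual_power_cost_usd(capacity_mw: float, usd_per_mwh: float) -> float:
    """Cost of running a data center at full capacity for one year."""
    return capacity_mw * HOURS_PER_YEAR * usd_per_mwh

print(f"1 GW   @ $40/MWh: ${annual_power_cost_usd(1000, 40)/1e6:.0f}M/yr")  # ~$350M
print(f"150 MW @ $90/MWh: ${annual_power_cost_usd(150, 90)/1e6:.0f}M/yr")   # ~$118M
```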
The second-largest OpEx is the amortized hardware refresh. With a 3–5 year lifecycle for cutting-edge GPUs, this “cost of staying relevant” adds hundreds of millions of dollars in effective annual spending.
This financial model confirms that AI is an energy-intensive industrial business, and its primary operational cost is the industrial-scale consumption of power.
Table 1: Financial Breakdown of a Benchmark 100 MW AI Data Center (2025)
| Cost Category | Component | Estimated Cost (USD) |
|---|---|---|
| Capital Expenditure (CapEx) | Facility “Shell” (100 MW) — Land, Building, Electrical, Cooling | $800M – $1.2B |
| | IT Hardware — ~100,000 GPUs, Servers, Networking | $2.0B – $4.0B+ |
| | Total Upfront CapEx | $2.8B – $5.2B+ |
| Operational Expenditure (OpEx) | Power (Annual) — ~876 GWh consumption | $35M – $100M+ |
| | Amortized Hardware Refresh (Annual) — 3–5 year IT lifecycle | $200M – $600M+ |
| | Maintenance, Staffing, etc. (Annual) | $17M – $38M+ |
| | Total Estimated Annual OpEx | $252M – $738M+ |
IV. Technical Architecture: The Blueprints of an AI Factory
The extreme financial and power metrics of AI factories are a direct result of their underlying technical architecture. The physics of cooling 100,000 processors and the logistics of networking them as one have forced the industry to abandon old designs and converge on a new, repeatable blueprint.
The Great Network Debate: Ethernet (RoCE / Spectrum-X) vs. InfiniBand
For years, a technical battle has raged over the ideal network fabric for large-scale AI.
- InfiniBand (IB): Historically dominated HPC and AI. As a proprietary, single-vendor (NVIDIA/Mellanox) solution, it offers superior, ultra-low latency (~1–2 μs) and native, hardware-offloaded RDMA. Its primary drawback is cost, with a 1.5× to 2.5× higher Total Cost of Ownership (TCO) than Ethernet.
- Ethernet (RoCE): The open-ecosystem alternative (Broadcom, NVIDIA, Arista) with a much lower TCO. It was historically seen as inferior, with higher latency (~5–10 μs) and a “lossy” reputation for RDMA over Converged Ethernet (RoCE) traffic.
The 2024–2025 period marks the definitive inflection point where AI-optimized Ethernet has closed the gap. By integrating adaptive routing, advanced congestion control (like DCQCN), and in-network collectives, modern Ethernet switches (like NVIDIA’s Spectrum-X and Broadcom’s Tomahawk) can now deliver 1.5–2.5 μs latency and over 95% throughput.
Meta first proved RoCE’s viability at a 24,000-GPU scale. However, xAI’s successful deployment of Spectrum-X for the 100,000-GPU Colossus serves as the definitive industry validation. Ethernet is now the dominant, cost-effective, and high-performance choice for hyperscale AI.
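To see why fabric efficiency translates directly into training time, consider an idealized ring all-reduce, the collective that synchronizes gradients each training step. This is a sketch only: the 50 GB payload is hypothetical, the 60% figure for an untuned fabric is an assumption for contrast, and the model ignores per-hop latency and compute/communication overlap.

```python
# Idealized ring all-reduce time for one gradient synchronization step.
# Sketch only: ignores per-hop latency, incast, and compute/comm overlap.

def ring_allreduce_seconds(payload_gb: float, nic_gbps: float,
                           n_gpus: int, efficiency: float) -> float:
    # In a ring, each GPU transmits ~2*(N-1)/N times the payload size.
    traffic_gbit = 2 * (n_gpus - 1) / n_gpus * payload_gb * 8
    return traffic_gbit / (nic_gbps * efficiency)

# Hypothetical 50 GB of gradients over one 400GbE NIC per GPU:
for eff in (0.95, 0.60):  # tuned AI fabric vs. assumed congested fabric
    t = ring_allreduce_seconds(50, 400, 100_000, eff)
    print(f"fabric efficiency {eff:.0%}: {t:.2f} s per sync")
# 95% -> ~2.1 s; 60% -> ~3.3 s. Repeated millions of times per training
# run, that gap compounds into weeks of wall-clock time.
```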
Table 2: Network Fabric Showdown — InfiniBand vs. AI-Optimized Ethernet (RoCE)
| Metric | InfiniBand | AI-Optimized Ethernet (Spectrum-X / RoCE) |
|---|---|---|
| Latency | ~1–2 μs (native) | ~1.5–2.5 μs (tuned) |
| Cost (TCO) | 1.5× – 2.5× (Higher) | 1.0× (Baseline) |
| Ecosystem | Proprietary (NVIDIA) | Open (Broadcom, NVIDIA, Arista, etc.) |
| Key Feature | Native RDMA | RoCE + Advanced Congestion Control (DCQCN) |
| Flagship Deployment | Older HPC/AI Clusters | xAI Colossus (100k+ GPUs) |
Cooling the Core: The Obsolescence of Air and Dominance of Direct-to-Chip Liquid Cooling
The second major technical challenge is thermal. Traditional air cooling has a practical ceiling of approximately 50 kW of heat rejection per rack. AI-optimized racks, packed with 64 or more GPUs, now regularly draw 80 kW to 120 kW, and NVIDIA’s next-generation GB200 NVL72 racks, which will power future AI factories, are rated at 130–140 kW per rack. At these densities, air cooling is physically incapable of removing the heat.
As a result, the industry has converged on Direct-to-Chip (DTC) liquid cooling as the new standard. This is not merely a necessity; it is a performance and efficiency optimization. A 2025 Supermicro study comparing identical AI systems found that the DTC liquid-cooled version delivered:
- 17% higher computational throughput due to superior thermal headroom
- 1.4% faster real-world AI training times
- 1 kW (16%) average power savings per 8-GPU node
Driven by these clear TCO and performance benefits, the market penetration of liquid cooling in AI data centers is surging from 14% in 2024 to a projected 33% in 2025.
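At facility scale, the PUE gap shown in Table 3 below compounds those node-level savings. A minimal sketch using midpoints of the report's PUE ranges:

```python
# Facility power overhead implied by the PUE figures in Table 3, for a
# 100 MW IT load (PUE = total facility power / IT power). The 1.50 and
# 1.15 values are midpoints of the report's ranges.

IT_LOAD_MW = 100

for label, pue in [("Air-cooled (PUE 1.50)", 1.50),
                   ("DTC liquid (PUE 1.15)", 1.15)]:
    total_mw = IT_LOAD_MW * pue
    print(f"{label}: {total_mw:.0f} MW total, "
          f"{total_mw - IT_LOAD_MW:.0f} MW of cooling/overhead")
# Gap: ~35 MW per 100 MW of IT load, enough to power tens of
# thousands of additional GPUs instead of chillers and fans.
```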
Table 3: Technical Comparison — Air vs. Direct-to-Chip (DTC) Liquid Cooling
| Metric | Traditional Air Cooling | DTC Liquid Cooling |
|---|---|---|
| Thermal Limit (Per Rack) | ~50 kW | >140 kW |
| Computational Throughput | Baseline | +17% |
| Real-World AI Training | Baseline | 1.4% Faster |
| Power Consumption (Per Node) | Baseline | −1 kW (16% Savings) |
| Power Usage Effectiveness (PUE) | 1.4 – 1.6 | 1.1 – 1.2 |
| AI Data Center Penetration (2025) | (Legacy) | 33% |
V. The AI Factory Arms Race: Strategic Analyses of Key Players
The AI infrastructure market is not monolithic. It has fractured into a “Compute Cold War” with two clear factions: those betting everything on NVIDIA, and the established hyperscalers executing a long-term “pincer move” by developing their own custom silicon.
Category 1: The “All-In on NVIDIA” Players
This faction prioritizes speed-to-market and maximum performance by leveraging NVIDIA’s end-to-end platform, creating a massive dependency in the process.
- xAI (Colossus): The purest example. With 200,000 H100/H200 GPUs, xAI has made a pure bet on NVIDIA’s ecosystem, from chips to the Spectrum-X network.
- Tesla (Cortex): Though a separate entity from xAI, Tesla built its “Cortex” cluster at Giga Texas with 50,000 NVIDIA H100 GPUs to train its Full Self-Driving (FSD) and Optimus robot models. The cluster is set to expand from 130 MW to 500 MW by 2026. (Tesla is a hybrid player, as it also develops its own Dojo D1 chip.)
- CoreWeave (Specialized Cloud): A “pure-play” GPU cloud provider that acts as a merchant arms dealer. It scaled to 250,000+ NVIDIA GPUs across 32 sites (360 MW total) by 2024, serving clients that include Microsoft.
Category 2: The “Custom Silicon” Hyperscalers (The Pincer Move)
This faction, composed of the incumbent cloud giants, is playing a longer, more resilient game. They continue to buy NVIDIA GPUs by the hundreds of thousands for flexibility, but are simultaneously executing a pincer movement by investing billions to build their own custom AI chips. This dual-track strategy provides supply chain control, workload-specific TCO advantages, and long-term leverage.
Meta:
- NVIDIA Strategy: A massive buyer, aiming for a fleet of 1.3 million GPUs by the end of 2025.
- CapEx: A staggering $60–65 billion planned for 2025, including bringing ~1 GW of new compute online during the year.
- Flagship Project: A 2 GW+, 4-million-square-foot “AI supercompute” campus in Louisiana.
- Custom Silicon: Actively deploying its own MTIA (Meta Training and Inference Accelerator) — a 90W chip optimized for Meta’s core recommendation workloads, offloading them from power-hungry GPUs.
Microsoft (Azure/OpenAI):
- NVIDIA Strategy: Enormous investment of $80 billion in AI data centers planned for 2025 alone.
- Flagship Project: “Project Stargate,” a reported $100 billion, 5 GW AI campus to be built by 2028, primarily for OpenAI.
- Custom Silicon: Developing its own “Maia” AI accelerator alongside “Azure Boost” DPUs to control the full stack.
Google (Cloud/DeepMind):
- NVIDIA Strategy: A major NVIDIA partner, but its custom silicon strategy is the most mature.
- Custom Silicon (TPU): In a massive 2025 deal, Google committed to providing its partner Anthropic with 1 million TPU chips by 2026. This single cluster is valued at “tens of billions of dollars” and will represent over a gigawatt of dedicated capacity.
Amazon Web Services (AWS):
- NVIDIA Strategy: A major partner, offering H100s on its cloud.
- Custom Silicon (Trainium): AWS’s “Project Rainier,” an $11 billion campus in Indiana, is not just a plan: it is already running 500,000 of AWS’s own Trainium2 chips to train Anthropic’s Claude models. AWS plans to scale this to over 1 million Trainium chips by the end of 2025, citing a 30–40% price-performance advantage over GPUs for this workload.
Table 4: Comparative Analysis of Major AI Data Centers (2024–2025)
| Player | Project | Compute Accelerators | Scale (Power) | Network Fabric | CapEx / Investment | Strategy |
|---|---|---|---|---|---|---|
| xAI | Colossus | 200k NVIDIA H100/H200 | ~150–300 MW | NVIDIA Spectrum-X Ethernet | $7B (Hardware) | NVIDIA All-In (Speed) |
| Tesla | Cortex | 50k H100 + 20k Dojo | 130 MW → 500 MW | NVIDIA (N/A) | $1B+ | Hybrid (NVIDIA + Custom) |
| Meta | LA Campus | 1.3M GPUs (Fleet) + MTIA | ~1 GW (New 2025); 2 GW+ Site | InfiniBand / Ethernet | $60–65B (Annual) | Custom Silicon Hedge |
| Microsoft | Stargate | (NVIDIA-based) + Maia | 5 GW (Planned) | NVIDIA (N/A) | $100B (Planned) / $80B (Annual) | Custom Silicon Hedge |
| Google | Anthropic Cluster | 1M TPUs (by 2026) | >1 GW | Custom ICI / Ethernet | “Tens of Billions” | Custom Silicon (TPU) |
| AWS | Project Rainier | 500k → 1M+ Trainium2 | >500 MW (Est.) | NeuronLink / EFA | $11B (Indiana Site) | Custom Silicon (Trainium) |
VI. The Unavoidable Externality: Environmental and Community Impact
The AI arms race is built on an unpriced environmental externality. The “speed-at-all-costs” strategy is in direct conflict with environmental law and community health.
The Gigawatt Problem: Grid Strain and the Search for Power
The AI boom is severely stressing the US power grid. Data center electricity consumption is projected to explode from 4.5% of total US demand in 2023 to 10% within the next four years. This sudden, massive demand is forcing utilities to delay the retirement of coal plants and build new natural gas plants, directly jeopardizing climate goals.
This power deficit is the primary challenge for projects like Microsoft’s 5 GW Stargate, which would require the power equivalent of an entire city. Securing that power source, not the $100 billion in capital, is the true bottleneck.
Case Study: Environmental Justice at xAI Memphis
The xAI Colossus project in Memphis is a stark case study of this conflict. The 122-day build-out was only possible because xAI chose to bypass the multi-year queue for a utility grid connection.
- The Core Conflict: To get power immediately, xAI installed 33 or more on-site methane gas turbines, each the size of a semi-trailer, to generate its own power.
- The Legality: The facility operated for months without the required air pollution permits. It held a permit for only 15 turbines but installed over 30.
- The Pollutants: These turbines run around the clock, emitting nitrogen oxides (NOx), a key smog precursor, and carcinogenic formaldehyde. This has likely made the xAI facility the largest industrial source of smog-forming pollutants in Memphis, with the potential to increase city-wide smog by 30–60%.
- The “Sacrifice Zone”: This facility was built in South Memphis, a predominantly Black community that already suffers from the highest asthma-related hospitalization rates in Tennessee and cancer rates four times the national average. The community is an existing “sacrifice zone” that already contains 17 other toxic release facilities.
- The Response: The Southern Environmental Law Center (SELC) and the NAACP, citing violations of the Clean Air Act, have signaled their intent to sue xAI for its unpermitted operations.
This case demonstrates that the power bottleneck is not just a technical challenge but a political and social one, which has, in this instance, been “solved” by offloading the environmental cost onto a vulnerable community.
Water Consumption and Emerging Sustainable Solutions
While xAI’s air pollution is the acute crisis, water consumption for cooling is the chronic one. xAI’s planned water usage of over 5 million gallons per day prompted the company to plan an $80 million on-site wastewater recycling plant.
The industry is slowly responding to these pressures. Google is investing in geothermal power, and Microsoft is exploring restarting nuclear power plants. AWS’s Project Rainier campus in Indiana boasts a 40% improvement in water usage efficiency. Concurrently, the technical shift to DTC liquid cooling, as pushed by NVIDIA’s Blackwell platform, promises up to 300× water efficiency gains by eliminating the need for evaporative cooling towers.
VII. Strategic Insights and Future Trajectory (2026–2030)
The 2024–2025 period has defined the new era of AI infrastructure. The strategic landscape for the remainder of the decade will be shaped by three key bottlenecks, a core strategic trilemma, and the endurance of the architectural shifts identified.
Key Strategic Bottlenecks
- Power: This is the undisputed #1 bottleneck. The future of AI is no longer limited by who has the most GPUs, but by who can power them, legally and sustainably. The xAI Memphis case proves that regulatory and community battles over power will become a primary obstacle.
- HBM (High-Bandwidth Memory): The supply of HBM (from SK hynix, Samsung) is a critical component bottleneck that constrains the production of the advanced GPUs (like H200 and Blackwell) that AI factories demand.
- Permitting: The 122-day “Colossus” deployment model is likely dead. The legal and environmental fallout in Memphis and the regulatory hurdles facing multi-gigawatt projects like Stargate signal that permitting for power, air, and water will become a multi-year hurdle, re-introducing friction into the AI arms race.
The Trilemma: Balancing Speed-to-Market vs. Cost vs. Sustainability
The analysis reveals a core strategic trilemma. No market participant has successfully optimized all three:
- xAI prioritized Speed (122 days) above all, sacrificing Sustainability (unpermitted gas turbines) and accepting high Cost (total NVIDIA dependency).
- Meta, Google, and AWS prioritize long-term Cost (TCO) via their custom silicon hedges, but this requires a slower, multi-year strategy. They are being forced to address Sustainability to protect their public brands and mitigate regulatory risk.
- Sustainability remains the unsolved variable, largely unpriced and treated as an externality.
Concluding Analysis
The AI-optimized data center has been established as a new industrial category, distinct from traditional IT. Its emergence is governed by a new set of rules:
- The Technical Blueprint is Solved: The industry has converged on a standard, repeatable AI factory design: racks of DTC liquid-cooled accelerators (60 kW+) networked by AI-optimized, high-radix Ethernet fabrics.
- The “Compute Cold War” is Entrenched: The market is now defined by the “NVIDIA vs. Custom Silicon” strategic divide. This “pincer move” by the hyperscalers will be the dominant market dynamic for the next five years, as it offers the only viable hedge against a single-supplier dependency.
- The Existential Challenge is Energy: The AI industry’s greatest existential challenge is its own energy and environmental footprint. The gigawatt-scale ambition of the industry is fundamentally in conflict with a carbon-constrained world and aging power grids. The solution will require AI companies to evolve from being just energy consumers to becoming energy producers and innovators, abandoning the “move fast and break things” model that has, in the case of Colossus, already proven to be toxic.
This is the first report in an ongoing quarterly series tracking the AI infrastructure buildout. Read the February 2026 update: The 2026 AI Infrastructure Supercycle: What You Need to Know Now. To receive future updates, connect with me on LinkedIn.
Works Cited
- Liquid Cooling to Scale in AI Data Centers, Penetration to Surpass 30% in 2025 — TechPowerUp, accessed November 8, 2025
- The cost of compute power: A $7 trillion race — McKinsey, accessed November 8, 2025
- Colossus | xAI — accessed November 8, 2025
- A billionaire, an AI supercomputer, toxic emissions and a Memphis community — Tennessee Lookout, accessed November 8, 2025
- Groups appeal permit for xAI’s South Memphis data center — SELC, accessed November 8, 2025
- Comparison of Air-Cooled versus Liquid-Cooled GPU Systems — Supermicro, accessed November 8, 2025
- InfiniBand vs Ethernet for AI Clusters in 2025 — Vitex Tech, accessed November 8, 2025
- NVIDIA Ethernet Networking Accelerates World’s Largest AI Cluster — NVIDIA, accessed November 8, 2025
- Our next generation Meta Training and Inference Accelerator — Meta AI, accessed November 8, 2025
- AWS: How 500,000 Trainium2 Chips Power Project Rainier — Data Centre Magazine, accessed November 8, 2025
- Expanding our use of Google Cloud TPUs and Services — Anthropic, accessed November 8, 2025
- Can We Build a Five Gigawatt Data Center? — Asterisk, accessed November 8, 2025
- The AI Boom Is Stressing the Grid — NRDC, accessed November 8, 2025
- The Colossus Supercomputer: Elon Musk’s Drive Toward Data Center AI Domination — Data Center Frontier, accessed November 8, 2025
- Accelerate Everything — Inside the 100K GPU xAI Cluster — Supermicro, accessed November 8, 2025
- Colossus (supercomputer) — Wikipedia, accessed November 8, 2025
- Elon Musk’s xAI raises $6 billion to build more powerful AI supercomputers — Tom’s Hardware, accessed November 8, 2025
- Deconstructing the Data Center: a Look at the Cost Structure — Alpha Matica, accessed November 8, 2025
- Elon Musk Prepares to Double xAI Supercomputer to 200,000 Nvidia GPUs — PCMag, accessed November 8, 2025
- InfiniBand vs. Ethernet: Choosing the Right Network Fabric for AI Clusters — Arc Compute, accessed November 8, 2025
- Elon Musk shows off Cortex AI supercluster — Tom’s Hardware, accessed November 8, 2025
- Tesla Reveals CORTEX Supercomputer Details — YouTube, accessed November 8, 2025
- Meta plans $60-65bn capex on AI data center boom — Data Center Dynamics, accessed November 8, 2025
- Musk’s xAI explores another massive methane gas turbine installation — SELC, accessed November 8, 2025
- Inside the Memphis Chamber of Commerce’s Push for Elon Musk’s xAI Data Center — ProPublica, accessed November 8, 2025
- Inside Memphis’ Battle Against Elon Musk’s xAI Data Center — Time Magazine, accessed November 8, 2025
- Elon Musk’s xAI threatened with lawsuit over air pollution from Memphis data center — NAACP, accessed November 8, 2025
About the Author
Carlos Granier is a Tech Founder, CTO, and AI Strategist with 25 years of experience building at the intersection of technology and business. He co-founded Pongalo, one of the first US Hispanic OTT platforms, and built a YouTube MCN to 200M+ monthly views. He now helps founders and executives implement AI as practical infrastructure. Based in Miami, Florida.
Let's Connect
If you want to hire me, get in touch, or just say hi, reach out on social media or send me an email.