The NVIDIA GB300 NVL72 is designed for the era of artificial intelligence.
Overview
Developed for AI Reasoning Capabilities
With its fully liquid-cooled, rack-scale design, the NVIDIA GB300 NVL72 combines 36 Arm-based NVIDIA Grace CPUs and 72 NVIDIA Blackwell Ultra GPUs into a single platform optimized for test-time-scaling inference. Compared with the NVIDIA Hopper platform, AI factories built on the GB300 NVL72 with NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet and ConnectX-8 SuperNICs deliver 50x higher output for reasoning-model inference.
Performance
AI Factories Reaching Unprecedented Performance Levels
Take advantage of the NVIDIA GB300 NVL72 platform’s cutting-edge AI reasoning capabilities. Compared with Hopper, the GB300 NVL72 delivers a 10x increase in per-user responsiveness (tokens per second (TPS) per user) and a 5x increase in throughput per megawatt (TPS per MW). Together, these gains yield a 50x increase in total AI factory output.
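As a quick arithmetic sanity check, the two published gains multiply to the headline figure — a sketch of how the article's 50x claim composes, using NVIDIA's stated multipliers:

```python
# NVIDIA's published GB300 NVL72 gains relative to Hopper.
throughput_per_mw_gain = 5   # 5x TPS per megawatt
per_user_tps_gain = 10       # 10x TPS per user

# The 50x total AI factory output is the product of the two gains.
factory_output_gain = throughput_per_mw_gain * per_user_tps_gain
print(factory_output_gain)  # prints 50
```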
Features
AI Reasoning Inference
Test-time scaling and AI reasoning increase the compute required to reach maximum throughput while maintaining quality of service. Compared with NVIDIA Blackwell GPUs, NVIDIA Blackwell Ultra’s Tensor Cores deliver 1.5x more AI-compute floating-point operations per second (FLOPS) and 2x faster attention-layer acceleration.
288 GB of HBM3e
Larger memory capacity enables bigger batch sizes and maximum throughput performance. Combined with the additional AI compute, NVIDIA Blackwell Ultra GPUs provide 1.5x larger HBM3e memory capacity, boosting AI reasoning throughput at the longest context lengths.
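The 288 GB per-GPU figure scales to the rack-level total given in the spec table — a minimal sketch, assuming decimal terabytes (1 TB = 1,000 GB):

```python
# Rack-level HBM3e capacity from the per-GPU figure in the text.
hbm3e_per_gpu_gb = 288   # GB of HBM3e per Blackwell Ultra GPU
num_gpus = 72            # GPUs per GB300 NVL72 rack

total_hbm_tb = hbm3e_per_gpu_gb * num_gpus / 1000  # decimal TB
print(total_hbm_tb)  # prints 20.736, i.e. roughly the "Up to 21 TB" spec figure
```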
NVIDIA Blackwell Architecture
The NVIDIA Blackwell architecture powers a new age of unmatched speed, efficiency, and scale by delivering ground-breaking advances in accelerated computing.
NVIDIA ConnectX-8 SuperNIC
The NVIDIA ConnectX-8 SuperNIC’s input/output (IO) module houses two ConnectX-8 devices, giving each GPU in the NVIDIA GB300 NVL72 800 gigabits per second (Gb/s) of network bandwidth. This delivers best-in-class remote direct memory access (RDMA) with either the NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet networking platform, enabling peak AI-workload efficiency.
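Summed naively across the rack, the per-GPU figure implies the following aggregate — an illustrative derivation from the article's numbers, not a published spec:

```python
per_gpu_gbps = 800   # network bandwidth per GPU via ConnectX-8
num_gpus = 72        # GPUs per GB300 NVL72 rack

# Assumption: aggregate is a simple per-GPU sum, ignoring topology.
aggregate_tbps = per_gpu_gbps * num_gpus / 1000
print(aggregate_tbps)  # prints 57.6 (Tb/s of total rack network bandwidth)
```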
NVIDIA Grace CPU
The NVIDIA Grace CPU is a breakthrough processor designed for modern data center workloads. It delivers outstanding performance and memory bandwidth with twice the energy efficiency of today’s leading server processors.
Fifth-Generation NVIDIA NVLink
Accelerated computing reaches its full potential only when every GPU can communicate seamlessly with every other. Thanks to the fifth-generation NVIDIA NVLink scale-up interconnect, AI reasoning models achieve faster performance.
NVIDIA GB300 Grace Blackwell Ultra Superchip
The NVIDIA GB300 Grace Blackwell Ultra Superchip, the building block of the GB300 NVL72 rack-scale system, combines four NVIDIA Blackwell Ultra GPUs, two Grace CPUs, and four ConnectX-8 SuperNICs. Eighteen of these superchips, connected with NVIDIA NVLink Switch technology and NVIDIA BlueField-3 DPUs, form one massive GPU built for the era of AI reasoning.
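The superchip counts reconcile with the rack totals — a minimal check of the article's own numbers:

```python
superchips = 18          # superchips per GB300 NVL72 rack
gpus_per_superchip = 4   # Blackwell Ultra GPUs per superchip
cpus_per_superchip = 2   # Grace CPUs per superchip

total_gpus = superchips * gpus_per_superchip
total_cpus = superchips * cpus_per_superchip
print(total_gpus, total_cpus)  # prints 72 36
```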
Release date
NVIDIA Blackwell Ultra products, including the GB300 NVL72, are expected to be available from partners in the second half of 2025.
Pricing
The price range for each NVIDIA GB300 NVL72 AI server is between $3.7 million and $4 million.
Conclusion
The GB300 NVL72 is built to serve AI workloads at major research institutions and hyperscalers. It combines 36 Grace CPUs with 72 Blackwell Ultra GPUs in a rack-scale configuration and can process 1,000 tokens per second on DeepSeek’s R1 model. Designed for AI reasoning, it substantially raises the performance ceiling for AI models.
NVIDIA GB300 NVL72 Specs
| Specification | Details |
|---|---|
| GPU Configuration | 72 NVIDIA Blackwell Ultra GPUs |
| CPU Configuration | 36 NVIDIA Grace CPUs |
| NVLink Bandwidth | 130 TB/s |
| Fast Memory | Up to 40 TB |
| GPU Memory | Up to 21 TB |
| GPU Memory Bandwidth | Up to 576 TB/s |
| CPU Memory | Up to 18 TB SOCAMM with LPDDR5X |
| CPU Memory Bandwidth | Up to 14.3 TB/s |
| CPU Core Count | 2,592 Arm Neoverse V2 cores |
| FP4 Tensor Core | 1,400 PFLOPS |
| FP8/FP6 Tensor Core | 720 PFLOPS |
| INT8 Tensor Core | 23 POPS |
| FP16/BF16 Tensor Core | 360 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS |
| FP32 | 6 PFLOPS |
| FP64 / FP64 Tensor Core | 100 TFLOPS |
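The spec figures are internally consistent; for example, the rack's core count divides evenly across its CPUs — a sketch using the table's numbers:

```python
total_cores = 2592   # Arm Neoverse V2 cores per rack (spec table)
num_cpus = 36        # Grace CPUs per rack

cores_per_cpu = total_cores // num_cpus
print(cores_per_cpu)  # prints 72, the per-CPU core count of NVIDIA Grace
```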