Sunday, June 16, 2024

GB200 NVL72: Tool for Next-Gen AI and Scientific Discovery


The GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs in a liquid-cooled, rack-scale design. Its 72-GPU NVLink domain acts as a single massive GPU, which NVIDIA says delivers 30X faster real-time inference for trillion-parameter LLMs.

A crucial part of the NVIDIA GB200 NVL72 is the GB200 Grace Blackwell Superchip, which uses the NVIDIA NVLink-C2C interconnect to link two powerful NVIDIA Blackwell Tensor Core GPUs and an NVIDIA Grace CPU.

Real-Time LLM Inference

The GB200 NVL72 introduces a second-generation Transformer Engine that enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30X faster real-time LLM inference for trillion-parameter language models. This is made possible by a new generation of Tensor Cores that introduce new microscaling formats, offering high accuracy and greater throughput. In addition, the GB200 NVL72 overcomes communication bottlenecks by using NVLink and liquid cooling to create a single, massive 72-GPU rack.
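To put the FP4 claim in perspective, here is a rough sketch of how much memory the weights of a trillion-parameter model would occupy at different precisions. It is an illustration only: it assumes weights are the sole consumer of memory (activations, KV cache, and scaling metadata are ignored), and the function name `weight_footprint_tb` is hypothetical. The 13.5 TB figure is the "up to" HBM3e capacity quoted for the rack.

```python
# Rough weight-only memory footprint of a model at different precisions.
# Assumption: activations, KV cache, and scaling metadata are ignored.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}
RACK_HBM_TB = 13.5  # "up to" HBM3e capacity quoted for the GB200 NVL72


def weight_footprint_tb(num_params: float, bits: int) -> float:
    """Return the size of the weights in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bits / 8 / 1e12


if __name__ == "__main__":
    params = 1e12  # a trillion-parameter model
    for name, bits in BITS_PER_PARAM.items():
        size = weight_footprint_tb(params, bits)
        share = size / RACK_HBM_TB
        print(f"{name}: {size:.2f} TB of weights, {share:.1%} of rack HBM3e")
```

Under these assumptions, halving the bits per parameter halves the weight footprint, so FP4 weights for a trillion-parameter model come to about 0.5 TB.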

Large-Scale Training

The GB200 NVL72 includes a faster second-generation Transformer Engine with FP8 precision, enabling 4X faster training of large language models at scale. This is complemented by fifth-generation NVLink, which provides 1.8 terabytes per second (TB/s) of GPU-to-GPU interconnect, along with InfiniBand networking and NVIDIA Magnum IO™ software.

Energy-Efficient Infrastructure

Liquid-cooled GB200 NVL72 racks reduce a data centre's energy use and carbon footprint. Liquid cooling increases compute density, reduces the floor space required, and enables high-bandwidth, low-latency GPU communication within large NVLink domain architectures. Compared to NVIDIA H100 air-cooled infrastructure, the GB200 delivers 25X more performance at the same power while using less water.

Data Processing

Databases play a critical role in how businesses handle, process, and analyse large volumes of data. GB200 uses the high-bandwidth memory performance, NVLink-C2C, and dedicated decompression engines of the NVIDIA Blackwell architecture to speed up key database queries by 18X compared to CPU and deliver 5X better total cost of ownership.

Large models bring many benefits, but training and deploying them can be computationally expensive and resource intensive. Widespread deployment will depend on compute-, cost-, and energy-efficient systems designed for real-time inference. The new NVIDIA GB200 NVL72 is one such system.

Take Mixture of Experts (MoE) models as an example. These models spread the computational load across multiple experts and use pipeline and model parallelism to train across thousands of GPUs, which improves system efficiency.
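The idea of spreading work across experts can be sketched in a few lines. The example below is a minimal illustration of top-k gating for a single scalar input, not NVIDIA's implementation or any specific model's router: the gating scheme (softmax followed by renormalised top-k) and the names `route_top_k` and `moe_forward` are assumptions made for the example, and the toy experts are plain Python functions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


if __name__ == "__main__":
    # Four toy experts; in a real model each would be a feed-forward network.
    experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
    gate_logits = [0.0, 2.0, 2.0, -1.0]  # hypothetical router scores
    print(route_top_k(gate_logits, k=2))
    print(moe_forward(3.0, gate_logits, experts, k=2))
```

Only the selected experts run for a given input, which is why the experts can be placed on different GPUs and why fast GPU-to-GPU communication matters for this class of model.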

A new level of parallel compute, high-speed memory, and high-performance interconnect can make this technical challenge manageable for GPU clusters. The NVIDIA GB200 NVL72 rack-scale architecture is designed to deliver this, as described in more detail below.

NVIDIA GB200 NVL36 and NVL72

The GB200 supports NVLink domains of 36 and 72 GPUs. Each rack houses 18 compute nodes based on the MGX reference design and the NVLink Switch System. The GB200 NVL36 configuration has 36 GPUs in one rack with 18 single GB200 compute nodes. The GB200 NVL72 is configured either with 72 GPUs in one rack with 18 dual GB200 compute nodes, or with 72 GPUs across two racks, each with 18 single GB200 compute nodes.
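The GPU counts in these configurations follow from simple multiplication, which the sketch below spells out. It assumes that a "single" compute node carries one GB200 Superchip and a "dual" node carries two, with each Superchip holding one Grace CPU and two Blackwell GPUs as described earlier. The `RackConfig` class is a hypothetical helper written for this example.

```python
from dataclasses import dataclass

GPUS_PER_SUPERCHIP = 2  # one Grace CPU : two Blackwell GPUs
CPUS_PER_SUPERCHIP = 1


@dataclass(frozen=True)
class RackConfig:
    name: str
    racks: int
    nodes_per_rack: int
    superchips_per_node: int  # assumption: 1 = "single" node, 2 = "dual" node

    @property
    def superchips(self) -> int:
        return self.racks * self.nodes_per_rack * self.superchips_per_node

    @property
    def gpus(self) -> int:
        return self.superchips * GPUS_PER_SUPERCHIP

    @property
    def cpus(self) -> int:
        return self.superchips * CPUS_PER_SUPERCHIP


CONFIGS = [
    RackConfig("GB200 NVL36 (one rack, single nodes)", 1, 18, 1),
    RackConfig("GB200 NVL72 (one rack, dual nodes)", 1, 18, 2),
    RackConfig("GB200 NVL72 (two racks, single nodes)", 2, 18, 1),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.cpus} Grace CPUs, {cfg.gpus} Blackwell GPUs")
```

Both NVL72 layouts end up with 36 Superchips, so they expose the same 72-GPU NVLink domain and differ only in how the nodes are spread across racks.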

The GB200 NVL72 uses a copper cable cartridge to densely pack and interconnect the GPUs for ease of operation. It also uses a liquid-cooled design, which NVIDIA credits with 25X lower cost and energy consumption.

NVIDIA GB200 NVL72 Features

Blackwell Architecture

The NVIDIA Blackwell architecture introduces major advances in accelerated computing, with a focus on performance, efficiency, and scale.

NVIDIA Grace Processor

The NVIDIA Grace CPU is a processor designed for modern data centres running AI, cloud, and HPC applications. It delivers high performance and memory bandwidth with 2X the energy efficiency of today's leading server processors.

Fifth-Generation NVIDIA NVLink

Unlocking the full potential of exascale computing and trillion-parameter AI models requires fast, seamless communication between every GPU in a server cluster. The fifth generation of NVLink is a scale-up interconnect that delivers faster performance for trillion- and multi-trillion-parameter AI models.

NVIDIA Networking

The data centre's network is essential to AI development and performance, serving as the backbone for distributed AI model training and generative AI workloads. NVIDIA Quantum-X800 InfiniBand, NVIDIA Spectrum-X800 Ethernet, and NVIDIA BlueField-3 DPUs enable efficient scaling across hundreds or thousands of Blackwell GPUs for optimal application performance.

NVIDIA GB200 NVL72 Price

As a high-end system aimed at research institutions and large enterprises, the NVIDIA DGX GB200 NVL72 is priced accordingly. One estimate puts a fully loaded system with 72 Blackwell GPUs (36 GB200 Superchips) at approximately $3 million USD.

Here’s the reason it costs so much:

Abundant processing power: With 72 GPUs, it delivers roughly 1.44 exaFLOPS of FP4 performance.

Advanced hardware: It includes a liquid cooling system, a dedicated NVLink Switch System for high-speed GPU interconnect, and up to 13.5 TB of HBM3e memory.

Because the DGX GB200 NVL72 targets a specialised market, a publicly listed price is hard to find. The $3 million figure should be treated as a rough estimate.

NVIDIA GB200 NVL72 Specs

                          GB200 NVL72                              GB200 Grace Blackwell Superchip
Configuration             36 Grace CPUs : 72 Blackwell GPUs        1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core           1,440 PFLOPS                             40 PFLOPS
FP8/FP6 Tensor Core       720 PFLOPS                               20 PFLOPS
INT8 Tensor Core          720 POPS                                 20 POPS
FP16/BF16 Tensor Core     360 PFLOPS                               10 PFLOPS
TF32 Tensor Core          180 PFLOPS                               5 PFLOPS
FP64 Tensor Core          3,240 TFLOPS                             90 TFLOPS
GPU Memory | Bandwidth    Up to 13.5 TB HBM3e | 576 TB/s           Up to 384 GB HBM3e | 16 TB/s
NVLink Bandwidth          130 TB/s                                 3.6 TB/s
CPU Core Count            2,592 Arm Neoverse V2 cores              72 Arm Neoverse V2 cores
CPU Memory | Bandwidth    Up to 17 TB LPDDR5X | Up to 18.4 TB/s    Up to 480 GB LPDDR5X | Up to 512 GB/s
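
The rack-level figures in the specification table are, for the most part, the per-Superchip figures multiplied by the 36 Superchips in a rack. The short check below makes that arithmetic explicit. It is a sketch: the values are copied from the table, the 1% tolerance for rounded entries is an assumption, and the memory capacity rows are left out because they are quoted as "up to" figures that do not scale exactly.

```python
import math

SUPERCHIPS_PER_RACK = 36  # 36 Grace CPUs : 72 Blackwell GPUs

# (per-Superchip value, rack value) copied from the specification table.
SPEC_ROWS = {
    "FP4 Tensor Core (PFLOPS)": (40, 1440),
    "FP8/FP6 Tensor Core (PFLOPS)": (20, 720),
    "INT8 Tensor Core (POPS)": (20, 720),
    "FP16/BF16 Tensor Core (PFLOPS)": (10, 360),
    "TF32 Tensor Core (PFLOPS)": (5, 180),
    "FP64 Tensor Core (TFLOPS)": (90, 3240),
    "GPU memory bandwidth (TB/s)": (16, 576),
    "NVLink bandwidth (TB/s)": (3.6, 130),
    "CPU cores": (72, 2592),
    "CPU memory bandwidth (TB/s)": (0.512, 18.4),
}


def check_rows(rows, scale=SUPERCHIPS_PER_RACK, rel_tol=0.01):
    """Map each row name to whether rack == scale * per-Superchip within rel_tol."""
    return {
        name: math.isclose(per_chip * scale, rack, rel_tol=rel_tol)
        for name, (per_chip, rack) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(SPEC_ROWS).items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

The 1,440 PFLOPS FP4 entry is the same quantity as the roughly 1.44 exaFLOPS cited in the pricing section.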

Drakshi has been writing articles on artificial intelligence for govindhtech since June 2023. She holds a postgraduate degree in business administration and is an artificial intelligence enthusiast.

