Thursday, July 4, 2024

The Future of AI Technology with NVIDIA H100 GPUs

We’re all familiar with NVIDIA and the AI “gold rush” that has taken the world by storm. At the center of it all are Team Green’s H100 AI GPUs, the most sought-after pieces of AI hardware at the moment, with everyone scrambling to get their hands on them to fuel their AI ambitions.

NVIDIA’s H100 GPUs are the best chips for AI right now, and everyone wants more of them.

This post isn’t breaking news so much as a snapshot of the current state of the AI market and of how businesses are building their “future” around the H100 GPU.

Before we get into the meat of the article, a little summary is in order. At the beginning of 2022, everything was proceeding as usual. Then, in November, a breakthrough application known as “ChatGPT” surfaced, laying the groundwork for the AI frenzy. While ChatGPT cannot be considered the originator of the AI boom, it can certainly be considered a catalyst. As a result, rivals such as Microsoft and Google were pushed into an AI race to produce generative AI products of their own.

Where does NVIDIA fit into this, you might ask? The backbone of generative AI is large-scale LLM (Large Language Model) training, and this is where NVIDIA’s AI GPUs come in. We won’t delve too deeply into the technical details here, since a wall of specifications makes for dry reading. If you’re looking for specifics, we’ve included a table below that lists every AI GPU release from NVIDIA, dating back to the Tesla models.
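Before the table, here is a minimal sketch of what “LLM training on an NVIDIA GPU” actually looks like in code: a single mixed-precision (FP16) training step in PyTorch. The tiny model and random data are placeholders of our own, not a real LLM; real training runs shard this same loop across thousands of GPUs.

```python
# Minimal sketch: one mixed-precision (FP16) training step on an NVIDIA GPU.
# The model and data are toy placeholders, not a real LLM.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"

# A stand-in "model": real LLMs stack many layers like these.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # guards FP16 against underflow

x = torch.randn(8, 1024, device=device)       # toy input batch
target = torch.randn(8, 1024, device=device)  # toy regression target

with torch.autocast(device_type=device.type, dtype=torch.float16, enabled=use_amp):
    loss = nn.functional.mse_loss(model(x), target)  # FP16 math runs on Tensor Cores

scaler.scale(loss).backward()  # scale the loss before backprop
scaler.step(optimizer)         # unscales gradients, then updates weights
scaler.update()
optimizer.zero_grad()
print(f"loss: {loss.item():.4f}")
```

Swap the toy model for a transformer with billions of parameters and repeat this loop for weeks of wall-clock time, and the appetite for H100s starts to make sense.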

NVIDIA HPC / AI GPUs

| NVIDIA Tesla Graphics Card | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCIe) | Tesla M40 (PCIe) | Tesla K40 (PCIe) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU | GH100 (Hopper) | GH100 (Hopper) | GA100 (Ampere) | GA100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | 814 mm² | 814 mm² | 826 mm² | 826 mm² | 815 mm² | 815 mm² | 610 mm² | 610 mm² | 601 mm² | 551 mm² |
| SMs | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| FP32 CUDA Cores Per SM | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores Per SM | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | TBD | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 3958 TOPs | 3200 TOPs | 1248 TOPs (2496 TOPs with sparsity) | 1248 TOPs (2496 TOPs with sparsity) | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 1979 TFLOPs | 1600 TFLOPs | 312 TFLOPs (624 TFLOPs with sparsity) | 312 TFLOPs (624 TFLOPs with sparsity) | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 67 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 34 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up to 80 GB HBM3 @ 3.0 Gbps | Up to 80 GB HBM2e @ 2.0 Gbps | Up to 40 GB HBM2 @ 1.6 TB/s; up to 80 GB HBM2 @ 1.6 TB/s | Up to 40 GB HBM2 @ 1.6 TB/s; up to 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s; 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| L2 Cache Size | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| TDP | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |
Table credit: wccftech

That leaves the question of why the H100 specifically. The NVIDIA H100 is the company’s highest-end product, with massive computing capabilities. One could argue that the jump in performance comes at a higher price, but firms tend to order in large quantities, and “performance per watt” is the real goal here. The Hopper “H100” outperforms the A100 by roughly 3.5 times in 16-bit inference and training performance, making it the obvious choice.
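To make the “performance per watt” argument concrete, here is a quick back-of-envelope calculation using the peak dense FP16 figures and TDPs from the table above. Peak TFLOPs overstate real-world throughput, so treat the ratio as illustrative rather than a benchmark:

```python
# Back-of-envelope FP16 performance per watt, using peak dense figures
# and TDPs from the spec table above (not measured throughput).
gpus = {
    "H100 SXM5": {"fp16_tflops": 1979, "tdp_watts": 700},
    "A100 SXM4": {"fp16_tflops": 312, "tdp_watts": 400},
}

for name, spec in gpus.items():
    perf_per_watt = spec["fp16_tflops"] / spec["tdp_watts"]
    print(f"{name}: {perf_per_watt:.2f} TFLOPs per watt")

# Prints roughly 2.83 for the H100 vs 0.78 for the A100: about a 3.6x
# gain on paper, in line with the ~3.5x figure quoted above.
```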

Announcing the NVIDIA H100 GPU. Image credit: wccftech

With that, we hope the superiority of the H100 GPU is clear. Moving on to our next question: why is there a scarcity? The answer involves multiple factors, the first of which is the sheer number of GPUs required to train a single model. Astonishingly, OpenAI’s GPT-4 model reportedly required 10,000 to 25,000 A100 GPUs to train (H100s were not available at the time).
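As a rough illustration of why a single model soaks up GPUs by the thousand, the sketch below estimates the A100 count for a GPT-4-scale run. The total-compute figure, utilization rate, and 90-day schedule are ballpark assumptions of ours, not disclosed numbers:

```python
# Rough estimate: GPUs needed =
#   total training FLOPs / (per-GPU FLOP/s * utilization * training seconds)
# All inputs below are assumptions for illustration, not official figures.
total_train_flops = 2.1e25       # assumed GPT-4-scale compute budget
a100_fp16_flops_s = 312e12       # A100 peak dense FP16 (from the table)
utilization = 0.30               # assumed real-world utilization (MFU)
train_seconds = 90 * 24 * 3600   # assume a ~90-day training run

gpus_needed = total_train_flops / (a100_fp16_flops_s * utilization * train_seconds)
print(f"~{gpus_needed:,.0f} A100s")  # ~28,853: the same ballpark as above
```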

Advanced artificial intelligence businesses such as Inflection AI and CoreWeave have acquired massive quantities of H100s, totaling billions of dollars. This demonstrates that even to train a basic-to-decent AI model, a single organization requires huge volumes, resulting in enormous demand.

Image credit: wccftech

One might object that “NVIDIA could simply increase production to meet demand.” That is much easier said than done. Unlike gaming GPUs, NVIDIA’s AI GPUs require sophisticated manufacturing processes, with the bulk of production delegated to Taiwanese semiconductor titan TSMC. TSMC is the sole manufacturer of NVIDIA’s AI GPUs, overseeing everything from wafer procurement through advanced packaging.

The H100 GPUs are built on TSMC’s 4N process, a customized variant of its 5nm family. NVIDIA is now the most important customer for this node; Apple previously used it for the A15 Bionic chipset but has since moved on to the A16 Bionic. Of all the critical phases, producing the HBM memory is the most difficult, since it requires advanced equipment that only a few manufacturers operate.

The NVIDIA H100. Image credit: wccftech

HBM vendors include SK Hynix, Micron, and Samsung, although it is unclear to us which of them have actually been qualified as suppliers. Aside from HBM, TSMC is also struggling to maintain capacity for CoWoS (Chip-on-Wafer-on-Substrate), a 2.5D packaging technique that is a critical stage in producing H100s. Because TSMC cannot meet NVIDIA’s demand, order backlogs have reached record heights, with deliveries delayed until December.

We have left out numerous details, since going deeper would detract from our core goal of informing the ordinary reader about the situation. We do not believe the scarcity will be alleviated any time soon; if anything, it is expected to worsen. However, with AMD’s push to consolidate its position in the AI sector, we may yet witness a shift in the landscape.

According to DigiTimes, “TSMC appears to be particularly optimistic about demand for AMD’s upcoming Instinct MI300, claiming that it will account for half of Nvidia’s total output of CoWoS-packaged chips.” It is possible, then, that the burden will be distributed across organizations. Still, given Team Green’s greedy policies in the past, something like this would necessitate a significant offering from AMD.

To summarize: NVIDIA’s H100 GPUs are driving the AI craze to new heights, which is why they are surrounded by such a frenzy. We hope this has given readers a general overview of the situation. GPU Utils deserves credit for inspiring this article; be sure to read their report as well.

agarapuramesh | https://govindhtech.com
Agarapu Ramesh is the founder of Govindhtech and a computer hardware enthusiast who enjoys writing tech news articles. He has been an editor at Govindhtech for one year and previously worked as a computer assembling technician at G Traders in India from 2018. He holds an MSc.