The NVIDIA® H100 Tensor Core GPU offers exceptional performance, scalability, and security for a wide range of workloads. It is equipped with advanced features such as a dedicated Transformer Engine and the NVLink Switch System, enabling it to accelerate exascale workloads and trillion-parameter language models. With the H100, large language model (LLM) inference runs up to 30X faster than on the previous generation, making it a leading solution for conversational AI.
The H100 NVL variant, designed for PCIe-based servers, combines the Transformer Engine, NVLink, and 188 GB of HBM3 memory to deliver optimal performance and scalability across data centers, making LLMs with up to 175 billion parameters practical in mainstream environments. Compared to NVIDIA DGX™ A100 systems, servers equipped with H100 NVL GPUs achieve up to 12X higher performance on the GPT-175B model while maintaining low latency, even in power-constrained data center setups.
For enterprise AI applications, the H100 for mainstream servers comes with a five-year subscription, including enterprise support, for the NVIDIA AI Enterprise software suite. This suite provides high-performance AI frameworks and tools, simplifying AI adoption and enabling the development of H100-accelerated AI workflows such as chatbots, recommendation engines, and vision AI.
Supercharging Large Language Model Inference:
The H100 offers up to 30X higher AI inference performance on large models, such as the Megatron chatbot with 530 billion parameters.
Transformational AI Training:
With fourth-generation Tensor Cores and a Transformer Engine featuring FP8 precision, the H100 delivers up to 4X faster training than the previous generation on GPT-3 (175B) models.
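As a rough illustration of how FP8 training is typically enabled in code, the sketch below uses NVIDIA's open-source Transformer Engine library for PyTorch. The layer shape, batch size, and recipe settings are illustrative placeholders, not values taken from this document.

```python
# A minimal sketch of FP8 mixed-precision training with NVIDIA's
# Transformer Engine (`transformer_engine` package). Layer sizes,
# batch size, and recipe settings here are illustrative only.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: FP8 scaling factors are derived from the
# amax history of previous iterations.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

model = te.Linear(4096, 4096, bias=True).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inp = torch.randn(512, 4096, device="cuda")

# The forward pass inside this context runs its GEMMs in FP8 on the
# Tensor Cores; the backward pass can run outside the context.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
    loss = out.float().sum()

loss.backward()
optimizer.step()
```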
Accelerated Data Analytics:
With up to 3 TB/s of memory bandwidth per GPU, plus NVLink and NVSwitch connectivity, the H100 enables accelerated servers to handle data analytics workloads efficiently.
Enterprise-Ready Utilization:
The H100 incorporates second-generation Multi-Instance GPU (MIG) technology, allowing each GPU to be securely partitioned into as many as seven separate instances. This feature enables optimized utilization and secure multi-tenant usage in cloud service provider environments.
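As a minimal sketch, MIG state can be inspected programmatically through NVML's Python bindings (`pynvml`). Actual partitioning is an administrative action (for example, via `nvidia-smi mig`); the device index 0 below is an assumption for a single-GPU node.

```python
# A minimal sketch of reading MIG state via pynvml. This only
# inspects the configuration; it does not create instances.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumed GPU index
    current, pending = pynvml.nvmlDeviceGetMigMode(handle)
    print("MIG mode (current/pending):", current, pending)
    # Up to seven instances per GPU, per the MIG feature above.
    max_mig = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)
    print("Max MIG devices:", max_mig)
    if current == pynvml.NVML_DEVICE_MIG_ENABLE:
        for i in range(max_mig):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
                print("MIG instance", i, pynvml.nvmlDeviceGetUUID(mig))
            except pynvml.NVMLError:
                break  # no more instances configured
finally:
    pynvml.nvmlShutdown()
```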
Built-In Confidential Computing:
The H100 is the world’s first accelerator with built-in confidential computing capabilities, thanks to NVIDIA Confidential Computing. It creates a trusted execution environment (TEE) on the GPU, securing and isolating workloads while preserving the GPU’s acceleration capabilities. This feature ensures data and application confidentiality and integrity.
Specifications
| Specification | NVIDIA H100 |
| --- | --- |
| Product SKU | P1010 SKU 200; NVPN: 699-21010-0200-xxx |
| Total board power | PCIe 16-pin 450 W or 600 W power mode: 350 W default, 350 W maximum, 200 W minimum; PCIe 16-pin 300 W power mode: 310 W default, 310 W maximum, 200 W minimum |
| Thermal solution | Passive |
| Mechanical form factor | Full-height, full-length (FHFL), 10.5”, dual-slot |
| GPU SKU | GH100-200 |
| PCI Device IDs | Device ID: 0x2331; Vendor ID: 0x10DE; Sub-Vendor ID: 0x10DE; Sub-System ID: 0x1626 |
| GPU clocks | Base: 1,125 MHz; Boost: 1,755 MHz |
| Performance states | P0 |
| VBIOS | EEPROM size: 8 Mbit; UEFI: supported |
| PCI Express interface | PCI Express Gen5 x16; Gen5 x8; Gen4 x16; lane and polarity reversal supported |
| Multi-Instance GPU (MIG) | Supported (up to seven instances) |
| Secure Boot (CEC) | Supported |
| Power connectors and headers | One PCIe 16-pin auxiliary power connector |
| Weight | Board: 1,200 grams (excluding bracket, extenders, and bridges); NVLink bridge: 20.5 grams per bridge (×3 bridges); bracket with screws: 20 grams; enhanced straight extender: 35 grams; long offset extender: 48 grams; straight extender: 32 grams |
| Memory specifications | 80 GB HBM2e; 1,593 MHz memory clock; 5,120-bit memory bus; 2,000 GB/s bandwidth |
| SR-IOV support | Supported: 32 virtual functions (VFs) |
| BAR address (physical function) | BAR0: 16 MiB; BAR1: 128 GiB; BAR3: 32 MiB |
| BAR address (virtual function) | BAR0: 5 MiB (256 KiB per VF); BAR1: 80 GiB, 64-bit (4 GiB per VF); BAR3: 640 MiB, 64-bit (32 MiB per VF) |
| Driver support | Linux: R520 or later; Windows: R520 or later |
| NVIDIA® CUDA® support | x86: CUDA 11.8 or later; Arm: CUDA 12.0 or later |
| Virtual GPU software support | vGPU 15.0 or later: NVIDIA Virtual Compute Server Edition |
| IPMI FRU EEPROM | I2C address: 0x50 (7-bit), 0xA0 (8-bit); SMBus (8-bit): 0x9E (write), 0x9F (read) |
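As a quick sanity check of two rows above, the sketch below (again using `pynvml`, assuming it is installed and an NVIDIA driver is present) derives the listed 2,000 GB/s bandwidth from the 1,593 MHz memory clock and 5,120-bit bus, assuming double data rate, and matches installed GPUs against the listed PCI device ID.

```python
# A minimal sketch cross-checking the spec table with NVML (pynvml):
# the PCI device ID (0x2331, vendor 0x10DE) and the bandwidth implied
# by the memory row.
import pynvml

# Expected bandwidth from the memory specifications row:
# 1,593 MHz x 2 (double data rate) x 5,120 bits / 8 bits-per-byte
expected_gbps = 1593e6 * 2 * 5120 / 8 / 1e9
print(f"Bandwidth from table values: {expected_gbps:.0f} GB/s")  # ~2,039

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        pci = pynvml.nvmlDeviceGetPciInfo(handle)
        # pciDeviceId packs the 16-bit device ID in the upper half
        # and the 16-bit vendor ID in the lower half.
        device_id = pci.pciDeviceId >> 16
        vendor_id = pci.pciDeviceId & 0xFFFF
        if (device_id, vendor_id) == (0x2331, 0x10DE):
            name = pynvml.nvmlDeviceGetName(handle)
            print(f"GPU {i}: {name} matches the H100 PCIe device ID")
finally:
    pynvml.nvmlShutdown()
```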
The H100 also excels in exascale high-performance computing (HPC), offering superior double-precision Tensor Core performance, TF32 precision, and DPX instructions. It delivers 60 teraflops of FP64 computing power for HPC applications and achieves seven times the performance of the A100 and 40 times the speed of CPUs on dynamic programming algorithms like Smith-Waterman.
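To make the dynamic-programming pattern concrete, here is a plain-Python sketch of the Smith-Waterman scoring recurrence; the inner max-over-sums step is the kind of operation DPX-class instructions accelerate in hardware. The scoring parameters are illustrative, not from this document.

```python
# A minimal sketch of the Smith-Waterman local-alignment recurrence.
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The inner step: a max over several additions, clamped
            # at zero for local alignment (the fused max/add pattern
            # that DPX-style instructions speed up).
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,  # match/mismatch
                          h[i - 1][j] + gap,      # deletion
                          h[i][j - 1] + gap)      # insertion
            best = max(best, h[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))  # small demo
```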
The NVIDIA H100 Tensor Core GPU provides outstanding performance, scalability, and security for a wide range of workloads, including large language model inference, AI training, HPC applications, and accelerated data analytics. Its advanced features and compatibility with NVIDIA AI Enterprise software make it a powerful solution for enterprises seeking to adopt AI and accelerate their workflows.