As AI becomes more prevalent, demand for high-performance, scalable infrastructure will continue to grow rapidly. Because its customers depend on Azure AI infrastructure to build innovative AI-driven solutions, Microsoft Azure is deploying new cloud-based AI supercomputing clusters built on Azure ND H200 v5 series virtual machines (VMs). These VMs are now generally available and are designed to handle increasingly advanced AI workloads, ranging from foundation model training to generative inferencing. Because of their scale, efficiency, and improved performance, the ND H200 v5 VMs are already being adopted by customers and by Microsoft AI services such as Azure Machine Learning and Azure OpenAI Service.
ND-H200-v5 size series
The Azure ND H200 v5 series virtual machine is engineered to deliver outstanding performance for artificial intelligence (AI) and high-performance computing (HPC) workloads. These VMs use the NVIDIA H200 Tensor Core GPU, which provides 76% more High Bandwidth Memory (HBM) than the H100 GPU, to achieve better performance on cutting-edge generative AI models. With 141 GB of high-speed memory and 4.8 TB/s of memory bandwidth, the H200 GPU can handle larger datasets and more complex models, which makes it well suited to generative AI and scientific computing.
The Azure ND H200 v5 series starts with a single virtual machine containing eight NVIDIA H200 Tensor Core GPUs interconnected by 900 GB/s NVLink. Deployments based on the ND H200 v5 can scale to hundreds of GPUs, with 3.2 Tb/s of interconnect bandwidth per virtual machine. Each GPU in the virtual machine has its own dedicated, topology-agnostic 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand connection. These connections, which support GPUDirect RDMA, are configured automatically between virtual machines in the same virtual machine scale set.
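The per-VM interconnect figure follows directly from the per-GPU InfiniBand links. The short sketch below only restates the numbers quoted above; the function names are illustrative and are not part of any Azure or NVIDIA API.

```python
# Figures quoted in the text above.
GPUS_PER_VM = 8
INFINIBAND_GBITS_PER_GPU = 400  # gigabits per second, one link per GPU


def vm_interconnect_tbits(gpus=GPUS_PER_VM, gbits_per_gpu=INFINIBAND_GBITS_PER_GPU):
    """Aggregate scale-out bandwidth per VM, in terabits per second."""
    return gpus * gbits_per_gpu / 1000


def gbits_to_gbytes(gbits):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbits / 8


if __name__ == "__main__":
    print(vm_interconnect_tbits())                    # Tb/s per VM
    print(gbits_to_gbytes(INFINIBAND_GBITS_PER_GPU))  # GB/s per InfiniBand link
```

Note the unit difference: the NVLink figure is given in gigabytes per second, while the InfiniBand figures are in gigabits per second, so the two should be converted to a common unit before being compared.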
These instances perform exceptionally well with the many AI, ML, and analytics tools that support GPU acceleration “out of the box” (such as TensorFlow, PyTorch, Caffe, RAPIDS, and other frameworks). Furthermore, the scale-out InfiniBand interconnect is compatible with a wide range of existing AI and HPC tools built on NVIDIA’s NCCL communication library for seamless GPU clustering. The ND H200 v5 VMs are designed using Microsoft’s systems approach to optimize efficiency and performance. In particular, they address the gap created as GPUs’ raw compute capability has grown much faster than their attached memory and memory bandwidth.
Azure ND H100 v5 VMs vs. Azure ND H200 v5 VMs
Compared to the previous generation of Azure ND H100 v5 VMs, the Azure ND H200 v5 series VMs offer a 76% increase in High Bandwidth Memory (HBM) capacity, to 141 GB per GPU, and a 43% increase in HBM bandwidth, to 4.8 TB/s. The higher HBM bandwidth lets GPUs access model parameters more quickly, which helps reduce overall application latency, a critical metric for real-time applications such as interactive agents. The ND H200 v5 VMs can also hold larger, more complex Large Language Models (LLMs) within the memory of a single VM, which improves performance and lets users avoid the complexity of running distributed jobs across several VMs.
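The percentages above can be checked with a few lines of arithmetic. The sketch below assumes the published H100 SXM baseline of 80 GB of HBM and 3.35 TB/s of bandwidth per GPU, which is not stated in this article. It also includes a rough weights-only check of whether a model fits in one VM; the 405-billion-parameter size and the 2 bytes per parameter are illustrative assumptions, and the check ignores key-value cache, activations, and framework overhead.

```python
# H200 figures are from the text; H100 figures are an assumed baseline
# (NVIDIA's published H100 SXM specifications), not stated in this article.
H100_HBM_GB, H200_HBM_GB = 80, 141
H100_BW_TBS, H200_BW_TBS = 3.35, 4.8
GPUS_PER_VM = 8


def percent_gain(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def weights_gb(params_billion, bytes_per_param):
    """Memory for model weights in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def weights_fit_in_vm(params_billion, bytes_per_param, hbm_gb_per_gpu,
                      gpus=GPUS_PER_VM):
    """Weights-only check: ignores KV cache, activations, and overhead."""
    return weights_gb(params_billion, bytes_per_param) <= hbm_gb_per_gpu * gpus


if __name__ == "__main__":
    print(round(percent_gain(H100_HBM_GB, H200_HBM_GB)))  # HBM capacity gain, %
    print(round(percent_gain(H100_BW_TBS, H200_BW_TBS)))  # HBM bandwidth gain, %
    # Hypothetical 405B-parameter model stored at 2 bytes per parameter.
    print(weights_fit_in_vm(405, 2, H200_HBM_GB))
    print(weights_fit_in_vm(405, 2, H100_HBM_GB))
```

Under these assumptions, the eight GPUs of one VM provide 1,128 GB of HBM on H200 versus 640 GB on H100, so 810 GB of 16-bit weights would fit only in the former. Lower-precision formats change this picture, so the check is indicative rather than a sizing tool.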
The design of the H200 supercomputing clusters also makes it possible to manage GPU memory for model weights, key-value cache, and batch sizes more effectively. All of these factors affect the throughput, latency, and cost-effectiveness of LLM-based generative AI inference workloads. Because of its larger HBM capacity, the ND H200 v5 VM can support larger batch sizes than the ND H100 v5 series, which improves GPU utilization and throughput for inference workloads on both small language models (SLMs) and LLMs.
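To make the relationship between HBM capacity and batch size concrete, the following is a minimal sketch of a memory budget: whatever HBM remains after the weights are loaded is divided by the key-value cache needed per sequence. Every model dimension in it (parameter count, layers, KV heads, head size, sequence length, precision) is a hypothetical value chosen for illustration, the 80 GB H100 capacity is an assumed baseline, and the estimate ignores activations, fragmentation, and runtime overhead.

```python
GPUS_PER_VM = 8
GB = 10**9  # decimal gigabyte, in bytes


def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem):
    """Key and value tensors stored for one token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_batch_size(hbm_gb_per_gpu, weight_bytes, kv_per_token, seq_len,
                   gpus=GPUS_PER_VM):
    """Largest batch whose KV cache fits in the HBM left after the weights."""
    free_bytes = hbm_gb_per_gpu * gpus * GB - weight_bytes
    if free_bytes <= 0:
        return 0
    return free_bytes // (kv_per_token * seq_len)


# Hypothetical model: 70e9 parameters at 2 bytes each, 80 layers,
# 8 KV heads of dimension 128, 2-byte cache entries, 8192-token sequences.
WEIGHT_BYTES = 70 * GB * 2
KV_PER_TOKEN = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128,
                                  bytes_per_elem=2)
SEQ_LEN = 8192

if __name__ == "__main__":
    h100 = max_batch_size(80, WEIGHT_BYTES, KV_PER_TOKEN, SEQ_LEN)   # assumed 80 GB
    h200 = max_batch_size(141, WEIGHT_BYTES, KV_PER_TOKEN, SEQ_LEN)  # 141 GB
    print(h100, h200)
```

Because the weights take the same space on either VM, the extra HBM goes entirely to cache, so in this simplified model the supportable batch size grows faster than the 76% increase in raw capacity. Real serving stacks reserve memory differently, so these numbers should not be read as measured limits.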
Preliminary testing found that, for inference workloads using the Llama 3.1 405B model (with world size 8, input length 128, output length 8, and maximum batch sizes of 32 for H100 and 96 for H200), throughput on Azure ND H200 v5 VMs was up to 35% higher than on the ND H100 v5 series.
To help organizations get started right away, the ND H200 v5 VMs come pre-integrated with Azure Batch, Azure Kubernetes Service, Azure OpenAI Service, and Azure Machine Learning.
Host specifications
Part | Quantity (Count, Units) | Specs (SKU ID, Performance Units, etc.) |
---|---|---|
Processor | 96 vCPUs | Intel Xeon (Sapphire Rapids) [x86-64] |
Memory | 1850 GiB | |
Local Storage | 1 Disk | 28000 GiB |
Remote Storage | 16 Disks | |
Network | 8 NICs | |
Accelerators | 8 GPUs | NVIDIA H200 GPU (141 GB) |