Top-performing GPU-based systems for HPC and deep learning applications
Why Amazon EC2 P5 Instances?
Amazon Elastic Compute Cloud (Amazon EC2) P5 instances with NVIDIA H100 Tensor Core GPUs, and P5e and P5en instances with NVIDIA H200 Tensor Core GPUs, deliver top performance for deep learning (DL) and HPC workloads. They reduce machine learning model training costs by up to 40% and cut time to solution by up to 4x compared to previous-generation GPU-based EC2 instances, helping you iterate faster and bring solutions to market sooner.
You can use P5, P5e, and P5en instances to train and deploy the large language models (LLMs) and diffusion models that power the most sophisticated generative artificial intelligence (AI) applications, including question answering, code generation, video and image generation, and speech recognition. These instances can also scale up demanding HPC applications such as pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling.
To deliver these cost savings and performance gains, P5 and P5e instances complement their NVIDIA H100 and H200 Tensor Core GPUs with 2x higher CPU performance, 2x more system memory, and 4x more local storage than previous-generation GPU-based instances. P5en instances pair high-performance Intel Sapphire Rapids CPUs with NVIDIA H200 Tensor Core GPUs and enable Gen5 PCIe between CPU and GPU.
Compared to P5e and P5 instances, P5en instances offer up to 2x the bandwidth between CPU and GPU along with lower network latency, which improves distributed training performance. P5 and P5e instances support networking speeds of up to 3,200 Gbps with the second-generation Elastic Fabric Adapter (EFA). P5en instances, which use the third generation of EFA with Nitro v5, deliver up to 35% lower latency than P5 instances, which use the previous generation of EFA and Nitro.
This improves collective communications performance for distributed training workloads such as deep learning, generative AI, real-time data processing, and high-performance computing (HPC) applications. These instances are deployed in Amazon EC2 UltraClusters, which let you scale up to 20,000 H100 or H200 GPUs interconnected by a petabit-scale nonblocking network, providing large-scale compute at low latency. In EC2 UltraClusters, P5, P5e, and P5en instances can deliver up to 20 exaflops of aggregate compute capability, performance equivalent to a supercomputer.
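As an illustration, the sketch below uses boto3 to launch P5 instances into a cluster placement group with an EFA-enabled network interface, the building blocks of an UltraCluster-style deployment. The AMI ID, subnet, security group, and key pair are placeholders you would replace with your own resources.

```python
import boto3

# Minimal sketch: launch p5.48xlarge instances into a cluster placement
# group with an EFA network interface. AMI, subnet, security group, and
# key-pair values are placeholders, not real resources.
ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_placement_group(GroupName="p5-cluster", Strategy="cluster")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a DL AMI with NVIDIA drivers
    InstanceType="p5.48xlarge",
    MinCount=2,
    MaxCount=2,
    KeyName="my-key-pair",            # placeholder
    Placement={"GroupName": "p5-cluster"},
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",       # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "Groups": ["sg-0123456789abcdef0"],      # placeholder
    }],
)
for instance in response["Instances"]:
    print(instance["InstanceId"])
```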
Benefits of Amazon EC2 P5 Instances
Train 100B+ parameter models at scale
P5, P5e, and P5en instances provide up to 4x the performance of previous-generation GPU-based EC2 instances, letting you train ultra-large generative AI models at scale.
Reduce time to solution and iterate faster
P5, P5e, and P5en instances shorten training times and time to solution from weeks to just a few days. This helps you iterate faster and accelerates your time to market.
Lower your DL and HPC infrastructure costs
P5, P5e, and P5en instances can reduce DL training and HPC infrastructure costs by up to 40% compared to previous-generation GPU-based Amazon EC2 instances.
Run distributed training and HPC with exascale compute
P5, P5e, and P5en instances offer EFA networking speeds of up to 3,200 Gbps. Deployed in EC2 UltraClusters, these instances provide up to 20 exaflops of aggregate compute capability, as in the distributed training sketch below.
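The following minimal PyTorch DistributedDataParallel sketch assumes a launch with torchrun across the 8 GPUs of a single P5-class instance; the model and data are stand-ins, not a real workload.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal DDP sketch, assuming launch via:
#   torchrun --nproc_per_node=8 train.py
# On P5-class instances, NCCL runs over EFA through the aws-ofi-nccl plugin.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):                                # stand-in training loop
    x = torch.randn(32, 4096, device=local_rank)
    loss = model(x).square().mean()
    optimizer.zero_grad()
    loss.backward()                                   # gradients all-reduced via NCCL
    optimizer.step()

dist.destroy_process_group()
```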
Features of Amazon EC2 P5 Instances
NVIDIA H100 and H200 Tensor Core GPUs
Each P5 instance provides up to 8 NVIDIA H100 GPUs with 640 GB of HBM3 GPU memory. Each P5e and P5en instance provides up to 8 NVIDIA H200 GPUs with 1,128 GB of HBM3e GPU memory. These instance types support up to 900 GB/s of NVSwitch GPU interconnect (3.6 TB/s of bisectional bandwidth in total), so each GPU can communicate with every other GPU in the same instance with single-hop latency.
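A quick way to confirm the GPU count, memory, and all-to-all peer connectivity on one of these instances is a short PyTorch check like the sketch below; printed values are illustrative.

```python
import torch

# Sketch: enumerate GPUs and verify peer-to-peer (NVSwitch) connectivity.
n = torch.cuda.device_count()          # expect 8 on a p5.48xlarge-class instance
for i in range(n):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

# With NVSwitch, every GPU pair should report direct peer access.
for i in range(n):
    for j in range(n):
        if i != j:
            assert torch.cuda.can_device_access_peer(i, j)
print("all GPU pairs are peer-accessible (single hop over NVSwitch)")
```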
New transformer engine and DPX instructions
NVIDIA H100 and H200 GPUs include a new transformer engine that intelligently manages computations, dynamically choosing between FP8 and 16-bit precision. This feature helps deliver faster DL training speedups on LLMs compared to previous-generation A100 GPUs. For HPC workloads, NVIDIA H100 and H200 GPUs also feature new DPX instructions that substantially accelerate dynamic programming algorithms compared to A100 GPUs.
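One common way to use the transformer engine's FP8 path from PyTorch is NVIDIA's Transformer Engine library. The sketch below assumes that library is installed and uses stand-in layer sizes; DelayedScaling is one of its FP8 scaling recipes.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Sketch: run a Transformer Engine layer under FP8 autocast on an H100/H200.
# Layer sizes are stand-ins; DelayedScaling is one available FP8 recipe.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(32, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)        # matmul runs in FP8 on the transformer engine
y.sum().backward()      # backward also uses FP8 where the recipe allows
print(y.shape)
```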
Networking with high performance
P5, P5e, and P5en instances provide EFA networking speeds of up to 3,200 Gbps. EFA is also coupled with NVIDIA GPUDirect RDMA to enable low-latency GPU-to-GPU communication between servers with operating system bypass.
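In practice, NCCL traffic reaches EFA through the aws-ofi-nccl plugin and libfabric. The sketch below shows environment variables commonly set before initializing the process group; the exact set depends on your software stack, so treat these as assumptions to verify.

```python
import os

# Sketch: environment commonly used to route NCCL traffic over EFA with
# GPUDirect RDMA via the aws-ofi-nccl plugin. Verify against your stack.
os.environ["FI_PROVIDER"] = "efa"           # use the libfabric EFA provider
os.environ["FI_EFA_USE_DEVICE_RDMA"] = "1"  # enable GPUDirect RDMA over EFA
os.environ["NCCL_DEBUG"] = "INFO"           # log which transport NCCL selects

import torch.distributed as dist
dist.init_process_group(backend="nccl")     # NCCL now runs over EFA
```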
High-performance storage
P5, P5e, and P5en instances support Amazon FSx for Lustre file systems, so you can access data at the hundreds of gigabytes per second of throughput and millions of IOPS required for large-scale DL and HPC workloads. Each instance also supports up to 30 TB of local NVMe SSD storage for fast access to large datasets. You can also use Amazon Simple Storage Service (Amazon S3) for virtually unlimited, cost-effective storage.
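For example, you might stage a large dataset from Amazon S3 onto the instance's local NVMe storage before training. The boto3 sketch below uses placeholder bucket, key, and path names.

```python
import boto3

# Sketch: stage a dataset shard from S3 to local NVMe SSD before training.
# Bucket, key, and local path are placeholders; the NVMe mount point
# depends on how the volumes are configured on your instance.
s3 = boto3.client("s3")
s3.download_file(
    Bucket="my-training-data",                # placeholder bucket
    Key="datasets/shard-00000.tar",           # placeholder object key
    Filename="/local-nvme/shard-00000.tar",   # placeholder NVMe mount path
)
print("dataset shard staged on local NVMe")
```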