What Are Amazon EC2 P4 Instances?
Amazon Elastic Compute Cloud (EC2) P4d instances deliver outstanding performance for cloud-based HPC and ML training. Powered by NVIDIA A100 Tensor Core GPUs, P4d instances offer industry-leading networking throughput and latency, with support for 400 Gbps instance networking. Compared to previous-generation P3 and P3dn instances, P4d instances cut ML model training costs by up to 60% and deliver, on average, 2.5x better performance for deep learning models.
EC2 P4d instances are deployed in Amazon EC2 UltraClusters, hyperscale clusters that combine high-performance compute, networking, and storage. Each EC2 UltraCluster is effectively one of the most powerful supercomputers in the world, capable of running your most demanding multinode ML training and distributed HPC workloads. You can easily scale from a few to hundreds of NVIDIA A100 GPUs in an EC2 UltraCluster for your ML or HPC project.
Scientists, data scientists, and developers can use EC2 P4d instances to train ML models for recommendation engines, object detection and classification, and natural language processing, and to run HPC applications such as financial modelling, seismic analysis, and pharmaceutical discovery. Unlike with on-premises systems, you can scale your infrastructure as business requirements change, access nearly unlimited compute and storage capacity, and quickly stand up a tightly coupled distributed HPC application or a multinode ML training job without setup or maintenance fees.
Advantages
Shorten the duration of ML training from days to minutes
Thanks to the latest generation of NVIDIA A100 Tensor Core GPUs, each EC2 P4d instance delivers, on average, 2.5x better DL performance than P3 instances. EC2 UltraClusters of P4d instances provide supercomputing-class performance with no upfront fees or long-term commitments, helping researchers, data scientists, and everyday developers handle their most demanding ML and HPC workloads. Shorter training times with P4d instances also boost productivity, letting developers focus on their real goal: building ML intelligence into business applications.
Run the most complex multinode ML training efficiently
With EC2 UltraClusters of P4d instances, developers can seamlessly scale to thousands of GPUs. High-throughput, low-latency networking with support for 400 Gbps instance networking, Elastic Fabric Adapter (EFA), and GPUDirect RDMA technology enable rapid training of ML models using scale-out/distributed techniques. GPUDirect RDMA technology allows low-latency GPU-to-GPU communication between P4d instances, while EFA's support for the NVIDIA Collective Communications Library (NCCL) enables scaling to thousands of GPUs.
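As a sketch of what this looks like in practice, a launch script for NCCL-based distributed training on P4d typically sets a few EFA-related environment variables before starting the training processes. The variable names below are documented EFA/NCCL settings, but the helper function itself is hypothetical, and appropriate values can vary with driver and NCCL versions:

```python
import os

def efa_nccl_env(base_env=None):
    """Return an environment dict with settings commonly used when
    running NCCL collectives over EFA on P4d-class instances.

    Hypothetical helper: the variable names are documented EFA/NCCL
    settings, but treat the values as a starting point, not a tuning guide.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env.update({
        "FI_PROVIDER": "efa",            # select the EFA libfabric provider
        "FI_EFA_USE_DEVICE_RDMA": "1",   # enable GPUDirect RDMA on P4d
        "NCCL_DEBUG": "INFO",            # log which transport NCCL picks
    })
    return env
```

A launcher would pass this dict as the `env` argument to `subprocess.Popen` (or export the variables before invoking `mpirun` or `torchrun`) so that every rank's NCCL communicator runs over EFA.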
Reduce the infrastructure expenses for HPC and ML training
EC2 P4d instances offer ML model training costs up to 60% lower than P3 instances. Furthermore, P4d instances can be purchased as Spot Instances, which utilise spare EC2 capacity and can cut your EC2 costs by up to 90% compared to On-Demand prices. Because ML training is cheaper on P4d instances, budget can be redirected toward building more ML intelligence into business applications.
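To make the compounding concrete, here is a back-of-envelope calculation using hypothetical prices (the hourly rate and run length are made up for illustration; the 60% and 90% figures are the "up to" maxima quoted above, not guaranteed savings):

```python
def training_cost(rate_per_hour, hours):
    """Total cost of a training run at a flat hourly rate."""
    return rate_per_hour * hours

# Hypothetical baseline: a 100-hour run on P3 at $24/hour (made-up rate).
p3_cost = training_cost(rate_per_hour=24.0, hours=100)

# Up to 60% lower training cost moving the same job to P4d:
p4d_cost = p3_cost * (1 - 0.60)

# Up to 90% off On-Demand by running those P4d instances as Spot:
p4d_spot_cost = p4d_cost * (1 - 0.90)

print(p3_cost, p4d_cost, p4d_spot_cost)
```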
AWS services make it simple to get started and scale
AWS Deep Learning AMIs (DLAMIs) and AWS Deep Learning Containers come with the necessary DL framework libraries and tools, so you can deploy P4d DL environments in a matter of minutes. These images also make it easy to incorporate your own libraries and tools. P4d instances support TensorFlow, PyTorch, and MXNet, and can be managed through Amazon SageMaker, Amazon EKS, Amazon ECS, AWS Batch, and AWS ParallelCluster.
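For example, launching a P4d instance from a DLAMI via the EC2 API comes down to a RunInstances call. The sketch below only constructs the parameter dict (which would then be passed to boto3's `ec2_client.run_instances(**params)`); the AMI and subnet IDs are placeholders, and the helper itself is a hypothetical convenience, not an AWS API:

```python
def p4d_launch_params(ami_id, subnet_id, count=1):
    """Build a RunInstances parameter dict for P4d with an EFA interface.

    Hypothetical helper: ami_id and subnet_id are placeholders for your
    own Deep Learning AMI and VPC subnet.
    """
    return {
        "ImageId": ami_id,                 # e.g. an AWS Deep Learning AMI
        "InstanceType": "p4d.24xlarge",    # 8x NVIDIA A100 GPUs
        "MinCount": count,
        "MaxCount": count,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "InterfaceType": "efa",        # attach an Elastic Fabric Adapter
        }],
    }
```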
Features
Powered by NVIDIA A100 Tensor Core GPUs
NVIDIA A100 Tensor Core GPUs deliver unprecedented acceleration at scale for ML and HPC. The third-generation Tensor Cores in the NVIDIA A100 accelerate workloads at every precision, speeding time to insight and time to market. Each A100 GPU provides more than 2.5x the compute performance of the previous-generation V100 GPU and comes with 40 GB of HBM2 (in P4d instances) or 80 GB of HBM2e (in P4de instances) high-speed GPU memory.
The increased GPU memory is especially advantageous for training on large, high-resolution datasets. NVIDIA A100 GPUs use NVSwitch GPU interconnect, so every GPU in an instance can communicate with every other GPU at the same 600 GB/s bidirectional throughput and with single-hop latency.
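A rough way to see why the extra memory matters: training state (weights, gradients, optimizer state) grows linearly with parameter count. The sketch below is a crude back-of-envelope estimate with assumed per-parameter overheads, not how any framework actually accounts for memory:

```python
def fits_in_gpu_memory(n_params, bytes_per_param=2, overhead_per_param=8,
                       gpu_mem_gb=40):
    """Crude check of whether a model's training state fits on one GPU.

    Assumes mixed-precision weights (2 bytes/param) plus an assumed flat
    per-parameter overhead for gradients and optimizer state; real usage
    also includes activations and framework buffers, so treat this as an
    optimistic lower bound.
    """
    total_bytes = n_params * (bytes_per_param + overhead_per_param)
    return total_bytes <= gpu_mem_gb * 1024**3

# A 5-billion-parameter model under these assumptions:
print(fits_in_gpu_memory(5_000_000_000))                 # 40 GB A100 (P4d)  -> False
print(fits_in_gpu_memory(5_000_000_000, gpu_mem_gb=80))  # 80 GB A100 (P4de) -> True
```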
High-performance networking
P4d instances' 400 Gbps networking enables high-throughput communication between EC2 P4d instances, and between a P4d instance and storage services such as Amazon Simple Storage Service (Amazon S3) and FSx for Lustre, so users can scale out distributed workloads such as multinode training more efficiently. EFA is a custom network interface that AWS built to scale HPC and ML applications across thousands of GPUs. Combined with NVIDIA GPUDirect RDMA, EFA provides low-latency GPU-to-GPU communication between servers with operating-system bypass, substantially lowering latency.
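For a sense of scale, here is an idealized transfer-time estimate at line rate (this assumes the full 400 Gbps is achieved with no protocol overhead, which real transfers will not quite reach):

```python
def transfer_seconds(size_gb, link_gbps=400.0):
    """Idealized time to move size_gb gigabytes over a link_gbps link.

    Multiplies by 8 to convert gigabytes to gigabits; ignores protocol
    overhead and congestion, so real transfers take longer.
    """
    return size_gb * 8 / link_gbps

print(transfer_seconds(1000))  # 1 TB at 400 Gbps -> 20.0 seconds
```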
Low-latency, high-throughput storage
Use FSx for Lustre for petabyte-scale, high-throughput, low-latency storage, or Amazon S3 for nearly unlimited, cost-effective storage, both accessible at 400 Gbps. Each P4d instance also includes 8 TB of NVMe-based SSD storage with 16 GB/s read throughput for workloads that need fast access to large datasets.
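In the same idealized spirit, the quoted 16 GB/s read rate implies that a full sequential scan of the 8 TB of local NVMe takes on the order of minutes (real throughput depends on access pattern and file layout):

```python
def scan_seconds(dataset_gb, read_gb_per_s=16.0):
    """Idealized time to read dataset_gb once from local NVMe storage."""
    return dataset_gb / read_gb_per_s

print(scan_seconds(8000))  # all 8 TB at 16 GB/s -> 500.0 seconds
```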
Built on the AWS Nitro System
The P4d instances are based on the AWS Nitro System, a comprehensive set of building blocks that reduces virtualisation overhead while delivering high performance, high availability, and high security by offloading many of the conventional virtualisation functions to specialised hardware and software.