New P5en instances on Amazon EC2, equipped with EFAv3 networking and NVIDIA H200 Tensor Core GPUs, are now generally available. You can purchase EC2 P5en instances through EC2 Capacity Blocks for ML, On-Demand, and Savings Plans purchase options.
Today, AWS is announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P5en instances, powered by NVIDIA H200 Tensor Core GPUs and custom 4th Generation Intel Xeon Scalable processors available only on AWS, with an all-core turbo frequency of 3.2 GHz (max core turbo frequency of 3.8 GHz). These processors deliver 50% higher memory bandwidth and up to four times the throughput between CPU and GPU with PCIe Gen5, improving performance for machine learning (ML) training and inference applications.
P5en instances use the third-generation Elastic Fabric Adapter (EFAv3) with Nitro v5, providing up to 3,200 Gbps of network bandwidth and up to 35% lower latency than P5 instances, which use the previous generations of EFA and Nitro. This improves collective communications performance for distributed training workloads such as deep learning, generative AI, real-time data processing, and high-performance computing (HPC) applications.
The following details apply to P5en instances:
| Instance size | vCPUs | Memory (GiB) | GPUs (H200) | Network bandwidth (Gbps) | GPU peer-to-peer (GB/s) | Instance storage (TB) | EBS bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|
| p5en.48xlarge | 192 | 2048 | 8 | 3200 | 900 | 8 x 3.84 | 100 |
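If you want to verify these published specifications yourself, a sketch of one way to do so with the AWS CLI follows; it assumes you run it in a Region where P5en instances are offered, such as us-east-2:

```bash
# Query the published specifications for p5en.48xlarge (vCPUs, memory,
# GPUs, network performance, and baseline EBS bandwidth).
aws ec2 describe-instance-types \
  --region us-east-2 \
  --instance-types p5en.48xlarge \
  --query 'InstanceTypes[0].{vCPUs:VCpuInfo.DefaultVCpus,MemoryMiB:MemoryInfo.SizeInMiB,Gpus:GpuInfo.Gpus,Network:NetworkInfo.NetworkPerformance,EbsBaselineMbps:EbsInfo.EbsOptimizedInfo.BaselineBandwidthInMbps}'
```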
On September 9, AWS launched Amazon EC2 P5e instances, equipped with eight NVIDIA H200 GPUs with 1,128 GB of high-bandwidth GPU memory, third-generation AMD EPYC processors, 2 TiB of system memory, and 30 TB of local NVMe storage. These instances deliver up to 3,200 Gbps of aggregate network bandwidth with EFAv2 and support GPUDirect RDMA, which lowers latency and improves scale-out performance by bypassing the CPU for internode communication.
P5en instances further reduce inference and network latency, which can boost overall efficiency across a variety of GPU-accelerated workloads. If you use local storage for caching model weights, P5en instances improve inference latency performance by up to two times and Amazon Elastic Block Store (Amazon EBS) bandwidth by up to 25% compared to P5 instances.
Data transfer between CPUs and GPUs can be time-consuming, particularly for workloads with large datasets or frequent data exchanges. PCIe Gen5 provides up to four times the CPU-to-GPU bandwidth compared to P5 and P5e instances, speeding up model training, fine-tuning, and inference for complex large language models (LLMs) and multimodal foundation models (FMs), as well as memory-intensive HPC applications such as simulations, drug discovery, weather forecasting, and financial modeling.
Getting started with Amazon EC2 P5en instances
You can use EC2 P5en instances in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions through EC2 Capacity Blocks for ML, On-Demand, and Savings Plans purchase options.
To use P5en instances, you can reserve capacity with EC2 Capacity Blocks for ML. To reserve your EC2 Capacity Blocks, choose Capacity Reservations on the Amazon EC2 console in the US East (Ohio) AWS Region.
Next, choose Purchase Capacity Blocks for ML, then select your total capacity and specify how long you need the EC2 Capacity Block for p5en.48xlarge instances. You can reserve an EC2 Capacity Block for durations of 1–14 days, 21 days, or 28 days, and purchase it up to eight weeks in advance.
When you choose Find Capacity Blocks, AWS returns the lowest-priced offering that meets your requirements in your specified date range. After reviewing the EC2 Capacity Block details, tags, and total price, choose Purchase.
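If you prefer the AWS CLI, the console flow above roughly corresponds to the two calls sketched below; the Region, date range, and offering ID are placeholders you would replace with your own values:

```bash
# Find the lowest-priced Capacity Block offering for one p5en.48xlarge
# instance for a 24-hour duration within a given date range.
aws ec2 describe-capacity-block-offerings \
  --region us-east-2 \
  --instance-type p5en.48xlarge \
  --instance-count 1 \
  --capacity-duration-hours 24 \
  --start-date-range 2024-12-10T00:00:00Z \
  --end-date-range 2024-12-20T00:00:00Z

# Purchase the offering returned above; the offering ID is a placeholder.
aws ec2 purchase-capacity-block \
  --region us-east-2 \
  --capacity-block-offering-id cbo-0123456789abcdef0 \
  --instance-platform Linux/UNIX
```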
Your EC2 Capacity Block is now scheduled. The total price of an EC2 Capacity Block is charged up front when you purchase it, and the price doesn't change afterward. The payment is billed to your account within 12 hours after you purchase the EC2 Capacity Block.
You can launch instances into your purchased Capacity Block using the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
To take full advantage of EFAv3, you can use the AWS CLI to launch 16 P5en instances with a network configuration that includes eight private IP addresses, providing up to 800 Gbps of IP networking capacity and up to 3,200 Gbps of EFA networking bandwidth.
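The command below is a sketch of such a launch. The AMI, key pair, security group, subnet, and Capacity Block reservation IDs are placeholders, and only the first two of the instance's network cards are shown; the remaining cards follow the same pattern, with the efa and efa-only interface mix chosen per the EFA documentation for p5en.48xlarge:

```bash
# Launch 16 p5en.48xlarge instances into a purchased Capacity Block.
# All resource IDs below are placeholders; add the remaining network
# cards (DeviceIndex=1, InterfaceType=efa or efa-only) for full bandwidth.
aws ec2 run-instances \
  --region us-east-2 \
  --instance-type p5en.48xlarge \
  --count 16 \
  --image-id ami-0123456789abcdef0 \
  --key-name my-key-pair \
  --instance-market-options MarketType=capacity-block \
  --capacity-reservation-specification "CapacityReservationTarget={CapacityReservationId=cr-0123456789abcdef0}" \
  --network-interfaces \
    "NetworkCardIndex=0,DeviceIndex=0,Groups=sg-0123456789abcdef0,SubnetId=subnet-0123456789abcdef0,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=sg-0123456789abcdef0,SubnetId=subnet-0123456789abcdef0,InterfaceType=efa-only"
```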
When launching P5en instances, you can use AWS Deep Learning AMIs (DLAMI), which provide ML researchers and practitioners with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.
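One way to locate a current DLAMI from the CLI is sketched below; the name filter is an example pattern, and you should confirm the match against the DLAMI release notes before using it:

```bash
# List Amazon-owned Deep Learning AMIs in the target Region and
# print the most recently created match.
aws ec2 describe-images \
  --region us-east-2 \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04)*" \
  --query 'sort_by(Images, &CreationDate)[-1].{Name:Name,ImageId:ImageId}'
```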
You can also run containerized ML applications on P5en instances using AWS Deep Learning Containers with Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).
For fast access to large datasets, you can use the up to 30 TB of local NVMe SSD storage or virtually unlimited, cost-effective storage with Amazon S3. You can also use Amazon FSx for Lustre file systems with P5en instances to access data at the hundreds of gigabytes per second (GB/s) of throughput and millions of input/output operations per second (IOPS) required for large-scale deep learning and HPC workloads.
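As a sketch of the FSx for Lustre setup, assuming an existing file system and a Lustre client installed on the instance (see the Amazon FSx for Lustre documentation), mounting it might look like this; the file system DNS name and mount name are placeholders:

```bash
# Mount an existing FSx for Lustre file system on the instance.
# fs-0123456789abcdef0... and /abcdefgh are placeholders for your
# file system's DNS name and mount name.
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock \
  fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com@tcp:/abcdefgh /fsx
```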
Now available
Amazon EC2 P5en instances are available today in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions, as well as in the US East (Atlanta) Local Zone us-east-1-atl-2a, through EC2 Capacity Blocks for ML, On-Demand, and Savings Plans purchase options.
Amazon EC2 Capacity Blocks for ML pricing
With Amazon EC2 Capacity Blocks for ML, you can reserve exactly the amount of accelerator capacity you need to run your ML workloads. EC2 Capacity Blocks pricing consists of a reservation fee and an operating system fee.
The reservation fee is paid up front when you make the reservation. Reservation pricing is updated periodically based on EC2 Capacity Block supply and demand patterns; an update to the current rates is planned for January 1, 2025. You pay the rate in effect at the time of purchase, even if your Capacity Block is scheduled to start after the price change. The following table shows the current hourly rates (USD) by operating system.
| Instance size | Linux | Red Hat Enterprise Linux (RHEL) | RHEL with HA | SLES | Ubuntu Pro |
|---|---|---|---|---|---|
| p5.48xlarge | $0.000 | $1.8432 | $3.8592 | $0.125 | $0.336 |
| p5e.48xlarge | $0.000 | $1.8432 | $3.8592 | $0.125 | $0.336 |
| p5en.48xlarge | $0.000 | $0.130 | $0.165 | $0.125 | $0.336 |
| p4d.24xlarge | $0.000 | $1.0368 | $2.3808 | $0.125 | $0.168 |
| trn1.32xlarge | $0.000 | $1.2288 | $2.5728 | $0.125 | $0.224 |
For more information, see the Amazon EC2 pricing page.
Give Amazon EC2 P5en instances a try in the Amazon EC2 console.