Intel MPI Benchmarks (IMB) Ping-Pong
IMB Ping-Pong measures the latency of sending a fixed-size message between two ranks on separate virtual machines. Using the HPC Rocky Linux 8 image instead of the default GCP Rocky Linux 8 image, Google observed improvements of up to 15%.
Benchmark setup
- 2 x h3-standard-88 instances
- MPI library: Intel oneAPI MPI Library 2021.11.0
- MPI benchmark application: Intel MPI Benchmarks 2019 Update 6
- MPI environment variables:
  - I_MPI_PIN_PROCESSOR_LIST=0
  - I_MPI_FABRICS=shm:ofi
  - FI_PROVIDER=tcp
- Command line: mpirun -n 2 -ppn 1 -bind-to core -hostfile <hostfile> IMB-MPI1 Pingpong -iter 50000 -msglog 0:16
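The run above can be wrapped in a short script. A minimal sketch, assuming Intel oneAPI MPI and the IMB binaries are installed under the default oneAPI paths; the hostnames are placeholders for the two h3-standard-88 VMs:

```
#!/bin/bash
# Minimal sketch of the IMB Ping-Pong run described above.
# Assumes a default Intel oneAPI install; hostnames are placeholders.
source /opt/intel/oneapi/setvars.sh

cat > hostfile <<EOF
hpc-vm-1
hpc-vm-2
EOF

export I_MPI_PIN_PROCESSOR_LIST=0
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=tcp

mpirun -n 2 -ppn 1 -bind-to core -hostfile hostfile \
    IMB-MPI1 Pingpong -iter 50000 -msglog 0:16
```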
Intel MPI Benchmarks (IMB) AllReduce – single process per node
The IMB AllReduce benchmark measures collective latency across multiple ranks spread over multiple virtual machines. It reduces a fixed-length vector using the MPI_SUM operation.
To isolate networking performance, the first result uses 1 PPN (process per node), that is, 1 MPI rank on each of 8 VMs.
Comparing the HPC Rocky Linux 8 image to the default GCP Rocky Linux 8 image, Google observed improvements of up to 35%.
Benchmark setup
- 8 x h3-standard-88 instances
- 1 process per node
- MPI library: Intel oneAPI MPI Library 2021.11.0
- MPI benchmark application: Intel MPI Benchmarks 2019 Update 6
- MPI environment variables:
  - I_MPI_FABRICS=shm:ofi
  - FI_PROVIDER=tcp
  - I_MPI_ADJUST_ALLREDUCE=11
- Command line: mpirun -n 8 -ppn 1 -bind-to core -hostfile <hostfile> IMB-MPI1 Allreduce -iter 50000 -npmin 8 -msglog 0:16
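Relative to the Ping-Pong sketch above, only the environment and the IMB arguments change. A minimal sketch, with the hostfile now listing all 8 VMs:

```
# AllReduce-specific settings; I_MPI_ADJUST_ALLREDUCE selects one of
# Intel MPI's allreduce algorithms by number.
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=tcp
export I_MPI_ADJUST_ALLREDUCE=11

# hostfile lists the 8 VM names, one per line
mpirun -n 8 -ppn 1 -bind-to core -hostfile hostfile \
    IMB-MPI1 Allreduce -iter 50000 -npmin 8 -msglog 0:16
```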
Intel MPI Benchmarks (IMB) AllReduce – one process per core (88 processes per node)
The next result uses 88 PPN, that is, 88 MPI ranks per node with 1 thread per rank, for a total of 704 ranks across the 8 VMs.
For this test, comparing the HPC Rocky Linux 8 image to the default GCP Rocky Linux 8 image, Google observed improvements of up to 25%.
Benchmark setup
- 8 x h3-standard-88 instances
- 1 process per core (88 processes per node)
- MPI library: Intel oneAPI MPI Library 2021.11.0
- MPI benchmark application: Intel MPI Benchmarks 2019 Update 6
- MPI environment variables:
  - I_MPI_FABRICS=shm:ofi
  - FI_PROVIDER=tcp
  - I_MPI_ADJUST_ALLREDUCE=11
- Command line: mpirun -n 704 -ppn 88 -bind-to core -hostfile <hostfile> IMB-MPI1 Allreduce -iter 50000 -npmin 704 -msglog 0:16
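All of these runs need a hostfile listing the participating VMs, whose instance names resolve over GCP's internal DNS. One way to build it is sketched below, assuming an authenticated gcloud CLI and a hypothetical "hpc-bench" name prefix for the benchmark VMs:

```
# Build a hostfile from the names of the benchmark VMs.
# "hpc-bench" is a hypothetical name prefix; adjust the filter to your cluster.
gcloud compute instances list \
    --filter="name~^hpc-bench" \
    --format="value(name)" > hostfile

# 704 ranks = 88 ranks per node x 8 nodes
mpirun -n 704 -ppn 88 -bind-to core -hostfile hostfile \
    IMB-MPI1 Allreduce -iter 50000 -npmin 704 -msglog 0:16
```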
Google has announced the general availability of HPC Virtual Machine (VM) images for high performance computing (HPC) workloads, primarily tightly coupled workloads such as fluid dynamics, molecular modeling, and weather forecasting. These images are based on CentOS 7 and Rocky Linux 8.
The HPC VM image makes it simple to create an HPC-ready VM instance by incorporating Google Cloud's best practices for HPC, including:
Virtual machines prepared for HPC right out of the box
For tightly coupled HPC workloads, there is no need to manually tune performance, manage VM reboots, or keep up with Google Cloud updates, even with the frequent releases of HPC VM images. When a tuning requires it, the HPC VM image handles the reboot automatically.
Networking optimizations for tightly coupled workloads
Applications that rely heavily on point-to-point and collective communications benefit from optimizations that reduce latency for small messages.
Compute optimizations
These include tunings that reduce system jitter, giving more consistent single-node performance and improving scalability.
Enhanced application compatibility
Alignment with the node-level requirements of the Intel HPC Platform Specification enables a high level of interoperability between systems.
Measuring performance with HPC benchmarks
Google evaluated the performance of the HPC VM images using the Intel MPI Benchmarks (IMB), comparing against the GCP-optimized Rocky Linux 8 image and the default CentOS 7 image.
The following images were used in the benchmarks:
HPC Rocky Linux 8
- Image name: hpc-rocky-linux-8-v20240126
- Image project: cloud-hpc-image-public
GCP Rocky Linux 8 default
- Image name: rocky-linux-8-optimized-gcp-v20240111
- Image project: rocky-linux-cloud
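As a quick check, both images can be inspected with the gcloud CLI (a sketch; assumes gcloud is installed and authenticated):

```
# Inspect the two images used in the comparison.
gcloud compute images describe hpc-rocky-linux-8-v20240126 \
    --project=cloud-hpc-image-public

gcloud compute images describe rocky-linux-8-optimized-gcp-v20240111 \
    --project=rocky-linux-cloud
```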
To reduce network latency, each cluster of machines was deployed using a compact placement policy with max_distance=1, meaning all VMs were placed on hardware in the same physical rack.
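A compact placement policy along those lines can be created ahead of time and then attached to the VMs at creation via --resource-policies. A sketch, assuming the beta gcloud surface for the max-distance setting and a hypothetical policy name and region:

```
# Create a compact placement policy; --max-distance=1 requests the closest
# possible physical placement for the group of VMs.
gcloud beta compute resource-policies create group-placement bench-placement \
    --collocation=collocated \
    --max-distance=1 \
    --region=us-central1
```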
The HPC VM image and the Cloud HPC Toolkit
The HPC VM image can be used with the Cloud HPC Toolkit, an open-source tool that simplifies deploying environments for a range of workloads, including HPC, AI, and machine learning. In fact, the Toolkit blueprints and the Slurm images based on CentOS 7 and Rocky Linux 8 use the HPC VM image by default. The Cloud HPC Toolkit can also further customize the HPC VM image, installing additional software and applying configuration changes, which increases its utility.
By using the Cloud HPC Toolkit to customize images based on the HPC VM image, you can create and share blueprints for building optimized, specialized images, improving reproducibility while reducing setup time and effort.
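A minimal sketch of that workflow, assuming the hpc-toolkit repository and one of its example blueprints (the blueprint and project names are illustrative):

```
# Build the Cloud HPC Toolkit, then expand and deploy an example blueprint.
git clone https://github.com/GoogleCloudPlatform/hpc-toolkit.git
cd hpc-toolkit && make

# Expand the blueprint into a deployment folder, then deploy it.
./ghpc create examples/hpc-slurm.yaml --vars project_id=my-project
./ghpc deploy hpc-slurm   # folder name comes from the blueprint's deployment_name
```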
Setting up an HPC-ready VM
Overview
In tightly coupled high performance computing (HPC) workloads, processes and VM instances communicate via MPI. However, building your own VM image that is optimized for MPI performance requires additional maintenance time, familiarity with Google Cloud, and system knowledge. The HPC VM image lets you quickly create VM instances for your HPC workloads. An alternative is to create VMs using the H3 machine series.
The HPC VM image is designed for tightly coupled HPC workloads and is based on either CentOS 7.9 or Rocky Linux 8. It contains the pre-configured network and kernel tunings needed to create VM instances with the best MPI performance on Google Cloud.
You can create an HPC-ready VM using any of the following options:
- Google Cloud CLI (see the sketch after this list)
- Google Cloud console, where the image is available through Cloud Marketplace
- Slurm, the workload manager from SchedMD, defaults to using the HPC VM image.
- Omnibond CloudyCluster defaults to using the HPC VM image.
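For the gcloud CLI option, a minimal sketch; the instance name, zone, and machine type are illustrative, and the TERMINATE maintenance policy reflects that H3 VMs do not support live migration:

```
# Create an HPC-ready VM from the latest HPC Rocky Linux 8 image.
gcloud compute instances create hpc-vm-1 \
    --zone=us-central1-a \
    --machine-type=h3-standard-88 \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public \
    --maintenance-policy=TERMINATE
```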
Advantages
The advantages of the HPC VM image are as follows:
Virtual machines that are ready for HPC right out of the box
For tightly coupled HPC workloads, there is no need to manually tune performance, manage VM reboots, or keep up with Google Cloud updates.
Networking optimizations for tightly coupled workloads
Applications that heavily rely on point-to-point and collective communications benefit from optimizations that lower latency for small messages.
Compute optimizations for HPC workloads
Jitter reduction optimizations make single-node performance more predictable.
Reliable and consistent performance
Application-level performance is consistent and repeatable with VM image standardization.
Enhanced application compatibility
System interoperability is greatly enhanced by alignment with the node-level requirements of the Intel HPC Platform Specification.
Features
Collective tunings for Intel MPI
The HPC VM image includes Intel MPI collective tunings that were generated on c2-standard-60 and c2d-standard-112 instances using compact placement policies.
Pre-installed RPMs
The following RPM packages are pre-installed on the HPC VM image:
- The "Development Tools" package group
- Lmod, dkms, htop, hwloc, hwloc-devel, kernel-devel, ltrace, libXt, nfs-utils, numactl, numactl-devel, papi, pciutils, pdsh, perf, redhat-lsb-core, redhat-lsb-cxx, rsh, screen, strace, wget, zsh
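On a running instance, a quick check can confirm that these are present (a sketch using the package names listed above):

```
# Confirm a sample of the pre-installed packages on an HPC VM image instance.
rpm -q dkms htop hwloc numactl pdsh papi strace

# Confirm the "Development Tools" package group is installed (Rocky Linux 8).
dnf group list --installed | grep -i "development tools"
```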