Wednesday, February 12, 2025

Introducing Google Cloud A3 High VMs’ smaller machine types

More and more businesses are running inference for their AI/ML models on GPUs. Because the number of GPUs needed to serve a single inference job varies, organizations want finer-grained control over the number of GPUs in their virtual machines (VMs) so they can keep costs low while scaling with user demand.

To meet this need, Cloud A3 High VMs with NVIDIA H100 80GB GPUs are now generally available in machine types with 1, 2, 4, and 8 GPUs.

Accessing smaller H100 machine types

All Cloud A3 High machine types are available through the fully managed Vertex AI, as nodes in Google Kubernetes Engine (GKE), and as virtual machines (VMs) in Google Compute Engine.

The 1-, 2-, and 4-GPU A3 High machine types support Dynamic Workload Scheduler (DWS) Flex Start mode and Spot VMs.
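As an illustration of the Compute Engine path, here is a minimal sketch of requesting a single-GPU A3 High Spot VM with the gcloud CLI. The instance name, zone, boot image, and disk size are placeholders rather than values from this article, and additional flags (for example, networking settings or a GPU-ready image) may be needed in practice; the H100 GPU itself is attached automatically by the a3-highgpu-1g machine type.

# Minimal sketch: create an a3-highgpu-1g Spot VM on Compute Engine.
# VM_NAME and ZONE are placeholders; pick a zone that offers A3 capacity.
gcloud compute instances create VM_NAME \
  --zone ZONE \
  --machine-type a3-highgpu-1g \
  --provisioning-model SPOT \
  --instance-termination-action DELETE \
  --maintenance-policy TERMINATE \
  --image-family debian-12 \
  --image-project debian-cloud \
  --boot-disk-size 200GB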

A3 VMs portfolio powered by NVIDIA H100 GPUs
For each machine type, the available consumption options on Vertex AI, Google Kubernetes Engine (GKE), and Google Compute Engine are:

a3-highgpu-1g (1 GPU, 80 GB)
- Vertex AI: Model Garden and Online Prediction (Spot); Training (Spot, DWS Flex Start mode)
- GKE and Compute Engine: Spot, DWS Flex Start mode

a3-highgpu-2g (2 GPUs, 160 GB)
- Vertex AI: Model Garden and Online Prediction (On-demand, Spot); Training (Spot, DWS Flex Start mode)
- GKE and Compute Engine: Spot, DWS Flex Start mode

a3-highgpu-4g (4 GPUs, 320 GB)
- Vertex AI: Model Garden and Online Prediction (On-demand, Spot); Training (Spot, DWS Flex Start mode)
- GKE and Compute Engine: Spot, DWS Flex Start mode

a3-highgpu-8g (8 GPUs, 640 GB)
- Vertex AI: Online Prediction (On-demand, Spot); Training (On-demand, Spot, DWS Flex Start mode)
- GKE and Compute Engine: On-demand, Spot, DWS Flex Start mode, DWS Calendar mode

a3-megagpu-8g (8 GPUs, 640 GB)
- Vertex AI: available only through Model Garden

Google Kubernetes Engine

For nearly a decade, GKE has been the platform of choice for running web applications and microservices, and it now provides an open, highly scalable, and cost-effective platform for AI training and inference. GKE Autopilot is a strong fit for inference workloads: you simply submit your workload and Google manages the rest, which lowers operational overhead and comes with workload-level SLAs. Both GKE Standard and GKE Autopilot modes of operation support the 1-, 2-, and 4-GPU A3 High machine types.
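With GKE Autopilot you don't manage node pools at all: you describe the accelerator in the Pod spec and Autopilot provisions the underlying capacity. The manifest below is only a minimal sketch under assumptions; the Pod name and container image are placeholders, and the exact node selectors required for H100 capacity in Autopilot may differ from what is shown here.

# Minimal sketch: request one H100 GPU from GKE Autopilot for an inference Pod.
# The Pod name and image are placeholders, not part of the original article.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: h100-inference
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-h100-80gb
  containers:
  - name: inference-server
    image: IMAGE_URL
    resources:
      limits:
        nvidia.com/gpu: 1
EOF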

Below are two examples of creating node pools in your GKE cluster with the a3-highgpu-1g machine type, using Spot VMs and Dynamic Workload Scheduler Flex Start mode.

Using Spot VMs with GKE

Here's how to use the gcloud CLI to request and launch an a3-highgpu-1g Spot node pool on GKE.

gcloud container node-pools create NODEPOOL_NAME \
  --cluster CLUSTER_NAME \
  --region CLUSTER_REGION \
  --node-locations GPU_ZONE1,GPU_ZONE2 \
  --machine-type a3-highgpu-1g \
  --accelerator type=nvidia-h100-80gb,count=1,gpu-driver-version=latest \
  --image-type COS_CONTAINERD \
  --spot

Using Dynamic Workload Scheduler Flex Start mode with GKE

Here's how to request a3-highgpu-1g nodes on GKE with Dynamic Workload Scheduler Flex Start mode (queued provisioning).

gcloud beta container node-pools create NODEPOOL_NAME \
  --cluster CLUSTER_NAME \
  --region CLUSTER_REGION \
  --node-locations GPU_ZONE1,GPU_ZONE2 \
  --enable-queued-provisioning \
  --machine-type=a3-highgpu-1g \
  --accelerator type=nvidia-h100-80gb,count=1,gpu-driver-version=latest \
  --enable-autoscaling  \
  --num-nodes=0   \
  --total-max-nodes TOTAL_MAX_NODES  \
  --location-policy=ANY  \
  --reservation-affinity=none  \
  --no-enable-autorepair

This creates a GKE node pool with zero nodes and queued provisioning enabled. After that, Dynamic Workload Scheduler provisions the requested capacity when your workloads need it.
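In practice, workloads that use queued provisioning are usually admitted through Kueue or a ProvisioningRequest rather than scheduled directly onto the node pool. Purely as an illustrative sketch (the Job name, container image, and single-GPU request are assumptions, and any required admission setup is omitted), a Job pinned to the new node pool could look like this:

# Minimal sketch: a one-GPU Job targeting the DWS-enabled node pool created above.
# NODEPOOL_NAME matches the node pool name; the Job name and image are placeholders.
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: h100-flex-start-job
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: NODEPOOL_NAME
      containers:
      - name: trainer
        image: IMAGE_URL
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
EOF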

Vertex AI

Google Cloud's Vertex AI is a single, fully managed platform for building and using generative and predictive AI. With the new 1-, 2-, and 4-GPU A3 High machine types, Model Garden customers can deploy hundreds of open models with high performance and cost efficiency.
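Beyond Model Garden's built-in deployment flows, a model you have uploaded to Vertex AI can also be deployed to an online prediction endpoint on one of these machine types. The command below is a minimal sketch under assumptions: the endpoint ID, model ID, region, replica counts, and the H100 accelerator type string are placeholders, not values from this article.

# Minimal sketch: deploy an uploaded model onto an a3-highgpu-1g backed Vertex AI endpoint.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region REGION \
  --model MODEL_ID \
  --display-name h100-deployment \
  --machine-type a3-highgpu-1g \
  --accelerator type=nvidia-h100-80gb,count=1 \
  --min-replica-count 1 \
  --max-replica-count 2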

What Google Cloud’s clients are saying

One customer reports that the backend of its AI-assisted software development offering runs on Google Kubernetes Engine, and that moving from A2 machine types to the smaller A3 High machine types lowered the latency of its real-time code-assistance models by 36%, greatly improving the user experience.

Start now

Google Cloud's aim is to give you the flexibility you need to run inference for your AI and ML models efficiently and affordably. The availability of Cloud A3 High VMs with NVIDIA H100 80GB GPUs in smaller machine types provides the granularity needed to scale with user demand while keeping costs under control.
