Wednesday, April 2, 2025

GCP Dynamic Workload Scheduler For AI/ML Workloads

Dynamic Workload Scheduler: Improving AI/ML workload economics and resource access

Google Cloud is in a fascinating period of innovation and change driven by artificial intelligence. At the center of this is AI Hypercomputer, an architecture built on an integrated system of AI-optimized hardware, software, and consumption models. AI Hypercomputer lets businesses worldwide run on the same state-of-the-art infrastructure that underpins Google's internal AI/ML research, development, and training.

However, efficient resource management is more important than ever due to the enormous demand for TPUs and NVIDIA GPUs.

In response, Google Cloud offers Dynamic Workload Scheduler, a new, easy-to-use, and powerful way to gain access to GPUs and TPUs. This post is aimed at technical audiences and takes a closer look at what Dynamic Workload Scheduler is, how it works, and how you can use it today.

What is Dynamic Workload Scheduler?

Dynamic Workload Scheduler is a job scheduling and resource management tool built for AI Hypercomputer. By scheduling all the accelerators a job needs at once, Dynamic Workload Scheduler improves your access to AI/ML resources, helps optimize costs, and can improve the experience of workloads such as training and fine-tuning jobs. It brings scheduling advances from Google's internal ML fleet to Google Cloud customers, and it supports both TPUs and NVIDIA GPUs. It is also integrated with many of the Google Cloud AI/ML services you already use, including Compute Engine Managed Instance Groups, Google Kubernetes Engine, Vertex AI, Batch, and more.

Two modes: Flex Start and Calendar

Dynamic Workload Scheduler introduces two modes: Flex Start mode, which improves availability and optimizes costs, and Calendar mode, which offers high predictability on job start times.

Flex Start mode: Efficient GPU and TPU access with better economics

Flex Start mode is designed for fine-tuning models, experimentation, shorter training jobs, distillation, offline inference, and batch jobs. With Flex Start mode, you can request GPU and TPU capacity when your workloads are ready to run.

With Dynamic Workload Scheduler in Flex Start mode, you submit a capacity request for your AI/ML jobs by specifying how many GPUs you need, for how long, and in which region. Dynamic Workload Scheduler intelligently persists the request and automatically provisions your virtual machines (VMs) as soon as the capacity becomes available, so your workloads can run continuously for the length of the capacity allocation. Dynamic Workload Scheduler supports capacity requests of up to seven days, with no minimum duration; you can request capacity for as little as a few minutes or hours, and the scheduler can typically fulfill shorter requests faster than longer ones.
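For example, when running on Compute Engine Managed Instance Groups, a Flex Start capacity request takes the form of a resize request against the MIG. Below is a minimal sketch; the MIG and request names are placeholders, and exact flag names can vary by gcloud release, so check the current gcloud reference before relying on it:

# Illustrative only: ask DWS for 4 additional VMs from an existing MIG
# for up to 24 hours; the VMs are created once all capacity is available.
gcloud beta compute instance-groups managed resize-requests create my-a3-mig \
    --resize-request=my-flex-start-request \
    --resize-by=4 \
    --requested-run-duration=24h \
    --zone=us-central1-a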

If your training job finishes early, you can simply terminate the VMs to free up the resources and pay only for what you actually used. There is no longer any need to hold onto idle resources for future use.
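Continuing the MIG sketch above, cleanup could be as simple as deleting the provisioned instances through the group (the instance names here are placeholders for the VMs the MIG actually created):

# Illustrative only: delete the provisioned VMs so billing stops.
gcloud compute instance-groups managed delete-instances my-a3-mig \
    --instances=my-a3-mig-abcd,my-a3-mig-efgh \
    --zone=us-central1-a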

If your AI/ML workloads run on GKE node pools, you can easily use Dynamic Workload Scheduler with orchestrators such as Kueue, which has built-in support for popular machine learning frameworks and training operators including Ray, Kubeflow, Flux, and PyTorch. The steps to enable this are as follows:

Step 1: Create a node pool with queued provisioning enabled via the --enable-queued-provisioning flag.

gcloud beta container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --region=CLUSTER_REGION \
    --enable-queued-provisioning \
    --accelerator type=GPU_TYPE,count=AMOUNT,gpu-driver-version=DRIVER_VERSION \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=NODES_MAX \
    --reservation-affinity=none \
    --no-enable-autorepair \
    --impersonate-service-account SERVICE_ACCOUNT

Step 2: Label your GKE Job when you create it so that Kueue and Dynamic Workload Scheduler know to run it. Kueue takes care of the rest: it orchestrates the job start and automatically submits a capacity request on your behalf.

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    # ... pod template spec continues ...
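For reference, a fuller version of this manifest might look like the sketch below. The container image, GPU count, and node selector are illustrative assumptions (NODEPOOL_NAME stands for the queued-provisioning node pool from Step 1), and the dws-local-queue LocalQueue is assumed to have been created when setting up Kueue:

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue  # Kueue LocalQueue, assumed to exist
spec:
  parallelism: 1
  completions: 1
  suspend: true  # Kueue unsuspends the Job once capacity is provisioned
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: NODEPOOL_NAME  # queued-provisioning pool from Step 1
      containers:
      - name: trainer
        image: us-docker.pkg.dev/my-project/my-repo/trainer:latest  # placeholder image
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never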

Calendar mode: Reserved start times for your AI workloads

Calendar mode caters to training and experimentation workloads that need precise start times and have a defined duration. This mode extends the future reservations capability announced back in September.

With Calendar mode, you can request GPU capacity in fixed-duration blocks. It will initially support future reservations with durations of 7 or 14 days, which can be booked up to 8 weeks in advance. Your reservation is confirmed based on availability, and on your chosen start date the capacity is delivered to your project. Your VMs can then consume this capacity block by targeting the reservation. At the end of the defined period, the VMs are terminated and the reservation is deleted.

Step 1: Create a future reservation in Calendar mode.

gcloud alpha compute future-reservations create my-calendarblock \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --vm_count=VM_COUNT \
    --start-time=START_TIME \
    --end-time=END_TIME \
    --auto-delete-auto-created-reservations \
    --require-specific-reservation

Step 2: At your preferred start date, use the Compute Engine, Managed Instance Group, or GKE APIs to run your VMs with specific reservation affinity targeting the reservation created above.
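As a sketch, launching a VM against the reservation from Step 1 with the Compute Engine CLI might look like the following; the instance name is a placeholder, and a real a3-highgpu-8g launch will typically need additional networking and disk flags:

# Illustrative only: consume the Calendar mode capacity block by
# targeting the specific reservation created in Step 1.
gcloud compute instances create my-training-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --reservation-affinity=specific \
    --reservation=my-calendarblock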
