Scale AI workloads seamlessly: MultiKueue dispatches jobs across GKE clusters while Dynamic Workload Scheduler allocates GPUs wherever capacity becomes available, for maximum efficiency and performance.
Large language models (LLMs) and artificial intelligence (AI) are expanding rapidly, enabling everything from artistic creation to machine translation. These technologies depend on complex computations that require specialised hardware, such as GPUs. However, GPUs can be difficult to obtain because of their cost and limited availability.
The advent of Dynamic Workload Scheduler (DWS) changed how Google Cloud customers access and use GPU resources, especially within a Google Kubernetes Engine (GKE) cluster. By scheduling all of the required accelerators, such as TPUs and GPUs, concurrently across several Google Cloud services, Dynamic Workload Scheduler optimises AI/ML resource access and spend, improving the performance of training and fine-tuning tasks.
Additionally, Dynamic Workload Scheduler integrates simply and directly with GKE and Kueue, a Kubernetes-native job queueing system, giving you the fastest possible access to GPUs for a particular GKE cluster in a given region.
But what if you want to deploy your workload quickly in whichever region has capacity, as soon as DWS can give you the resources your workload requires?
This is where MultiKueue, a Kueue feature, comes in. Together, GKE, Dynamic Workload Scheduler, and MultiKueue let you wait for accelerators across several regions; as soon as resources become available, Dynamic Workload Scheduler allocates them to the best-suited GKE cluster. By submitting jobs to a global queue and executing them in the region with available GPU resources, you optimise global resource utilisation, save costs, and speed up processing.
MultiKueue
MultiKueue enables workload distribution across several GKE clusters in different locations. It streamlines the process of placing each job in the best location by finding clusters with available resources.
Dynamic Workload Scheduler is supported on GKE Autopilot from version 1.30.3. Autopilot, Google Cloud's fully managed Kubernetes mode, automatically handles the provisioning, scaling, security, and upkeep of your container infrastructure. Let's take a closer look at how to configure and operate MultiKueue with Dynamic Workload Scheduler to get GPU resources more quickly.
MultiKueue cluster roles
MultiKueue defines two cluster roles:
Manager cluster
Creates and monitors remote objects (jobs or workloads) while keeping them synchronised with the local ones, and establishes and maintains the connection with the worker clusters.
Worker cluster
A straightforward standalone Kueue cluster that runs the jobs the manager cluster submits to it.
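Under the hood, the manager cluster knows about each worker through a MultiKueueCluster resource that points to a kubeconfig stored in a Secret. The deploy script used later creates the real ones; as a minimal sketch (the Secret name is an assumption), one worker registration looks like this:
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-dws-worker-eu
spec:
  kubeConfig:
    locationType: Secret
    # Assumption: a Secret in the kueue-system namespace whose
    # "kubeconfig" key holds credentials for the worker cluster
    location: worker-eu-kubeconfig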
Creating the clusters
In this example, you create four GKE Autopilot clusters:
- One manager cluster in europe-west4
- Three worker clusters, one each in:
  - europe-west4
  - us-east4
  - asia-southeast1
Let's examine how this works in the step-by-step example that follows. The files used in this example are available in this GitHub repository.
Clone the GitHub repository
git clone https://github.com/GoogleCloudPlatform/ai-on-gke.git
cd ai-on-gke/tutorials-and-examples/workflow-orchestration/dws-multiclusters-example
Create GKE clusters
terraform -chdir=tf init
terraform -chdir=tf plan
terraform -chdir=tf apply -var project_id=<YOUR_PROJECT_ID>
This Terraform script generates the necessary GKE clusters and adds four contexts to your kubeconfig file:
- manager-europe-west4
- worker-us-east4
- worker-europe-west4
- worker-asia-southeast1
Then, you can quickly switch between contexts using:
kubectl config use-context <context name>
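If a context is missing from your kubeconfig (for example, when working from another machine), you can fetch the credentials manually; a sketch assuming the cluster names match the contexts above:
# Fetch credentials for the manager cluster in its region
gcloud container clusters get-credentials manager-europe-west4 --region europe-west4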
Install and configure MultiKueue
./deploy-multikueue.sh
This script:
- installs Kueue in each of the four clusters
- enables and configures MultiKueue in the manager cluster
- creates a PodMonitoring resource in every cluster so that Kueue metrics can be sent to Google Cloud Managed Service for Prometheus (see the sketch after this list)
- establishes the connection between the manager and worker clusters
- configures Kueue in the worker clusters
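For reference, a PodMonitoring resource looks roughly like the sketch below. The script creates the actual resources; the label selector and metrics port shown here are assumptions based on a default Kueue installation.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kueue-controller-manager
  namespace: kueue-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager  # assumption: default Kueue deployment label
  endpoints:
  - port: 8080  # assumption: Kueue can also serve metrics over HTTPS on 8443
    interval: 30s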
Your GKE clusters, DWS, and Kueue with MultiKueue are now set up and operational. When you submit jobs, the Kueue manager dispatches them across the three worker clusters.
The dws-multi-worker.yaml file contains the Kueue setup for the worker clusters, along with the manager settings.
The following snippet shows a simple example of configuring the MultiKueue AdmissionCheck with three worker clusters.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-dws-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-dws
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  name: multikueue-dws
spec:
  clusters:
  - multikueue-dws-worker-asia
  - multikueue-dws-worker-us
  - multikueue-dws-worker-eu
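On each worker cluster, Kueue delegates capacity provisioning to DWS through a second AdmissionCheck that uses GKE's ProvisioningRequest API, and the worker's ClusterQueue lists that check in its admissionChecks field. A minimal sketch of that worker-side wiring (the resource names are illustrative; see dws-multi-worker.yaml in the repository for the actual configuration):
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  # queued-provisioning.gke.io is the GKE provisioning class backed by DWS
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu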
Submit jobs
When submitting jobs, be sure to use the manager cluster's context:
kubectl config use-context manager-europe-west4
kubectl create -f job-multi-dws-autopilot.yaml
You can submit the job creation request more than once to see how the admission check distributes jobs across the worker clusters.
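For reference, a DWS-ready job generally looks like the sketch below; the queue name, accelerator type, and image are assumptions, and the actual manifest is in the repository.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-dws-job-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue  # assumption: LocalQueue name
spec:
  suspend: true  # Kueue unsuspends the job once DWS has provisioned capacity
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # assumption: requested GPU type
      containers:
      - name: gpu-task
        image: nvidia/cuda:12.1.1-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: Never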
Get job status
Use the following command to check job status and find the region where each job was scheduled:
kubectl get workloads.kueue.x-k8s.io -o jsonpath='{range .items[*]}{.status.admissionChecks}{"\n"}{end}'
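Once a workload has been admitted, you can also switch to the chosen worker cluster and watch the job run there, for example:
kubectl config use-context worker-us-east4
kubectl get jobs
kubectl get pods -w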
Delete resources
Lastly, be sure to delete the four GKE clusters you created to test this feature:
terraform -chdir=tf destroy -var project_id=<YOUR_PROJECT_ID>
What’s next
You can now use MultiKueue, GKE, and DWS to optimise performance, speed up global job execution, and do away with manual node management!
This configuration also meets the needs of organisations with data residency restrictions, letting you dedicate subsets of clusters to specific workloads and helping guarantee compliance.
To further improve your setup, you can use advanced Kueue capabilities such as workload priority classes or per-team management with LocalQueues. You can also gain useful insights by building a Grafana or Cloud Monitoring dashboard from the Kueue metrics, which Google Cloud Managed Service for Prometheus collects automatically through the PodMonitoring resources.