GKE Ray Operator
The field of AI is changing constantly. Recent advances in generative AI in particular have produced larger and more complex models, which forces businesses to distribute work efficiently across more machines. One effective approach is to combine Google Kubernetes Engine (GKE), Google Cloud’s managed container orchestration service, with Ray (ray.io), an open-source platform for distributed AI/ML workloads. You can now enable declarative APIs for managing Ray clusters on GKE with a single configuration option, which makes that pattern simple to adopt.
Ray offers a simple API for distributing and parallelizing machine learning tasks, while GKE offers a flexible, scalable infrastructure platform that simplifies resource and application management. Together, GKE and Ray provide scalability, fault tolerance, and ease of use for building, deploying, and managing Ray applications. The built-in Ray Operator on GKE also simplifies initial setup and guides users toward best practices for running Ray in production. It is designed with day-2 operations in mind, and its built-in support for Cloud Logging and Cloud Monitoring improves the observability of your Ray applications on GKE.
Getting started
When creating a new GKE cluster in the Google Cloud console, check the “Enable Ray Operator” option. On a GKE Autopilot cluster, this option is located under “AI and Machine Learning” in “Advanced Settings”.
On a Standard cluster, the “Enable Ray Operator” checkbox is located under “AI and Machine Learning” in the “Features” menu.
To use the gcloud CLI instead, set the addons flag as follows:
gcloud container clusters create CLUSTER_NAME \
    --cluster-version=VERSION \
    --addons=RayOperator
Once the Ray Operator is enabled, GKE hosts and manages it on your behalf. As soon as the cluster has been created, it is ready to create Ray clusters and run Ray applications.
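As a rough illustration of the declarative API, the sketch below writes a minimal RayCluster manifest to a file and applies it with kubectl only when RUN=1 is set. The resource name, image tag, replica counts, and resource limits are illustrative assumptions, not values from this article. The ray.io/v1 RayCluster schema comes from the open-source KubeRay project, so check it against the version running in your cluster.

```shell
# Write a minimal RayCluster manifest.
# The name, image tag, and sizes below are illustrative assumptions.
MANIFEST="${MANIFEST:-raycluster-sample.yaml}"
cat > "$MANIFEST" <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 4Gi
  workerGroupSpecs:
  - groupName: workers
    replicas: 2
    minReplicas: 1
    maxReplicas: 4
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 4Gi
EOF
echo "Wrote $MANIFEST"

# Apply the manifest only when RUN=1, so the script is safe to try
# without access to a cluster.
if [ "${RUN:-0}" = "1" ]; then
  kubectl apply -f "$MANIFEST"
  kubectl get rayclusters
fi
```

Keeping the manifest in a file means it can be reviewed and versioned like any other Kubernetes resource before it is applied.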
Logging and monitoring
Effective logging and metrics are crucial when running Ray in production. Optional features of the GKE Ray Operator automate the collection of logs and metrics and store them in Cloud Logging and Cloud Monitoring for easy access and analysis.
When log collection is enabled, all logs from the Ray cluster head node and worker nodes are automatically collected and stored in Cloud Logging. This centralizes log aggregation across all of your Ray clusters and keeps the generated logs safe and accessible even after a Ray cluster is shut down, whether intentionally or by accident.
GKE can also enable metrics collection and, using Managed Service for Prometheus, capture all system metrics exported by Ray. System metrics are essential for tracking how efficiently your resources are used and for finding problems quickly. This visibility is especially important when working with costly hardware like GPUs. With Cloud Monitoring, you can build dashboards and set up alerts that keep you informed about the health of your Ray resources.
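The optional log and metrics collection described above can also be requested from the gcloud CLI at cluster creation time. The following is a minimal sketch: the cluster name and location are placeholder assumptions, and the two --enable-ray-cluster-* flag names reflect the GKE documentation as I understand it, so confirm them with gcloud container clusters create --help before relying on them. The small run helper is hypothetical and only prints the command unless RUN=1 is set.

```shell
# Placeholder values (assumptions); replace them with your own.
CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"
LOCATION="${LOCATION:-us-central1}"

# Hypothetical helper: print a command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Create a cluster with the Ray Operator add-on plus Ray log and
# metrics collection.
run gcloud container clusters create "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --addons=RayOperator \
  --enable-ray-cluster-logging \
  --enable-ray-cluster-monitoring
```

Running the script as is prints the command for review; running it with RUN=1 executes it against your current gcloud project.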
TPU support
Tensor Processing Units (TPUs) are custom-built hardware accelerators that significantly speed up training and inference for large machine learning models. With Google Cloud’s AI Hypercomputer architecture, you can use Ray and TPUs together to scale your high-performance ML applications.
The GKE Ray Operator simplifies TPU integration by adding the TPU environment variables required by frameworks like JAX and by managing admission webhooks for TPU Pod scheduling. Autoscaling is also supported for Ray clusters with a single host or multiple hosts.
Reduce startup latency
When running AI workloads in production, minimizing startup latency is essential to maximize the utilization of expensive hardware accelerators and to ensure availability. Combined with other GKE features, the GKE Ray Operator can significantly shorten startup time.
Hosting your Ray images on Artifact Registry and turning on image streaming can significantly speed up image pulls for your Ray clusters. The large dependencies that machine learning often requires lead to large, unwieldy container images that take a long time to pull, and image streaming can drastically reduce this pull time. For more information, see Use Image streaming to pull container images.
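For an existing cluster, image streaming can be switched on from the gcloud CLI. The sketch below assumes a placeholder cluster name and location, and assumes that your Ray images are already hosted in Artifact Registry and that your nodes use a containerd-based Container-Optimized OS image. The run helper is hypothetical and only prints the command unless RUN=1 is set.

```shell
# Placeholder values (assumptions); replace them with your own.
CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"
LOCATION="${LOCATION:-us-central1}"

# Hypothetical helper: print a command, and execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Turn on image streaming for an existing cluster.
run gcloud container clusters update "$CLUSTER_NAME" \
  --location="$LOCATION" \
  --enable-image-streaming
```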
In addition, GKE secondary boot disks can preload model weights or container images onto new nodes. Combined with image streaming, this feature can let your Ray applications start up to 29 times faster, making better use of your hardware accelerators.
Scale Ray in production
Keeping up with the rapid advances in AI requires a platform that scales with your workloads and provides the simple Pythonic experience your AI developers are used to. Ray on GKE delivers this powerful combination of usability, scalability, and reliability. With the GKE Ray Operator, it is now easier than ever to get started and to put best practices for scaling Ray in production into practice.