Use the open-source Autoscaler to rightsize your Memorystore for Redis clusters.
The ability to automatically scale resources up and, perhaps more crucially, to scale them back down to control costs while maintaining performance is one of the most compelling features of cloud computing. This is common practice with stateless resources such as Compute Engine managed instance groups, but it is less common with stateful services such as databases, due to their inherent complexity.
Google Cloud introduced Memorystore for Redis Cluster last year, which allows users to manually trigger scale-out and scale-in. To address the highly elastic nature of modern Memorystore workloads, Google Cloud is pleased to provide the open-source Memorystore Cluster Autoscaler, available on GitHub, which builds upon the open-source Spanner Autoscaler published in 2020.
Understanding cluster scaling
A Memorystore for Redis Cluster's capacity is defined by the number of shards in the cluster, which can be changed without incurring any downtime, and the size of each shard, which corresponds to the underlying node type. The node type of a cluster is currently immutable, so you scale capacity in or out by changing the number of shards. The Memorystore Cluster Autoscaler automates this process, monitoring your cluster metrics and adjusting the cluster's size accordingly. Using rulesets that evaluate memory and CPU utilisation, the Autoscaler makes the required resource modifications without affecting cluster availability.
The accompanying figure shows the Autoscaler in action, scaling out a Memorystore for Redis Cluster instance as memory usage rises. The green line shows one GB of data being written to the cluster every five minutes; the blue line shows the number of shards in the cluster. As you can see, the cluster scales out, adding shards in proportion to the amount of memory used, plateaus when writes cease, and then scales back in after the test is over and the keys are flushed.

Deployment and configuration
To use the Autoscaler, deploy it to a Google Cloud project. The Autoscaler is highly flexible and supports a variety of deployment options, so the repository includes documentation on the different deployment models, along with several sample Terraform deployment configurations.
Once the Autoscaler has been deployed, configure it to match the scaling requirements of the Memorystore instances under management and the specifics of your workloads. You do this by setting Autoscaler configuration options for each managed Memorystore instance. Once configured, the Autoscaler manages and scales the Memorystore instances on its own. You can find more information on these options in the Autoscaler documentation and later in this post.
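As an illustration, here is a minimal sketch of a per-instance configuration. The projectId, instanceId, scalingProfile, scalingMethod, and minSize parameters are the ones discussed in this post; maxSize is assumed as the natural counterpart of minSize, the JSON-array shape follows the repository's sample configurations, and all values are placeholders.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-memorystore-cluster",
    "scalingProfile": "CPU_AND_MEMORY",
    "scalingMethod": "LINEAR",
    "minSize": 3,
    "maxSize": 30
  }
]
```

Each managed instance gets its own entry, so one Autoscaler deployment can manage several clusters with different scaling requirements.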
Autoscaler architecture
The Poller and the Scaler are the two primary components of the Autoscaler. Using Terraform, they can be deployed to either Cloud Run functions or Google Kubernetes Engine (GKE), and configured to run on a user-defined schedule. The Poller queries the Memorystore metrics in Cloud Monitoring at a predetermined interval to determine utilisation, then passes them to the Scaler. The Scaler compares the metrics against the recommended thresholds specified in the rule set, and determines whether the instance should be scaled in or out, and if so, by how many shards. You can modify the sample configuration to set the minimum and maximum cluster sizes, along with any other criteria appropriate for your environment.
Throughout the process, the Autoscaler emits metrics to Cloud Monitoring to shed light on its operations, and writes a detailed account of its recommendations and actions to Cloud Logging for monitoring and auditing.

Scaling rubrics
The most common factors limiting Memorystore performance are CPU and in-memory storage. The Autoscaler is therefore configured by default with the CPU_AND_MEMORY scaling profile, which accounts for both of these variables when scaling. This is a solid starting point for your deployment, and you can replace it with a custom configuration to meet your needs if necessary.
Defaults
| Metric | Average default setting | Max default setting |
|---|---|---|
| CPU scale OUT | Average CPU > 70% | Max CPU > 80% and average CPU > 50% |
| CPU scale IN | Average CPU < 50% * | Max CPU < 60% and average CPU < 40% * |
| Memory scale OUT | Average usage > 70% | Max usage > 80% and average usage > 50% |
| Memory scale IN | Average usage < 50% * | Max usage < 60% and average usage < 40% * |
* Scale-in is prevented while key evictions are ongoing, which occur when the keyspace fills up and keys are removed from the cache to make space. Scale-in is enabled by default, and can be configured using a custom scaling profile.
Scaling scenarios and methods
Now let’s examine several common scenarios, their distinctive usage patterns, and the Autoscaler configurations that work well for each.
Standard workloads
Many Memorystore-backed applications see more use during certain hours of the day than others. For example, a banking application may see users check their accounts in the morning, conduct transactions in the afternoon and early evening, and then use it little at night.
This very common scenario is what we call a “standard workload”, whose time series shows:
- A significant rise or fall in utilisation at certain times of the day
- Smaller spikes above and below the scaling thresholds

For these kinds of workloads, the following base configuration is recommended (a sketch follows this list):
- The LINEAR scaling method, to cover large scaling events
- A small scaleOutCoolingMinutes value, between 5 and 10 minutes, to minimise the Autoscaler’s reaction time
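Putting these recommendations together, a configuration for a standard workload might look like the following sketch; the instance name and size bounds are placeholders.

```json
{
  "projectId": "my-project",
  "instanceId": "banking-app-cache",
  "scalingMethod": "LINEAR",
  "scaleOutCoolingMinutes": 5,
  "minSize": 5,
  "maxSize": 25
}
```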
Plateau workloads
Applications that are in use throughout the day, such as chat, gaming, and global apps, are another typical scenario. Because user interactions with these apps are more regular, increases in utilisation are not as pronounced as they would be for a standard workload.
These conditions produce a “plateau workload”, whose time series shows:
- A pattern made up of different plateaus throughout the day
- A few larger spikes within the same plateau

For these kinds of workloads, the following base configuration is recommended (a sketch follows this list):
- The STEPWISE scaling method, with a stepSize large enough to cover the biggest utilisation surge on an average day in just a few steps, OR
- The LINEAR scaling method, if there is a likelihood of a significant spike or dip in usage at specific times, such as when breaking news is disseminated. Use this method in conjunction with a scaleInLimit to prevent your instance’s capacity from being reduced too rapidly.
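As a hedged example, the STEPWISE variant might be configured as follows; for the LINEAR variant you would instead set scalingMethod to LINEAR and add a scaleInLimit. All values are placeholders.

```json
{
  "projectId": "my-project",
  "instanceId": "chat-app-cache",
  "scalingMethod": "STEPWISE",
  "stepSize": 4,
  "minSize": 10,
  "maxSize": 40
}
```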
Batch workloads
Customers frequently want additional capacity in their Memorystore clusters to handle batch operations or sales events, where the schedule is often known in advance. These situations make up a “batch workload” with the following characteristics:
- A known, scheduled peak that requires additional capacity
- A drop in utilisation once the process or event is over

For these kinds of workloads, the recommended base configuration includes two separate scheduled jobs (sketched below):
- One for the batch process or event, with a configuration object that uses the DIRECT scalingMethod and a minSize equal to the number of shards or nodes needed to cover the process or event
- One for routine operations, with a configuration that uses the same projectId and instanceId but the LINEAR or STEPWISE method. This job will handle scaling the capacity back in when the process or event is over.
Make sure you choose suitable scaling schedules so that the two configurations do not conflict. For both Cloud Run functions and GKE deployments, ensure that the batch process begins before the Autoscaler starts scaling the instance back in. If necessary, you can slow down the scale-in process by using the scaleInLimit parameter.
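A hedged sketch of such a paired setup follows. The first entry is the scheduled job that uses DIRECT to jump straight to the capacity needed for the event; the second is the routine job that scales the instance back in afterwards, each running on its own schedule as described in the repository documentation. The names and sizes are placeholders, and the exact semantics of the scaleInLimit value should be checked against the Autoscaler documentation.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "sales-event-cache",
    "scalingMethod": "DIRECT",
    "minSize": 40
  },
  {
    "projectId": "my-project",
    "instanceId": "sales-event-cache",
    "scalingMethod": "LINEAR",
    "scaleInLimit": 50,
    "minSize": 10,
    "maxSize": 40
  }
]
```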
Spiky workloads
Depending on the demand, Memorystore may take a few minutes to change the cluster topology and make full use of the additional capacity. Consequently, if your traffic is characterised by very spiky patterns or sudden-onset load, the Autoscaler may not be able to supply capacity quickly enough to avoid latency, or efficiently enough to produce cost savings.
A base configuration for these spiky workloads should (see the sketch below):
- Set a minSize that slightly over-provisions the typical instance workload
- Use the LINEAR scaling method in conjunction with a scaleInLimit to prevent additional latency when the spike is over
- Choose scaling thresholds that are responsive to significant spikes, but large enough to smooth over some of the smaller ones
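A hedged sketch for such a spiky workload follows; minSize is set a little above what the typical load requires, and the scaleInLimit value is a placeholder whose exact semantics are defined in the Autoscaler documentation.

```json
{
  "projectId": "my-project",
  "instanceId": "news-app-cache",
  "scalingMethod": "LINEAR",
  "scaleInLimit": 50,
  "minSize": 15,
  "maxSize": 60
}
```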
Advanced usage
As mentioned previously, the Autoscaler comes preconfigured with scaling rules that optimise cluster size based on CPU and memory utilisation. Depending on your workload or workloads, however, you may need to adjust these rules to meet your utilisation, performance, and/or cost objectives.
The rule sets used for scaling can be customised in several ways, requiring varying degrees of effort:
- Scale on only CPU or only memory metrics. This can be helpful if you find that your clusters are flapping, that is, rapidly changing size. You can do this by specifying a scalingProfile of either CPU or MEMORY in the Autoscaler configuration, overriding the default CPU_AND_MEMORY.
- Use your own custom scaling rules by providing the Autoscaler configuration with a custom rule set and a scalingProfile of CUSTOM (a hypothetical sketch follows this list).
- Develop your own unique rule sets as part of a scaling profile and make them available to everyone in your organisation. You can do this by tailoring one of the pre-existing scaling profiles to your requirements. Google Cloud advises you to begin by examining the existing scaling profiles and rules, and then making your own modifications.
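To illustrate the CUSTOM profile, here is a purely hypothetical sketch. Only the scalingProfile value of CUSTOM comes from this post; the scalingRules field, the rule structure, and the metric name are invented for illustration, so consult the rule schema documented in the repository for the real format.

```json
{
  "projectId": "my-project",
  "instanceId": "my-memorystore-cluster",
  "scalingProfile": "CUSTOM",
  "minSize": 5,
  "maxSize": 25,
  "scalingRules": [
    {
      "_note": "hypothetical rule shape; see the repository for the actual schema",
      "name": "custom_memory_scale_out",
      "condition": "average memory utilisation > 75%",
      "action": "SCALE_OUT"
    }
  ]
}
```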