GKE Stateful HA Operator
For any operational application running on Google Kubernetes Engine (GKE), designing for availability is a crucial business consideration. This is especially true for stateful applications such as databases and message queues. However, running stateful applications often requires a trade-off between availability and cost. For instance, you can save money at the expense of availability by running a single-replica application in a single zone. If you require higher availability, you can run additional application replicas to provide data redundancy in the case of a node failure, but this comes at a cost in compute and network infrastructure.
Additionally, the Kubernetes scheduler takes an eventually consistent approach to disruptive events such as upgrades or zone failures. While this works well for stateless applications, customers want stateful applications to be handled more proactively: they want to control failover times and observe how stateful apps respond to disruptions.
Customers want a middle ground that combines the cost-effectiveness of a single-replica application with the availability of multiple replicas. To help, Google is announcing today Stateful HA Operator on GKE, a new feature that brings proactive scheduling to stateful applications while balancing cost and availability. Stateful HA Operator is in preview.
Let’s examine Stateful HA Operator in more detail to see how it can help you balance cost and availability for your stateful apps.
Understanding the Stateful HA Operator
At a high level, Stateful HA Operator integrates with regional persistent disk (RePD) to give stateful applications proactive controls and higher availability.
Many low-cost availability tools provide eventually consistent failover, with a failover process that takes about ten minutes. That is too long if you want to reach the industry-benchmark Recovery Time Objective (RTO) of 60–120 seconds. Stateful HA Operator shortens this delay and lets you customize the failover response per workload, so you can match failover times to business needs. It also provides full observability, so you can audit and track every failover action.
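To make the gap concrete, here is a minimal Python sketch that compares the ten-minute failover quoted above with the 60–120 second RTO window. The function name and the 90-second "tuned" failover time are hypothetical, chosen only for illustration; they are not values taken from the Stateful HA Operator.

```python
# Figures quoted in the article.
EVENTUAL_FAILOVER_SECONDS = 10 * 60   # "about ten minutes"
RTO_RANGE_SECONDS = (60, 120)         # benchmark RTO window, in seconds


def meets_rto(failover_seconds: float, rto_seconds: float) -> bool:
    """Return True when a failover completes within the given RTO."""
    return failover_seconds <= rto_seconds


# Hypothetical tuned failover time, assumed purely for illustration.
ASSUMED_TUNED_FAILOVER_SECONDS = 90

strict_rto, relaxed_rto = RTO_RANGE_SECONDS

# How many times longer the eventually consistent failover is than
# even the most relaxed end of the RTO window.
overshoot = EVENTUAL_FAILOVER_SECONDS / relaxed_rto

print("Eventually consistent failover meets the relaxed RTO:",
      meets_rto(EVENTUAL_FAILOVER_SECONDS, relaxed_rto))
print("Assumed tuned failover meets the relaxed RTO:",
      meets_rto(ASSUMED_TUNED_FAILOVER_SECONDS, relaxed_rto))
print(f"Overshoot versus the relaxed RTO: {overshoot:.1f}x")
```

Even against the relaxed 120-second end of the window, a ten-minute failover is five times too slow, which is the delay the operator is meant to shorten.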
Meanwhile, regional persistent disk opens up a new option in the cost-versus-availability trade-off. Regional persistent disk is a storage option that synchronously replicates data between two zones in a region. Because adding storage is typically cheaper than running more compute, and because RePD does not charge for cross-zone networking, you can balance cost and availability. When Stateful HA Operator is used with Spare Capacity or PriorityClass, your application has compute capacity available to fail over to in the event of a total failure.
A case study
Use case 1: Improving availability of a single-replica PostgreSQL for only an 8% cost increase
Consider a typical PostgreSQL application with a single replica that costs $391 per month at list price. Single-replica applications are susceptible to outages, and the RTO can range from hours to days.
By deploying the Stateful HA Operator and letting it execute failover using regional PD, you can add tolerance to disruptions such as zonal failures for a modest cost increase of 8%. Stateful HA Operator redeploys your replica within a specified timeout, so you can shrink the application’s unavailability window to match its SLA at an appealing price point. If you require even greater availability, add failover compute capacity.
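The arithmetic behind this use case can be sketched in a few lines of Python. The $391 list price and the 8% increase come from the text above; the function and variable names are illustrative, and the result is a back-of-the-envelope figure rather than a pricing quote.

```python
# Figures quoted in the article.
BASELINE_MONTHLY_USD = 391.00   # single-replica PostgreSQL, list price
INCREASE_FRACTION = 0.08        # 8% cost increase with failover on regional PD


def monthly_cost_with_failover(baseline: float, increase: float) -> float:
    """Return the monthly cost after applying a fractional increase."""
    return round(baseline * (1 + increase), 2)


new_cost = monthly_cost_with_failover(BASELINE_MONTHLY_USD, INCREASE_FRACTION)
extra_per_month = round(new_cost - BASELINE_MONTHLY_USD, 2)

print(f"Monthly cost with failover: ${new_cost:.2f}")
print(f"Added cost per month:       ${extra_per_month:.2f}")
```

Under these figures the added cost works out to roughly $31 per month on top of the $391 baseline.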
Use case 2: Lowering costs for a multi-replica Kafka with up to 53% in cost savings
Some applications need to maintain a low RTO in the event of node failures, even though application replication adds extra compute and storage costs. In the unlikely case of a zone failure, all replicas can be rescheduled from the primary zone to the backup zone. Under typical operating conditions, this lets the application maximize cost savings on inter-zone networking.
Kafka was designed to operate over a flat network. Depending on data replication prices, some applications find that inter-zone data replication can account for more than 80% of overall application cost, greatly outweighing the cost of both compute and storage. The zonal isolation principle can be applied to Kafka. Consider a Kafka application with six replicas. With brokers distributed equally across zones, the total list price for all Kafka brokers is $3,969.54 per month. By deploying the Stateful HA Operator, you can cut these costs by up to 53%.
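As a rough illustration of what "up to 53%" means in dollars, the following Python sketch applies that ceiling to the $3,969.54 list price quoted above. The names are illustrative. Because 53% is an upper bound, the computed cost is a best case, not an expected bill.

```python
# Figures quoted in the article.
KAFKA_MONTHLY_USD = 3969.54      # six brokers spread across zones, list price
MAX_SAVINGS_FRACTION = 0.53      # "up to 53%" savings


def best_case_cost(baseline: float, max_savings: float) -> float:
    """Return the lowest monthly cost implied by the maximum savings rate."""
    return round(baseline * (1 - max_savings), 2)


lowest_monthly_cost = best_case_cost(KAFKA_MONTHLY_USD, MAX_SAVINGS_FRACTION)
max_monthly_savings = round(KAFKA_MONTHLY_USD - lowest_monthly_cost, 2)

print(f"Best-case monthly cost:  ${lowest_monthly_cost:.2f}")
print(f"Maximum monthly savings: ${max_monthly_savings:.2f}")
```

At the 53% ceiling, the monthly bill would fall from about $3,970 to about $1,866.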
Try out the Helm-based GKE Stateful HA Operator
The Stateful HA Operator is a fully automated solution that eliminates the laborious process of tailoring your application to satisfy its availability requirements.