
GKE IP_SPACE_EXHAUSTED Error: Finding the Cause and Fixing It



If you use Google Kubernetes Engine (GKE) in your Google Cloud environment, you have probably run into the dreaded “IP_SPACE_EXHAUSTED” error.

It is a familiar situation: you are confident that your subnet architecture is future-proof and your IP address planning is solid, and then your GKE cluster suddenly hits an unexpected scaling wall. You start to doubt your subnetting skills. How can a /24 subnet with 252 usable addresses be exhausted when your cluster has only 64 nodes? The answer lies in the subtle way GKE distributes IP addresses, which often consumes far more addresses than the node count suggests.


Node capacity in GKE is determined by three main factors. Understanding them significantly reduces your risk of hitting the infamous IP_SPACE_EXHAUSTED error.

Cluster primary subnet: Provides IP addresses for your cluster’s nodes and internal load balancers. The size of this subnet theoretically determines how far your cluster can scale, but there is more to it than that.

Pod IPv4 range: An alias (secondary) IP range on the cluster’s subnet that assigns IP addresses to the pods in your cluster.

Maximum pods per node: The maximum number of pods GKE can schedule on a single node. It is set at the cluster level but can be overridden per node pool.


GKE’s approach to IP allocation

GKE deliberately reserves IP addresses for pods in advance. Based on the “Maximum pods per node” setting, it assigns each node the smallest subnet that can hold twice that many IP addresses. Giving each node more than twice as many available IP addresses as the maximum number of pods it can run lets Kubernetes reduce IP address reuse as pods are added to and removed from the node. With the GKE Standard default of 110 pods per node, GKE determines the smallest subnet mask that can hold 220 (2 × 110) IP addresses, which is a /24. It then carves the pod IPv4 range into /24 slices and hands one to each node.
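
To make that sizing rule concrete, here is a small Python sketch (my own helper, not anything GKE ships) that reproduces the calculation for a few common settings:

```python
import math

def per_node_pod_mask(max_pods_per_node: int) -> int:
    """Return the per-node pod CIDR mask GKE reserves for a node.

    GKE doubles the maximum pods per node and picks the smallest
    subnet (i.e. the largest mask) that holds that many addresses.
    """
    needed = 2 * max_pods_per_node              # e.g. 2 x 110 = 220 addresses
    host_bits = math.ceil(math.log2(needed))    # bits needed to cover 220 -> 8
    return 32 - host_bits                       # 32 - 8 -> /24

for max_pods in (110, 60, 32):
    print(f"max pods per node {max_pods:>3} -> /{per_node_pod_mask(max_pods)} per node")
# max pods per node 110 -> /24 per node
# max pods per node  60 -> /25 per node
# max pods per node  32 -> /26 per node
```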

The “aha!” moment

The key takeaway is that your cluster’s scalability is determined by how many per-node slices your pod IPv4 range can provide, not just by the number of IP addresses in your primary subnet. Once every slice has been consumed by a node, you will hit the “IP_SPACE_EXHAUSTED” error, even if your primary subnet still has plenty of addresses left.

An example to illustrate

Let’s say you set up a GKE cluster with these parameters:

  • Cluster Primary Subnet: 10.128.0.0/22
  • Pod IPv4 Range: 10.0.0.0/18
  • Maximum pods per node: 110

You boldly declared that you could grow your cluster to 1,020 nodes. Yet the “IP_SPACE_EXHAUSTED” error appeared once it reached 64 nodes. Why?

The pod IPv4 range is the culprit. GKE reserves a /24 subnet for every node, because each node can run up to 110 pods (2 × 110 = 220 IPs, which requires a /24). A /18 range can only be carved into 64 /24 subnets. So even though your primary subnet had plenty of capacity left, you ran out of pod IP addresses at 64 nodes.
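
You can see the 64-node ceiling directly with Python’s standard ipaddress module; this is only an illustration of the math, not something GKE runs:

```python
import ipaddress

# The example's pod IPv4 range, and the /24 block GKE reserves per node
# (110 max pods per node -> 220 addresses -> /24).
pod_range = ipaddress.ip_network("10.0.0.0/18")
per_node_blocks = list(pod_range.subnets(new_prefix=24))

print(f"{pod_range} yields {len(per_node_blocks)} per-node /24 blocks")  # 64
print("first node's block:", per_node_blocks[0])    # 10.0.0.0/24
print("last node's block: ", per_node_blocks[-1])   # 10.0.63.0/24
```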

There are two ways to determine how many nodes your pod IPv4 range can accommodate:

Subnet bit difference: Work out how far apart the two subnet masks are. Subtracting the pod IPv4 range’s mask from the per-node subnet mask (24 - 18 = 6) gives the number of “subnet bits”, and 2⁶ = 64 is the number of slices, and therefore nodes, that fit within the pod IPv4 range.

Total pod capacity: Another way to find the maximum number of nodes is to look at the pod IPv4 range’s total capacity. A /18 range holds 2^(32-18) = 16,384 IP addresses. Since each node’s /24 reservation consumes 256 addresses, dividing the total capacity by the addresses per node gives the maximum number of nodes: 16,384 / 256 = 64.
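
If you want to sanity-check your own numbers, both methods boil down to a few lines of Python (the variable names are mine):

```python
# Pod range /18, per-node reservation /24 (from 110 max pods per node).
pod_mask, node_mask = 18, 24

# Method 1: subnet bit difference.
nodes_by_bits = 2 ** (node_mask - pod_mask)                 # 2**6 = 64

# Method 2: total pod-range capacity divided by addresses per node.
total_addresses = 2 ** (32 - pod_mask)                      # 16,384
addresses_per_node = 2 ** (32 - node_mask)                  # 256
nodes_by_capacity = total_addresses // addresses_per_node   # 64

assert nodes_by_bits == nodes_by_capacity == 64
```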

Finding the problem

Network Analyzer tool

Network Analyzer is a useful Google Cloud service. Among its many capabilities, it can detect IP exhaustion issues and summarize your pod IP subnet capacity, showing how close you are to the limit. If your setup uses a Shared VPC, with the cluster in a service project and the VPC network in a separate host project, you will find the relevant Network Analyzer insights in the host project.

For example, a Network Analyzer insight for a GKE cluster might show a node pool with a /23 pod subnet and a maximum of 110 pods per node (which effectively reserves a /24 per node). When you try to scale this node pool to two nodes, a medium-priority warning appears, indicating that you have exceeded the maximum number of nodes the assigned pod IP range can support.

Resolving the problem

If you have hit the limit of the cluster’s primary subnet, you can simply expand it to add more nodes. Your options are more limited, though, if the bottleneck is in the pod IPv4 range:

  • Create a new cluster with a larger pod address range: This is my least favorite approach and it is easier said than done, but sometimes it is necessary.
  • Add pod IPv4 address ranges: Adding a second pod IPv4 subnet solves the problem. Think of it as bringing another cake to the party: more slices let you serve more guests, or in this case, more nodes. The cluster’s total node capacity then becomes the combined capacity of the original and additional pod IPv4 ranges.
  • Use a lower maximum pods per node: This setting cannot be changed on an existing cluster or node pool. You can, however, improve IP address utilization by creating a new node pool with a lower maximum pods per node, as the sketch after this list shows.
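
As a rough illustration of that last option, the sketch below (the max_nodes helper and the sample values are mine) estimates how a lower maximum-pods-per-node setting stretches the same /18 pod range from the earlier example across more nodes:

```python
import math

def max_nodes(pod_range_mask: int, max_pods_per_node: int) -> int:
    """Estimate how many nodes a pod IPv4 range can serve."""
    per_node_mask = 32 - math.ceil(math.log2(2 * max_pods_per_node))
    return 2 ** (per_node_mask - pod_range_mask)

for max_pods in (110, 64, 32):
    print(f"max pods {max_pods:>3} -> {max_nodes(18, max_pods):>4} nodes")
# max pods 110 ->   64 nodes  (/24 reserved per node)
# max pods  64 ->  128 nodes  (/25 reserved per node)
# max pods  32 ->  256 nodes  (/26 reserved per node)
```

Whether a lower setting makes sense depends, of course, on how densely you actually pack pods onto each node.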

GKE Autopilot clusters

Autopilot clusters are also susceptible to pod IP address exhaustion if they are not planned properly. Just as with Standard clusters, you can add more pod IPv4 ranges to provide extra addresses. GKE then uses these additional ranges for pods on nodes created in new node pools.

The less obvious challenge with Autopilot clusters is how to trigger the creation of a new node pool that will use those additional pod IP ranges, because Autopilot does not let you create node pools directly. By deploying a workload that uses workload separation, you can force GKE to create a new node pool, and that new pool will then draw on the additional pod IPv4 range.
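
As an illustration only, the sketch below builds such a workload-separation Deployment in Python and prints it as JSON (kubectl also accepts JSON manifests). The label key and value pool=ip-expansion, the Deployment name, and the pause image are placeholders I chose, not values GKE requires:

```python
import json

# A Deployment that requests workload separation: the nodeSelector plus a
# matching toleration tells Autopilot to provision nodes (a new node pool)
# carrying this label and taint. That new pool can then use the added
# pod IPv4 range.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "separation-trigger"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "separation-trigger"}},
        "template": {
            "metadata": {"labels": {"app": "separation-trigger"}},
            "spec": {
                "nodeSelector": {"pool": "ip-expansion"},
                "tolerations": [{
                    "key": "pool",
                    "operator": "Equal",
                    "value": "ip-expansion",
                    "effect": "NoSchedule",
                }],
                "containers": [{
                    "name": "pause",
                    "image": "registry.k8s.io/pause:3.9",
                }],
            },
        },
    },
}

# Write it out and apply with: kubectl apply -f separation-trigger.json
print(json.dumps(deployment, indent=2))
```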

Multiple node pools and varying maximum pods per node

The last piece of the puzzle concerns configurations in which several node pools share the same pod IPv4 range but have different “maximum pods per node” values. Working out the maximum number of nodes in such a setup is a little trickier.

When several node pools share the same pod range, the answer depends on how many nodes end up in each pool. Let’s walk through an example.

An example to illustrate

Your Standard GKE cluster has the following configuration:

  • Cluster Primary Subnet: 10.128.0.0/22
  • Pod IPv4 Range: 10.0.0.0/23
  • Maximum pods per node: 110

The settings of the default node pool are as follows:

  • Name: default-pool
  • Pod IPv4 Range: 10.0.0.0/23
  • Maximum pods per node: 110

Next, you add pool-1, a second node pool with a smaller maximum number of pods per node:

  • Name: pool-1
  • Pod IPv4 Range: 10.0.0.0/23
  • Maximum pods per node: 60

Based on what we have covered, GKE will reserve a /24 subnet per node in default-pool (2 × 110 = 220 IPs) and a /25 subnet per node in pool-1 (2 × 60 = 120 IPs). Since both pools share the /23 pod IPv4 range (512 addresses), the following combinations are possible:

  • Two nodes in default-pool and zero in pool-1 (total = 2)
  • One node in default-pool and two in pool-1 (total = 3)
  • Zero nodes in default-pool and four in pool-1 (total = 4)

As you can see, working out how many nodes this cluster can hold is harder than when each node pool has its own pod range.
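
A few lines of Python (variable names mine, ignoring block-alignment details) enumerate these combinations for the shared /23 range:

```python
# default-pool reserves /24 per node, pool-1 reserves /25 per node,
# and both draw from one shared /23 pod range.
pod_range_addresses = 2 ** (32 - 23)       # 512
default_pool_per_node = 2 ** (32 - 24)     # 256
pool_1_per_node = 2 ** (32 - 25)           # 128

for default_nodes in range(3):
    remaining = pod_range_addresses - default_nodes * default_pool_per_node
    pool_1_nodes = remaining // pool_1_per_node
    print(f"default-pool={default_nodes}, pool-1={pool_1_nodes}, "
          f"total={default_nodes + pool_1_nodes}")
# default-pool=0, pool-1=4, total=4
# default-pool=1, pool-1=2, total=3
# default-pool=2, pool-1=0, total=2
```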

Conclusion

Avoiding the annoying “IP_SPACE_EXHAUSTED” error comes down to understanding the subtleties of GKE’s IP allocation. Plan your subnets and pod ranges carefully, with the maximum pods per node and future scaling in mind. With that planning done up front, your GKE clusters will have the IP address space they need to grow. To learn how the class E IPv4 address space can help reduce IPv4 exhaustion problems in GKE, be sure to read this blog post as well.
