
Kubernetes Cluster Running Out of IP Addresses on AWS EKS
Can you imagine your Kubernetes cluster on AWS EKS someday running out of IP addresses? Even if you assigned a CIDR block that looks large enough to host all of your Pods, its usable IP address range might not be as large as you thought. That is the situation I ran into in one of our Kubernetes clusters recently.
After doing some research online, I found that I am not alone: this can be considered a common issue for Kubernetes clusters on AWS EKS. Therefore, I would like to share my experience, including troubleshooting, some experiments and mitigating solutions.
Environment
The Kubernetes cluster that ran out of IP addresses was set up with a /20 CIDR block, which contains 4096 (2¹²) addresses in theory, although some of them are reserved for use by AWS and Kubernetes.
The total number of running Pods in the cluster is about 1200, including applications, AWS nodes, and Kubernetes add-ons such as istio, prometheus and grafana. That is not many for a single cluster, especially compared to the assigned /20 CIDR block.
Problem
One day, my team (Cloud Engineering) received several support requests from feature teams whose service deployments had failed for similar reasons, which looked strange to us.
The failure messages from the Kubernetes events look like:
Warning FailedCreatePodSandBox 17m kubelet, ip-10-68-207-192.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a7c7ce835d262d7a3fd4ab94c66376e0266c03ba2fc39365cb108282f440b01a" network for pod "xxxxx-xxxxxserver-deployment-74f49769c5-nkdpn": networkPlugin cni failed to set up pod "xxxxx-xxxxxserver-deployment-74f49769c5-nkdpn_default" network: add cmd: failed to assign an IP address to container
So I checked the available IP addresses of the Subnets in the VPC of this EKS cluster in the AWS Console:

Let’s focus on the 3 private subnets used by the Kubernetes worker nodes: 2 of them have no available IP addresses left to assign to Pods.
According to the CIDR of each Subnet (/22), each one should have 1024 (2¹⁰) IP addresses, 3072 in total, which looks like enough for our 1200 Pods.
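To make the address arithmetic concrete, here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0/20 range is a hypothetical placeholder, since only the prefix lengths (/20 and /22) are given here. The 5 addresses subtracted per subnet are the ones AWS reserves in every VPC subnet (the first four and the last one).

```python
import ipaddress

# Hypothetical addressing: only the prefix lengths (/20, /22) come from the article.
vpc = ipaddress.ip_network("10.0.0.0/20")

# Take three of the /22 blocks to stand in for the 3 private subnets.
private_subnets = list(vpc.subnets(new_prefix=22))[:3]

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

vpc_total = vpc.num_addresses
subnet_total = sum(s.num_addresses for s in private_subnets)
subnet_usable = sum(s.num_addresses - AWS_RESERVED_PER_SUBNET for s in private_subnets)

print(vpc_total)      # 4096
print(subnet_total)   # 3072
print(subnet_usable)  # 3057
```

Even after the reserved addresses are removed, roughly 3000 addresses remain for about 1200 Pods, so the subnet size alone does not explain the exhaustion.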
Investigation
When I did some research online to understand how the AWS CNI assigns IP addresses to Pods, I found that this kind of issue has already been discussed many times. Some articles explain the theory and mechanism very well, so if you would like to understand the details, here are some useful links:
- Pod networking (CNI)
- Optimize IP addresses usage by pods in your Amazon EKS cluster
- Optimizing EKS networking for scale
If you’re a fan of “Talk is cheap. Show me the code.”, well, here is the code:
Before going into more details, here is a quick introduction to how AWS EKS networking works.
EKS Networking
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes.

VPC
In an EKS Kubernetes cluster, the control plane is managed by AWS, and all the worker nodes are hosted in a VPC that includes several Subnets.
Subnets
Each Subnet is also assigned a CIDR block, so it maintains an IP address pool. When a new ENI gets created, it holds a number of IP addresses from the IP address pool of the corresponding Subnet.
EC2 Instance or Node
When a new Kubernetes node joins the cluster, it is deployed to one Subnet. The node itself is assigned a private IP address from the Subnet. AWS Pod networking (CNI) is deployed on each node. You can check it by listing all the Pods in the kube-system namespace.
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-7s5tz 1/1 Running 0 8d
aws-node-89hbv 1/1 Running 0 8d
aws-node-n2szm 1/1 Running 0 8d
aws-node-ss7zf 1/1 Running 0 8d
aws-node-t2hv4 1/1 Running 0 8d
aws-node-x6l7b 1/1 Running 0 8d
...
The AWS CNI runs as an aws-node Pod on each node, and a DaemonSet manages all of these Pods.
Pod Networking (CNI)
AWS Container Network Interface (CNI) is responsible for assigning a private IP address to each pod running on the node.
ENI (Elastic Network Interface)
An ENI can be considered a network card for an EC2 instance (or a Kubernetes node). A single EC2 node can have multiple ENIs attached: one primary plus several secondaries.
An ENI itself can hold multiple IP addresses.
When an ENI is created and attached to an EC2 node, it reserves a bunch of IPv4 addresses from the Subnet IP address pool.
EC2 Node Capacity for IP Addresses and Pods
The type of the EC2 node decides:
- The maximum number of ENIs the node can have;
- The maximum number of IP addresses an ENI on the node can hold.
Here is the full list of these definitions: IP addresses per network interface per instance type.
In my cluster, the EC2 node type is m5.8xlarge, so the numbers are:
- Max Network Interfaces: 8
- Private IPv4 Addresses per Interface: 30
So ideally an m5.8xlarge node can be assigned 240 (8 x 30) IPv4 addresses in total, but that doesn’t mean the node can hold 240 pods, because not all of the assigned IPv4 addresses can be used by the pods running on it.
AWS provides a list of the maximum number of pods that can be assigned to a node based on its type. An m5.8xlarge node can hold up to 234 pods.
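The 234 figure can be reproduced with the formula AWS uses for its max-pods list: ENIs x (IPs per ENI - 1) + 2. The sketch below assumes that formula. The function name max_pods is only illustrative, and the "+ 2" is commonly explained as covering Pods that use host networking and therefore need no ENI address.

```python
def max_pods(max_enis: int, ips_per_eni: int) -> int:
    """Estimate the Pod capacity of a node from its ENI limits.

    One address on each ENI is taken by the ENI itself, so only
    (ips_per_eni - 1) addresses per ENI are left for Pods. The "+ 2"
    is the constant used in the AWS max-pods formula.
    """
    return max_enis * (ips_per_eni - 1) + 2


# m5.8xlarge: 8 ENIs, 30 private IPv4 addresses per ENI
print(max_pods(8, 30))  # 234
```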
IP Address Allocation
Now that we understand the basic concepts of EKS Kubernetes cluster networking, we can dig a bit deeper into the logic of IP address allocation on ENIs and nodes for the pods.
There are three environment variables which control the number of IP addresses reserved by the ENIs.
WARM_ENI_TARGET
It tells how many full ENIs are kept on standby with available IPs on a node. The default value is 1.
WARM_IP_TARGET
It defines the number of available (unused) IP addresses to keep on a node. This variable is best used together with MINIMUM_IP_TARGET.
It is not set by default.
MINIMUM_IP_TARGET
It is the minimum number of reserved IP addresses on a node.
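As a rough illustration of how WARM_IP_TARGET and MINIMUM_IP_TARGET interact, here is a simplified model, not the actual CNI implementation. It assumes the plugin keeps WARM_IP_TARGET free addresses on top of those in use, but never holds fewer than MINIMUM_IP_TARGET in total. The function name and the values 5 and 30 are hypothetical, and the model ignores per-ENI limits.

```python
def target_total_ips(assigned: int, warm_ip_target: int, minimum_ip_target: int) -> int:
    """Simplified model of the number of IPs a node tries to hold.

    Assumption: keep `warm_ip_target` unused addresses in addition to the
    `assigned` ones, with `minimum_ip_target` as a floor on the total.
    """
    return max(minimum_ip_target, assigned + warm_ip_target)


# Hypothetical settings: WARM_IP_TARGET=5, MINIMUM_IP_TARGET=30
for pods in (0, 10, 40):
    print(pods, target_total_ips(pods, warm_ip_target=5, minimum_ip_target=30))
# 0 30
# 10 30
# 40 45
```

Under this reading, a node with few Pods holds the floor value, and beyond that the reserved pool grows in step with the number of Pods rather than one full ENI at a time.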
When a Pod Is Scheduled to a Node
- Check if there exists an ENI with unused IP addresses.
- If yes, assign an IP address to the Pod from the ENI.
- If no, create an ENI with reserved IP addresses from the Subnet IP address pool and attach it to the node. The number of IPs it reserves depends on the WARM_IP_TARGET setting.
- Then, assign an IP address to the Pod from the newly created ENI.
When a Pod Is Killed/Evicted from a Node
- The Pod IP address is detached from the Pod and marked as unused by the ENI.
- But the IP address is still held by the ENI and does NOT return to the Subnet IP address pool until all IP addresses on the ENI are unused.
The last point is critical: it implies that an ENI can hold a bunch of unused IP addresses for a long time if even 1 IP address on it is in use by a Pod.
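The allocate and release rules above can be sketched as a toy model. This is only an illustration: the Eni and Node classes are hypothetical, each ENI is assumed to reserve its full 30 addresses (29 usable by Pods, as on m5.8xlarge), and the warm pool settings are ignored.

```python
from dataclasses import dataclass, field

IPS_PER_ENI = 30                   # m5.8xlarge: addresses held by one ENI
POD_IPS_PER_ENI = IPS_PER_ENI - 1  # one address is used by the ENI itself


@dataclass
class Eni:
    used: set = field(default_factory=set)  # names of Pods holding an IP

    @property
    def free(self) -> int:
        return POD_IPS_PER_ENI - len(self.used)


@dataclass
class Node:
    enis: list = field(default_factory=list)

    def schedule(self, pod: str) -> None:
        # Reuse an ENI that still has an unused address, else attach a new ENI.
        eni = next((e for e in self.enis if e.free > 0), None)
        if eni is None:
            eni = Eni()
            self.enis.append(eni)
        eni.used.add(pod)

    def evict(self, pod: str) -> None:
        for eni in self.enis:
            eni.used.discard(pod)
        # An ENI returns its addresses to the subnet only when none is in use.
        self.enis = [e for e in self.enis if e.used]

    @property
    def held_ips(self) -> int:
        return len(self.enis) * IPS_PER_ENI


node = Node()
for i in range(100):
    node.schedule(f"pod-{i}")
print(len(node.enis), node.held_ips)  # 4 120

# Evict everything except one Pod per ENI.
survivors = {"pod-0", "pod-29", "pod-58", "pod-87"}
for i in range(100):
    if f"pod-{i}" not in survivors:
        node.evict(f"pod-{i}")
print(len(node.enis), node.held_ips)  # 4 120

# Only when an ENI's last Pod goes away is the ENI released.
node.evict("pod-87")
print(len(node.enis), node.held_ips)  # 3 90
```

With just four Pods left, the node in this model still holds 120 addresses, which is the behaviour described in the last point.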
Why Were IP Addresses Exhausted?
Let’s assume an extreme scenario in our EKS Kubernetes cluster, where the worker node type is m5.8xlarge (128 GiB of memory, 32 vCPUs, EBS only, 64-bit platform).
- A node can host 100 small Pods or 10 big Pods, based on the requests/limits of the Pods.
- If a node hosts 100 small Pods, it provides 100 IP addresses for them, which needs at least 4 ENIs (ceil(100 / 29) = 4; 29 is used because the ENI itself also uses an IPv4 address, so only 29 IP addresses are available for Pods). With the default WARM_ENI_TARGET of 1 keeping one more ENI on standby, the node ends up with 5 ENIs, so it holds 150 IP addresses in total (30 * 5).
- After some time, the 100 small Pods are rotated off the node, and in the meantime 10 big Pods are scheduled to it. Since the 10 big Pods occupy all the resources, no more Pods can be scheduled on this node.
- Let’s further assume the 10 big Pods are spread across all 5 ENIs, so no ENI gets released. The node then holds about 150 IP addresses with only 10 Pods, so 130+ IP addresses are wasted.
- We have about 1200 Pods in total running in the Kubernetes cluster. Let’s say 1000 small Pods and 200 big Pods; hosting all of them needs at least 30 nodes (1000/100 + 200/10).
- So, in the worst case, 1200 Pods (1000 small Pods + 200 big Pods) hosted on 30 nodes can consume 4500 (150 * 30) IP addresses. That is far more than the total number of IP addresses in our 3 Subnets (about 3000, as mentioned above).
Even though this is the worst case, where the Pod distribution is extremely unbalanced, it is still possible to run into IP address exhaustion in an EKS Kubernetes cluster as the number of services gradually increases.
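The worst-case numbers above can be checked with a few lines of arithmetic. This sketch assumes that the fifth ENI on each node is the standby ENI kept by the default WARM_ENI_TARGET of 1. That is my reading of the scenario, and the variable names are only illustrative.

```python
import math

IPS_PER_ENI = 30                   # m5.8xlarge
POD_IPS_PER_ENI = IPS_PER_ENI - 1  # the ENI itself uses one address
WARM_ENI_TARGET = 1                # default; assumed to add one standby ENI

small_per_node, big_per_node = 100, 10
small_total, big_total = 1000, 200

enis_for_pods = math.ceil(small_per_node / POD_IPS_PER_ENI)  # ENIs needed for 100 Pod IPs
enis_per_node = enis_for_pods + WARM_ENI_TARGET
ips_per_node = enis_per_node * IPS_PER_ENI
nodes = small_total // small_per_node + big_total // big_per_node
worst_case_ips = ips_per_node * nodes
subnet_ips = 3 * 2 ** 10           # three /22 subnets, before AWS reservations

print(enis_for_pods, enis_per_node, ips_per_node)  # 4 5 150
print(nodes, worst_case_ips, subnet_ips)           # 30 4500 3072
print(worst_case_ips > subnet_ips)                 # True
```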
Possible Solutions
In order to verify the conclusion above, I also deployed a small EKS Kubernetes cluster with smaller instances (m5.2xlarge). When I killed some pods by deleting the Deployment, I saw that the IP addresses of the killed Pods were still held by the ENI, which still had other IP addresses in use.
After running it for a while, creating and deleting Deployments, here is the IP address distribution status:








