AWS EKS Best Practices
A checklist for Cloud Engineers to live by
In this guide, we’ll explore the best practices to focus on when working with Amazon Elastic Kubernetes Service (EKS) and how to optimize application workloads, harden security configurations, and simplify cluster operations while making the most out of AWS’s powerful cloud infrastructure.
#1 — Enhance Network Security
✅ Block SSH/RDP remote access to EKS cluster node groups
Disabling SSH/RDP remote access to your EKS cluster node groups largely prevents unauthorized access and potential breaches. It also lowers the risk of bad actors taking over your infrastructure and keeps your EKS cluster resources and sensitive data safe.
To achieve this with the AWS CLI, omit the --remote-access option from the create-nodegroup command when creating the EKS cluster node group.
# With --remote-access option:
aws eks create-nodegroup \
  --region us-east-1 \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup-1 \
  --instance-types m5.large \
  --subnets subnet-xxxxxx subnet-yyyyy \
  --remote-access ec2SshKey="my-ssh-key-1",sourceSecurityGroups="sg-XXXXX" \
  --node-role arn:aws:iam::XYXYXYXY:role/my-eks-node-role
# After removing --remote-access option:
aws eks create-nodegroup \
  --region us-east-1 \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup-1 \
  --instance-types m5.large \
  --subnets subnet-xxxxxx subnet-yyyyy \
  --node-role arn:aws:iam::XYXYXYXY:role/my-eks-node-role
However, if you really need remote access, enable it on a case-by-case basis while taking extra precautions like using strong authentication, ensuring secure network connections through security groups, and regularly checking access logs for any suspicious activity.
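To audit node groups that already exist, one option is to inspect the JSON returned by `aws eks describe-nodegroup`. The sketch below assumes that output has already been loaded into a Python dict; the function name `has_remote_access` and the sample data are illustrative and not part of any AWS SDK.

```python
def has_remote_access(description: dict) -> bool:
    """Return True if a describe-nodegroup response shows SSH remote access."""
    remote = description.get("nodegroup", {}).get("remoteAccess") or {}
    # Remote access counts as enabled when an EC2 key pair is configured.
    return bool(remote.get("ec2SshKey"))


# Illustrative sample shaped like `aws eks describe-nodegroup` output.
sample = {
    "nodegroup": {
        "nodegroupName": "my-nodegroup-1",
        "remoteAccess": {
            "ec2SshKey": "my-ssh-key-1",
            "sourceSecurityGroups": ["sg-XXXXX"],
        },
    }
}

print(has_remote_access(sample))  # True
```

A node group created without --remote-access has no remoteAccess entry, so the same function returns False for it.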
✅ Block Public Access to EKS Cluster Endpoint
When launching a new EKS cluster, a public endpoint is automatically generated for the Kubernetes API server so that Kubernetes management tools (e.g. kubectl) can communicate with your EKS cluster. Because this API server endpoint is reachable from the internet, this configuration exposes your EKS cluster to malicious activity and attacks.
As a best practice, disable public access to the EKS cluster endpoint by setting endpointPublicAccess=false with the update-cluster-config command. Set endpointPrivateAccess=true at the same time to keep private access to the EKS cluster (e.g. kubectl commands running from an EC2 bastion host within the VPC), which you still need for cluster management operations.
# Disable public access to EKS cluster and enable only private access
# (publicAccessCidrs only applies while public access is enabled, so it is omitted here)
aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
For advanced configurations, read more about Amazon EKS cluster endpoint access control.
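To verify the endpoint settings of a cluster, you can examine the resourcesVpcConfig section of the `aws eks describe-cluster` output. The following is a minimal sketch that assumes the output is already parsed into a dict; `endpoint_findings` and the sample values are hypothetical names chosen for illustration.

```python
def endpoint_findings(cluster: dict) -> list:
    """List endpoint-access concerns found in a describe-cluster response."""
    vpc = cluster.get("cluster", {}).get("resourcesVpcConfig", {})
    findings = []
    if vpc.get("endpointPublicAccess", False):
        cidrs = vpc.get("publicAccessCidrs", [])
        if not cidrs or "0.0.0.0/0" in cidrs:
            findings.append("public endpoint open to the internet")
        else:
            findings.append("public endpoint restricted to: " + ", ".join(cidrs))
    if not vpc.get("endpointPrivateAccess", False):
        findings.append("private endpoint access disabled")
    return findings


# Illustrative sample: the configuration a cluster has when public access is left wide open.
default_cluster = {
    "cluster": {
        "resourcesVpcConfig": {
            "endpointPublicAccess": True,
            "endpointPrivateAccess": False,
            "publicAccessCidrs": ["0.0.0.0/0"],
        }
    }
}
print(endpoint_findings(default_cluster))
```

An empty list means the cluster is private-only, which is the target state described above.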
✅ Restrict unnecessary ingress traffic using EKS Security Groups
Avoid opening all ports within EKS security groups, as it can expose vulnerabilities to attackers who may use port scanners and probing techniques to identify applications and services and launch malicious activities like brute-force attacks. In most instances, permitting inbound traffic solely on TCP port 443 (HTTPS) would be sufficient.
Use the describe-security-groups command to check the inbound (ingress) rules associated with the security group, and the revoke-security-group-ingress command to revoke any unnecessary ingress rules. If TCP port 443 (HTTPS) is not open, use the authorize-security-group-ingress command to add the missing ingress rule, as follows.
# Check inbound/ingress rules
aws ec2 describe-security-groups \
  --region us-east-1 \
  --group-ids sg-xxxxx \
  --query 'SecurityGroups[*].IpPermissions'
# Revoke non-compliant ingress rules (e.g. revoke SSH traffic on TCP port 22)
aws ec2 revoke-security-group-ingress \
  --region us-east-1 \
  --group-id sg-xxxxx \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0
# Allow incoming traffic on TCP port 443
aws ec2 authorize-security-group-ingress \
  --region us-east-1 \
  --group-id sg-xxxxx \
  --protocol tcp \
  --port 443 \
  --cidr 10.10.1.0/24
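Reading ingress rules by eye gets tedious with many security groups. The sketch below flags rules that are open to the whole internet on anything other than TCP 443. It assumes you pass in the IpPermissions list of a single security group as returned by describe-security-groups; `risky_ingress` and the allowed-port set are illustrative choices, not an AWS-provided check.

```python
ALLOWED_PORTS = {443}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def risky_ingress(ip_permissions: list) -> list:
    """Return (protocol, from_port, to_port) for world-open rules outside ALLOWED_PORTS."""
    risky = []
    for rule in ip_permissions:
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if not OPEN_CIDRS.intersection(cidrs):
            continue  # rule is limited to specific address ranges
        protocol = rule.get("IpProtocol")
        first, last = rule.get("FromPort"), rule.get("ToPort")
        # Protocol "-1" means all traffic and carries no port range.
        only_allowed = protocol == "tcp" and first == last and first in ALLOWED_PORTS
        if not only_allowed:
            risky.append((protocol, first, last))
    return risky


# Illustrative sample rules.
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
]
print(risky_ingress(rules))  # [('tcp', 22, 22)]
```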
✅ Harden IAM Role Policies of EKS Cluster Node Groups
An IAM role is assigned to every worker node in the EKS cluster node group so that it can run kubelet and interact with other AWS APIs. This IAM role eliminates the need for individual credentials on each node and simplifies providing fine-grained permissions. Ensure that these IAM roles have only the permissions needed for the tasks they perform, following the principle of least privilege.
The following commands can be used to remove a non-compliant inline IAM role policy and attach a managed policy in its place.
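# Remove an inline policy (for a managed policy, use detach-role-policy with --policy-arn)
aws iam delete-role-policy \
  --role-name my-node-group-role \
  --policy-name my-old-policy
# Attach a managed policy (attach-role-policy takes --policy-arn, not --policy-name)
aws iam attach-role-policy \
  --role-name my-node-group-role \
  --policy-arn arn:aws:iam::XYXYXYXY:policy/my-new-policy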
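One rough way to spot a non-compliant policy is to look for Allow statements that combine wildcard actions with a wildcard resource. The sketch below works on an IAM policy document loaded as a dict; `overly_broad_statements` is a hypothetical helper, and a real review would also consider conditions and NotAction/NotResource, which this ignores.

```python
def _as_list(value):
    """IAM allows a single value or a list in most fields; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list:
    """Return the Sid of each Allow statement with wildcard actions on all resources."""
    broad = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            broad.append(stmt.get("Sid", "<no Sid>"))
    return broad


# Illustrative policy document.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "ec2:*", "Resource": "*"},
        {
            "Sid": "ReadImages",
            "Effect": "Allow",
            "Action": ["ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage"],
            "Resource": "*",
        },
    ],
}
print(overly_broad_statements(policy))  # ['TooBroad']
```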
✅ Restrict Kubernetes RBAC
Limit permissions not only in IAM but also in Kubernetes RBAC, reducing the attack surface and adhering to the principle of least privilege. In particular, minimise the permissions granted via the aws-auth ConfigMap and Kubernetes roles and clusterroles to reduce the damage that compromised credentials can cause.
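As one concrete check, IAM roles mapped to the system:masters group in the aws-auth ConfigMap receive full cluster-admin rights, so those mappings deserve scrutiny. The sketch below assumes the mapRoles section has already been parsed from YAML into a list of dicts; `cluster_admin_mappings` and the sample role names are illustrative.

```python
def cluster_admin_mappings(map_roles: list) -> list:
    """Return the role ARNs that aws-auth maps to the system:masters group."""
    return [
        entry["rolearn"]
        for entry in map_roles
        if "system:masters" in entry.get("groups", [])
    ]


# Illustrative mapRoles entries (role names are placeholders).
map_roles = [
    {
        "rolearn": "arn:aws:iam::XYXYXYXY:role/my-eks-node-role",
        "username": "system:node:{{EC2PrivateDNSName}}",
        "groups": ["system:bootstrappers", "system:nodes"],
    },
    {
        "rolearn": "arn:aws:iam::XYXYXYXY:role/dev-team",
        "username": "dev",
        "groups": ["system:masters"],
    },
]
print(cluster_admin_mappings(map_roles))
```

Each ARN in the result is a candidate for a narrower Kubernetes role bound through a rolebinding instead.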
✅ Authenticate Kubernetes API calls by integrating with an OpenID Connect identity provider
OpenID Connect (OIDC) provides a secure and flexible way to authenticate users of applications and systems. An OIDC identity provider can be used as an alternative to IAM for authenticating users to your cluster. After configuring authentication for the EKS cluster, create Kubernetes roles and clusterroles to define permissions, and then bind them to the identities using Kubernetes rolebindings and clusterrolebindings. Note that you can associate only one OIDC identity provider with your cluster. For instructions, read more about authenticating users for your cluster from an OpenID Connect identity provider.
✅ Use EKS CNI policy (AWS-managed) to access networking resources
Attach the AmazonEKS_CNI_Policy AWS-managed policy to the IAM role of the EKS cluster node groups so that they can manage networking resources. This policy allows the VPC CNI plugin (amazon-vpc-cni-k8s) to list, describe, and modify VPC ENIs (Elastic Network Interfaces) on behalf of the cluster, which pod networking and communication within the EKS environment depend on. For additional instructions, read more about configuring the Amazon VPC CNI plugin for Kubernetes.
# Attach policy
aws iam attach-role-policy \
  --role-name AmazonEKSVPCCNIRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
✅ Use ECR read-only policy (AWS-managed) to access ECR repositories
Attach the AmazonEC2ContainerRegistryReadOnly AWS-managed policy to the IAM role of the EKS cluster node groups to grant permission only to read and retrieve container images from ECR repositories, without allowing any unnecessary operations on ECR.
# Attach policy
aws iam attach-role-policy \
  --role-name AmazonEKSECRReadRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
✅ Use EKS Cluster policy (AWS-managed) to manage AWS resources
Attach the AmazonEKSClusterPolicy AWS-managed policy to the EKS cluster role to provide Kubernetes with the permissions it requires to manage resources on your behalf. Because the policy is AWS-managed, AWS keeps it updated as the service evolves.
# Attach policy
aws iam attach-role-policy \
  --role-name AWSEKSClusterRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
✅ Enable Envelope Encryption for EKS Kubernetes Secrets using KMS
By default, Kubernetes stores secrets unencrypted in its backend database, etcd. Anyone with access to etcd can read the secrets by looking them up in the backend. This is a serious exposure, and to add an extra layer of security, implement envelope encryption (i.e. encrypting a key with another key) for these Kubernetes secrets using KMS keys. With this setup, plaintext Kubernetes secrets are encrypted with a Data Encryption Key (DEK), and the DEK is encrypted through kms:Encrypt before being stored in etcd. KMS can support customer-managed keys (CMKs), AWS-managed keys, or AWS-owned keys for encryption; in general, CMKs are the most recommended option.
To implement this strategy, read more about using envelope encryption with AWS KMS keys and using EKS encryption provider support for defense-in-depth.
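To confirm whether a cluster already has envelope encryption configured for secrets, you can look at the encryptionConfig section of the `aws eks describe-cluster` output. This is a minimal sketch over that output loaded as a dict; `secrets_encryption_key` and the key ARN in the sample are placeholders for illustration.

```python
def secrets_encryption_key(cluster: dict):
    """Return the KMS key ARN used to encrypt secrets, or None if not configured."""
    for entry in cluster.get("cluster", {}).get("encryptionConfig", []):
        if "secrets" in entry.get("resources", []):
            return entry.get("provider", {}).get("keyArn")
    return None


# Illustrative sample shaped like describe-cluster output (key ARN is a placeholder).
encrypted_cluster = {
    "cluster": {
        "name": "my-cluster",
        "encryptionConfig": [
            {
                "resources": ["secrets"],
                "provider": {"keyArn": "arn:aws:kms:us-east-1:XYXYXYXY:key/my-key-id"},
            }
        ],
    }
}
print(secrets_encryption_key(encrypted_cluster))
```

A return value of None indicates that no KMS key is associated with the cluster's secrets.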
#2 — Enable logging & monitoring
✅ Setup EKS control plane logging
Ensure control plane logs are activated for all EKS clusters so that API server, audit, authenticator (specific to AWS EKS), controller manager, and scheduler logs are published to AWS CloudWatch Logs. Each of these log types corresponds to a crucial component of the Kubernetes control plane. For instructions, read more about enabling and disabling control plane logs.
# Enable AWS EKS control plane logging
aws eks update-cluster-config \
  --region us-east-1 \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
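To check which log types are still switched off on an existing cluster, the logging section of the `aws eks describe-cluster` output can be compared against the five types listed above. The sketch assumes the output is loaded as a dict; `disabled_log_types` is an illustrative helper name.

```python
ALL_LOG_TYPES = ["api", "audit", "authenticator", "controllerManager", "scheduler"]


def disabled_log_types(cluster: dict) -> list:
    """Return the control plane log types that are not enabled."""
    enabled = set()
    groups = cluster.get("cluster", {}).get("logging", {}).get("clusterLogging", [])
    for group in groups:
        if group.get("enabled"):
            enabled.update(group.get("types", []))
    return [log_type for log_type in ALL_LOG_TYPES if log_type not in enabled]


# Illustrative sample: only api and audit logs are enabled.
partial = {
    "cluster": {
        "logging": {
            "clusterLogging": [
                {"types": ["api", "audit"], "enabled": True},
                {"types": ["authenticator", "controllerManager", "scheduler"], "enabled": False},
            ]
        }
    }
}
print(disabled_log_types(partial))  # ['authenticator', 'controllerManager', 'scheduler']
```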
✅ Setup EKS Audit Log Monitoring in GuardDuty
Auditing activities on EKS clusters for suspicious changes using a tool like GuardDuty is an important security measure. GuardDuty supports security monitoring features, including monitoring Kubernetes audit logs from EKS clusters and analysing them for potentially malicious and suspicious activity. It consumes Kubernetes audit log events directly from the Amazon EKS control plane logging feature and captures chronological activities from users, applications using the Kubernetes API, and the control plane.
# Enable EKS Audit Log Monitoring
aws guardduty update-detector \
  --detector-id xxxxxxxxxxx \
  --features '[{"Name" : "EKS_AUDIT_LOGS", "Status" : "ENABLED"}]'
You can also consider external EKS monitoring tools like Trend Micro Cloud Conformity’s Real-Time Threat Monitoring and Analysis (RTMA) engine, which detects Amazon EKS configuration changes within your AWS account so that changes at the AWS EKS service level can be audited promptly.
✅ Setup CloudTrail logging for Amazon EKS API calls
Ensure that CloudTrail logging is activated for all EKS clusters to capture Amazon EKS API calls. CloudTrail records important cluster operations (e.g. CreateCluster, DeleteCluster) and generates a detailed log entry for each event, including information about the IAM identity responsible for the action and the credentials used. Calls made to the Kubernetes API server itself are covered by the control plane audit logs rather than by CloudTrail. For exact steps and instructions, read more about Logging Amazon EKS API calls with AWS CloudTrail.
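Once CloudTrail records are available (for example, exported from a trail as JSON), a small filter can show who performed which EKS operation. The sketch below assumes a list of CloudTrail records already loaded as dicts; `summarize_eks_events` and the sample records are illustrative.

```python
def summarize_eks_events(records: list) -> list:
    """Return (time, event name, caller ARN) for records emitted by Amazon EKS."""
    summary = []
    for record in records:
        if record.get("eventSource") != "eks.amazonaws.com":
            continue  # skip events from other AWS services
        caller = record.get("userIdentity", {}).get("arn", "unknown")
        summary.append((record.get("eventTime"), record.get("eventName"), caller))
    return summary


# Illustrative records (identities and timestamps are made up).
records = [
    {
        "eventTime": "2024-05-01T10:00:00Z",
        "eventSource": "eks.amazonaws.com",
        "eventName": "DeleteCluster",
        "userIdentity": {"arn": "arn:aws:iam::XYXYXYXY:user/alice"},
    },
    {
        "eventTime": "2024-05-01T10:05:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "RunInstances",
        "userIdentity": {"arn": "arn:aws:iam::XYXYXYXY:user/bob"},
    },
]
for event in summarize_eks_events(records):
    print(event)
```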
#3 — Maintain a healthy EKS cluster
✅ Enable readiness and liveness probes for all pods
Readiness probes determine if a pod is ready to serve traffic. When a pod is not ready, it is removed from Service endpoints, but it remains running. Readiness probes are crucial for avoiding sending traffic to pods that are still initializing or experiencing issues.
Liveness probes verify that a container is alive and functioning correctly. If a liveness probe fails, Kubernetes restarts the container. Liveness probes are essential for detecting and recovering from situations where a container becomes unresponsive or enters a faulty state while running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web-container
          image: nginx:latest
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 15
Implementing these readiness and liveness probes in Kubernetes is crucial for maintaining application health and ensuring high availability. By defining these probes, Kubernetes can automatically check the responsiveness of pods and take corrective actions when necessary.
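To find workloads that lack probes, a manifest can be checked before it is applied. The following sketch assumes a Deployment manifest already parsed into a dict (for instance with a YAML library, which is not shown); `containers_missing_probes` is a hypothetical helper.

```python
def containers_missing_probes(deployment: dict) -> dict:
    """Map each container name to the probe types it does not define."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = {}
    for container in containers:
        absent = [
            probe
            for probe in ("readinessProbe", "livenessProbe")
            if probe not in container
        ]
        if absent:
            missing[container["name"]] = absent
    return missing


# Illustrative manifest: one container with only a readiness probe.
deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web-container",
                        "image": "nginx:latest",
                        "readinessProbe": {"httpGet": {"path": "/", "port": 80}},
                    }
                ]
            }
        }
    }
}
print(containers_missing_probes(deployment))  # {'web-container': ['livenessProbe']}
```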
✅ Enable pod anti-affinity to ensure spreading pod replicas across multiple worker nodes
Deploying pod workloads with multiple replicas spread across multiple worker nodes is crucial for ensuring high availability and fault tolerance in Kubernetes clusters. By utilizing the Kubernetes Anti-Affinity feature, pods are automatically scheduled across different worker nodes, minimizing the risk of a single node failure affecting all application pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - web
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: web-container
          image: nginx:latest
In the above example, the podAntiAffinity field specifies that pods with the label app: web must be placed on different worker nodes (topologyKey: "kubernetes.io/hostname"). Because the rule is required at scheduling time, a replica stays Pending if no node without a matching pod is available, so the cluster needs at least as many schedulable nodes as replicas. By deploying multiple replicas across multiple nodes, Kubernetes ensures resilience to node failures and enhances the overall availability and reliability of the application.
✅ Enable CPU & Memory resource requests and limits for pods
Applying appropriate resource requests and limits to every pod is vital for optimizing resource utilization and maintaining cluster stability in AWS EKS. Without proper allocation, resource waste can accumulate over time, leading to inefficiencies and performance bottlenecks. The Kubernetes Vertical Pod Autoscaler (VPA) can help automate this process by adjusting resource requests based on historical usage data. Note that VPA may need to evict pods to apply changes, a limitation that upcoming Kubernetes updates aim to address. Tools that analyse real-time capacity utilization (for example, with machine learning) can complement Kubernetes autoscaling for finer-grained resource management.
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  containers:
    - name: web-container
      image: nginx:latest
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 200m
          memory: 256Mi
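The units in the manifest above can be easy to misread: 100m means 100 millicores (0.1 CPU) and 128Mi means 128 × 2^20 bytes. The sketch below converts these quantities and checks that requests do not exceed limits. It handles only the m, Ki, Mi, and Gi suffixes and assumes quantities are given as strings; the helper names are illustrative.

```python
from decimal import Decimal

_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes (Ki/Mi/Gi only)."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def requests_within_limits(resources: dict) -> bool:
    """Check that CPU and memory requests are no larger than their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (
        cpu_millicores(requests["cpu"]) <= cpu_millicores(limits["cpu"])
        and memory_bytes(requests["memory"]) <= memory_bytes(limits["memory"])
    )


resources = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "200m", "memory": "256Mi"},
}
print(memory_bytes("128Mi"))              # 134217728
print(requests_within_limits(resources))  # True
```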
✅ Deploy worker nodes across multiple Availability Zones
Configuring worker nodes to deploy across multiple Availability Zones is critical for enhancing the resilience and availability of AWS EKS clusters. By spreading worker nodes across zones, the impact of a single zone outage is mitigated, preventing complete cluster downtime. This is achieved either by configuring AWS Auto Scaling Groups (ASGs) to span multiple Availability Zones or, as in the eksctl configuration below, by creating one node group per Availability Zone.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: multi-asgs
  region: us-west-2
nodeGroups:
  - name: ng1
    instanceType: m5.xlarge
    availabilityZones:
      - us-west-2a
  - name: ng2
    instanceType: m5.xlarge
    availabilityZones:
      - us-west-2b
  - name: ng3
    instanceType: m5.xlarge
    availabilityZones:
      - us-west-2c
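To confirm that running nodes really are spread across zones, you can count nodes by the well-known topology.kubernetes.io/zone label. The sketch below assumes the output of `kubectl get nodes -o json` loaded as a dict; `nodes_per_zone` and `spans_multiple_zones` are illustrative helper names.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def nodes_per_zone(node_list: dict) -> Counter:
    """Count nodes per Availability Zone based on the zone label."""
    zones = Counter()
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        zones[labels.get(ZONE_LABEL, "unknown")] += 1
    return zones


def spans_multiple_zones(node_list: dict) -> bool:
    """Return True if labelled nodes exist in at least two zones."""
    zones = nodes_per_zone(node_list)
    zones.pop("unknown", None)  # ignore nodes without a zone label
    return len(zones) >= 2


# Illustrative node list with three nodes in two zones.
node_list = {
    "items": [
        {"metadata": {"labels": {ZONE_LABEL: "us-west-2a"}}},
        {"metadata": {"labels": {ZONE_LABEL: "us-west-2a"}}},
        {"metadata": {"labels": {ZONE_LABEL: "us-west-2b"}}},
    ]
}
print(dict(nodes_per_zone(node_list)))  # {'us-west-2a': 2, 'us-west-2b': 1}
print(spans_multiple_zones(node_list))  # True
```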
✅ Keep the Kubernetes version of the EKS cluster up-to-date
Ensure all EKS clusters run on the latest stable version of Kubernetes supported by EKS. This approach provides access to the latest features, design updates, bug fixes, enhanced security, and improved performance. Ideally, these version checks should happen regularly (e.g. quarterly, since Kubernetes releases a new minor version roughly every four months). For Kubernetes versions compatible with EKS, read more about Amazon EKS Kubernetes versions.
# Check cluster version
aws eks describe-cluster \
  --region us-east-1 \
  --name my-cluster \
  --query 'cluster.version'
# Update cluster version
aws eks update-cluster-version \
  --region us-east-1 \
  --name my-cluster \
  --kubernetes-version 1.24
Failing to update Kubernetes versions on time can also lead to extended support costs. Starting April 1, 2024, clusters running a Kubernetes version in extended support are charged a total of $0.60 per cluster per hour instead of the usual $0.10 (about $438 instead of $73 for a 730-hour month). This is an unnecessary cost, and regularly updating the Kubernetes versions on schedule is the way to go.
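The monthly impact of these hourly rates can be worked out with a few lines of arithmetic. The sketch below uses the two rates quoted above and assumes a 730-hour month (365 days × 24 hours ÷ 12); `monthly_cost` is an illustrative helper, and actual bills depend on current AWS pricing.

```python
from decimal import Decimal

STANDARD_RATE = Decimal("0.10")  # USD per cluster-hour, standard support
EXTENDED_RATE = Decimal("0.60")  # USD per cluster-hour, extended support
HOURS_PER_MONTH = 730            # assumption: 365 days * 24 hours / 12 months


def monthly_cost(rate: Decimal, clusters: int = 1) -> Decimal:
    """Return the monthly control plane cost for the given hourly rate."""
    return rate * HOURS_PER_MONTH * clusters


standard = monthly_cost(STANDARD_RATE)
extended = monthly_cost(EXTENDED_RATE)
print(f"standard: ${standard}, extended: ${extended}, extra: ${extended - standard}")
# standard: $73.00, extended: $438.00, extra: $365.00
```

For a fleet, pass the cluster count: ten clusters left on extended support would cost an extra $3,650 per month under the same assumptions.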
✅ Match the CoreDNS add-on version with the EKS cluster’s Kubernetes version
When launching a new EKS cluster, two CoreDNS replicas are deployed by default for high availability (regardless of node count). Since these CoreDNS pods serve as the cluster DNS, which provides name resolution for all pods in the cluster, the CoreDNS version must be kept up-to-date and compatible with the Kubernetes version of the cluster.
The CoreDNS version can be checked and updated to a suitable value using the describe-addon and update-addon commands.
# Check CoreDNS add-on version
aws eks describe-addon --cluster-name my-cluster --addon-name coredns
# Update CoreDNS add-on version
aws eks update-addon \
  --region us-east-1 \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.11.1-eksbuild.6 \
  --resolve-conflicts PRESERVE
To find compatible version pairs, read more about working with the CoreDNS Amazon EKS add-on.
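When comparing add-on versions in scripts, plain string comparison can mislead (for example, "v1.9.3" sorts after "v1.11.1" as text). The sketch below parses versions of the form shown above, such as v1.11.1-eksbuild.6, into numeric tuples. The helper names are illustrative, and the pattern is an assumption that covers only this version format.

```python
import re

_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-eksbuild\.(\d+)$")


def parse_addon_version(version: str) -> tuple:
    """Parse 'v1.11.1-eksbuild.6' into (1, 11, 1, 6)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized add-on version: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(current: str, target: str) -> bool:
    """Return True if the current add-on version is older than the target."""
    return parse_addon_version(current) < parse_addon_version(target)


print(parse_addon_version("v1.11.1-eksbuild.6"))                 # (1, 11, 1, 6)
print(needs_update("v1.9.3-eksbuild.7", "v1.11.1-eksbuild.6"))   # True
```

This only orders versions; whether a given version is compatible with the cluster's Kubernetes version still has to be taken from the AWS documentation.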
Conclusion
By following these guidelines, you ensure your EKS environment is secure, highly available, and optimized for performance. Over the past years, the AWS team has been super innovative and has released a plethora of new features on the EKS ecosystem. Embrace these practices to unlock the full potential of AWS EKS and drive success in your cloud-native journey.
Stay tuned for the next AWS tip. Until then, happy coding!