This article provides a detailed guide to implementing PodSecurityPolicy in AWS EKS to enhance Kubernetes security.
Abstract
The article discusses the importance of PodSecurityPolicy in Kubernetes, an admission controller resource that enables fine-grained authorization of pod creation and updates. It explains why PodSecurityPolicy is needed: by default, all resources and authenticated users are allowed to create pods, even privileged ones, which can lead to security risks. The guide focuses on AWS EKS and provides steps to create and define policies, enable the PodSecurityPolicy admission controller, and verify the default policy. It also covers what PodSecurityPolicy can and cannot do, and how to create a RoleBinding to use the policy. The article concludes with best practices for using PodSecurityPolicy, such as restricting containers that can run as privileged, not running processes in containers as root, avoiding Docker in Docker or mounting the socket in the container, restricting the use of hostPath, and not allowing privilege escalation.
Bullet points
PodSecurityPolicy is an admission controller resource that enables fine-grained authorization of pod creation and updates.
It is a cluster-level resource that controls security-sensitive aspects of the pod specification.
PodSecurityPolicy became available as early as in Kubernetes 1.5/1.6.
In AWS EKS, the pod security policy admission controller is only enabled on Amazon EKS clusters running Kubernetes version 1.13 or later.
PodSecurityPolicy can control running of privileged containers, usage of host namespaces, usage of host networking and ports, usage of volume types, usage of the host filesystem, allow specific FlexVolume drivers, allocating an FSGroup that owns the pod’s volumes, requiring the use of a read-only root file system, the user and group IDs of the container, restricting escalation to root privileges, Linux capabilities, the SELinux context of the container, the Allowed Proc Mount types for the container, the AppArmor profile used by containers, the seccomp profile used by containers, and the sysctl profile used by containers.
PodSecurityPolicy can’t do everything and it only works when creating or updating the pod.
In order to use the policy, the requesting user (or target pod’s service account) must have permission to “use” this policy, by allowing the “use” verb on the policy.
Besides the PodSecurityPolicy, you also need to create a Role/ClusterRole that allows you to use that policy, and a RoleBinding/ClusterRoleBinding to bind to the role.
The article provides examples of creating a RoleBinding and testing pod creation with a restricted policy.
The best practices for using PodSecurityPolicy include restricting the containers that can run as privileged, not running processes in containers as root, avoiding Docker in Docker or mounting the socket in the container, restricting the use of hostPath, and not allowing privilege escalation.
A Detailed Guide to Kubernetes PodSecurityPolicy in AWS EKS
Pod Security
Pods have a variety of different settings that can strengthen or weaken your overall security posture. As a Kubernetes practitioner your chief concern should be preventing a process that’s running in a container from escaping the isolation boundaries of Docker and gaining access to the underlying host.
The processes that run within a container run under the context of the Linux root user by default. Although the actions of the root within a container are partially constrained by the set of Linux capabilities that Docker assigns to the containers, these default privileges could allow an attacker to escalate their privileges and/or gain access to sensitive information bound to the host, including Secrets and ConfigMaps.
If you want to know more about Docker, Linux capabilities, and why pod security matters, please read my other articles on those topics before continuing, so that you fully understand what PodSecurityPolicy is and why we need it.
In most clusters today, by default, all resources (e.g. Deployments and ReplicaSets) and authenticated users have permission to create pods, even privileged ones that run as root and access files and paths on the host machine, which makes the attack surface much bigger.
If an attacker compromises a pod that is already running as root and then escapes the container boundary by exploiting a vulnerability, they could potentially become root on the host machine.
There are many third-party run-time security tools that strengthen security inside Kubernetes, but if you don’t want to buy something else, you can already improve security with minimal effort by applying PodSecurityPolicy, which comes native with Kubernetes.
What is PodSecurityPolicy
A PodSecurityPolicy is an admission controller resource, which enables fine-grained authorization of pod creation and updates.
It is a cluster-level resource that controls security-sensitive aspects of the pod specification and defines a set of conditions that a pod must meet in order to be accepted into the system.
When a request to create or update a Pod does not meet the conditions in the PodSecurityPolicy, that request is rejected and an error is returned.
Before Starting
In this article, we are going to focus mainly on AWS EKS. As of today, the latest version of EKS in AWS is 1.16 which already enables the admission controller with a default privileged policy. So, in short, if you have already upgraded to the latest version of EKS, there is nothing you need to do before you can continue.
A little more on this topic:
Under the hood, in order to use PodSecurityPolicy, you must first create and define policies that new and updated Pods must meet, then enable the PodSecurityPolicy admission controller, which validates requests to create and update Pods against the defined policies.
PodSecurityPolicy became available as early as Kubernetes 1.5/1.6.
In Google Cloud Platform, GKE clusters running Kubernetes version 1.8.6 or later support it.
In AWS, the pod security policy admission controller is only enabled on Amazon EKS clusters running Kubernetes version 1.13 or later.
Note that, when multiple PodSecurityPolicies are available, the admission controller uses the first policy that successfully validates. Policies are ordered alphabetically, and the controller prefers non-mutating policies (policies that don’t change the Pod) over mutating policies.
Another note before you continue: PodSecurityPolicies are enforced by enabling the admission controller, but doing so without authorizing any policies will prevent any pods from being created in the cluster.
What PodSecurityPolicy Can Do
With PodSecurityPolicy, you can control the following:
Running of privileged containers
Usage of host namespaces
Usage of host networking and ports
Usage of volume types
Usage of the host filesystem
Allow specific FlexVolume drivers
Allocating an FSGroup that owns the pod’s volumes
Requiring the use of a read-only root file system
The user and group IDs of the container
Restricting escalation to root privileges
Linux capabilities
The SELinux context of the container
The Allowed Proc Mount types for the container
The AppArmor profile used by containers
The seccomp profile used by containers
The sysctl profile used by containers
While this seems to be an overwhelmingly long list, chances are, you might have already used a few of them when you are using Kubernetes, for example:
You need to mount some storage to the pod, like a PVC
When you don’t need/want to run a pod with a root user, you use security context to run as a user/group
For some of your applications, you need to mount some volumes to it
For some logging applications, you need to access the logs from a path that lives on the host
You can do this, because either your cluster does not enable the PodSecurityPolicy admission controller, or it is enabled but there is a default policy that allows everything.
In the case of AWS EKS, the clusters with Kubernetes version 1.13 and higher have a default pod security policy named eks.privileged. This policy has no restriction on what kind of pod can be accepted into the system, which is equivalent to running Kubernetes with the PodSecurityPolicy controller disabled (or there is one that allows you to do everything).
What PodSecurityPolicy Can Not Do
However, PodSecurityPolicy can’t do everything.
If it’s not in the list above, it can’t do it.
Also, due to the nature of the admission controller, the policy only works when you are creating or updating the pod. If the pod violates the policy, it won’t be created.
However, if you modify the policy after pods are already up and running, so that those pods now violate the new policy, they won’t be shut down.
PodSecurityPolicy, as the name suggests, is only a set of policies that are enforced when creating or updating a pod. It is not a container run-time security platform that can detect violations and shut down pods. This is important to know, and it explains why you might want to consider tools like Aqua, Sysdig, or Falco when you want more control over Kubernetes run-time security.
Preparing Your Environment
For the latest AWS EKS version, 1.16 (or any version from 1.13 onward), the admission controller is already enabled, so there is nothing you need to do. To verify that the default policy is there, run:
kubectl get psp eks.privileged
Example output:
If you don’t feel like ruining your running cluster, testing from minikube might be a good idea.
Creating PodSecurityPolicy
A default, privileged PodSecurityPolicy looks like this:
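As a sketch, the manifest below follows the eks.privileged policy as published in the AWS EKS documentation for these cluster versions. The annotations and labels on your own cluster may differ, so treat the output of kubectl get psp eks.privileged -o yaml as the source of truth.

```yaml
# Sketch of the EKS default policy; metadata details are assumed from the AWS docs.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: eks.privileged
  annotations:
    kubernetes.io/description: 'privileged allows full unrestricted access to pod features, as if the PodSecurityPolicy controller was not enabled.'
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  labels:
    kubernetes.io/cluster-service: "true"
    eks.amazonaws.com/component: pod-security-policy
spec:
  privileged: true                 # privileged containers allowed
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'                            # any Linux capability may be added
  volumes:
  - '*'                            # any volume type, including hostPath
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: 'RunAsAny'               # root is allowed
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false
```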
This is the default PodSecurityPolicy from AWS EKS, written in the standard Kubernetes resource definition format.
As you can see, most entries correspond to the list above, and the values show that this policy does not actually limit anything. Running your cluster with this policy is identical to running your cluster with the PodSecurityPolicy admission controller disabled.
Another example of a slightly limited PodSecurityPolicy:
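A minimal sketch of such a policy, modeled on the basic example in the upstream Kubernetes documentation; the policy name is a hypothetical placeholder.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-no-privileged   # hypothetical name
spec:
  privileged: false             # the only real restriction in this policy
  # The remaining rules are required fields, set to their most permissive values.
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
```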
In the example above, the only limit is that privileged pods are not allowed; user IDs, volumes, and so on are not restricted.
Creating RoleBinding — How PodSecurityPolicy Works
When a PodSecurityPolicy resource is created, it actually does nothing.
In order to use the policy, the requesting user (or target pod’s service account) must have permission to “use” this policy, by allowing the use verb on the policy.
So, besides the PodSecurityPolicy, you also need to create a Role/ClusterRole that allows you to use that policy, and a RoleBinding/ClusterRoleBinding to bind to the role.
For example, if you created a super PodSecurityPolicy which allows everything (the first example above), but you want to limit it to a certain namespace, say, kube-system, then you can create a Role in kube-system with permission to use your PodSecurityPolicy, then create a RoleBinding to bind a service account to that Role:
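A sketch of what that Role and RoleBinding could look like, assuming the policy is named privileged and the service account is aws-node (both as described in the surrounding text). The Role and RoleBinding names are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-privileged-user        # hypothetical name
  namespace: kube-system
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']                   # the special verb that grants use of a policy
  resourceNames: ['privileged']    # assumed name of the permissive policy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-privileged-aws-node    # hypothetical name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-privileged-user
subjects:
- kind: ServiceAccount
  name: aws-node
  namespace: kube-system
```

Because both objects are namespaced, the grant applies only to pods created with that service account in kube-system.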
In this example above, a service account “aws-node” from kube-system namespace is bound to a role that can “use” the “privileged” policy. So when the pod is using this service account in the kube-system namespace, it can have any privileges.
As another example, if you created a restricted/unprivileged PodSecurityPolicy that only allows basic permissions and you want to make it the default for all authenticated users, you can create a ClusterRole that grants access to use this policy, then create a ClusterRoleBinding to bind all authenticated users to this role.
The one thing to pay attention to is that, if you want to disallow privileged pods by default, the ClusterRoleBinding should bind the group “system:authenticated” instead of a specific service account.
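A sketch of such a cluster-wide default, assuming a restrictive policy named restricted already exists. That policy name and the ClusterRole and ClusterRoleBinding names are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted-user            # hypothetical name
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['restricted']        # assumed name of the restrictive policy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted-authenticated   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted-user
subjects:
- kind: Group
  name: system:authenticated           # every authenticated user and service account
  apiGroup: rbac.authorization.k8s.io
```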
For a fully restricted PodSecurityPolicy and ClusterRole/ClusterRoleBinding, see here.
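For orientation, a restrictive policy could look roughly like the sketch below. It is modeled on the restricted example in the upstream Kubernetes documentation, not taken from the author's repository, and the name is a placeholder.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted                 # hypothetical name
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:                         # core volume types only, no hostPath
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: MustRunAsNonRoot         # containers may not run as UID 0
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
    - min: 1                       # forbid the root group
      max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
```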
Testing
Now let’s have a test of pod creation with a restricted policy.
First, delete the default privileged PodSecurityPolicy from AWS EKS:
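Deleting the default policy would typically be done with kubectl delete psp eks.privileged; make sure a replacement policy and its bindings exist first, otherwise no pods can be created. A test pod for the next step could then look like the sketch below, where the pod name and image are hypothetical placeholders for an image whose Dockerfile runs as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-root                          # hypothetical name
spec:
  containers:
  - name: hello
    image: example.com/hello-world:root     # hypothetical image that runs as root
```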
This is a very simple image with a hello-world app, but its Dockerfile runs it as user ID 0 (root). If you try applying the above pod, you will get an error:
Error: container has runAsNonRoot and image will run as root
For an image that runs as another user ID:
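A sketch of the equivalent pod, assuming a hypothetical image whose Dockerfile sets a numeric non-root user (UID 1000) via the USER directive:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-nonroot                          # hypothetical name
spec:
  containers:
  - name: hello
    image: example.com/hello-world:uid1000     # hypothetical image built with USER 1000
```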
If you apply this one, it would work, because in this image it runs as user ID 1000.
Best Practice
Restrict the containers that can run as privileged
Containers that run as privileged inherit all of the Linux capabilities assigned to root on the host. Containers seldom need these types of privileges to function properly. You can reject pods with containers configured to run as privileged.
However, there are a few pods in EKS that need some type of privilege, be it running as root or specific capabilities. For example, coredns needs NET_BIND_SERVICE but doesn’t need to run as root, while aws-node and kube-proxy require access to different paths on the host machine.
So the recommendation here is to create one policy for each pod, bind it to that service account so that each pod has exactly the required minimum set of permissions.
Do not run processes in containers as root
All containers run as root by default. This could be problematic if an attacker is able to exploit a vulnerability in the application and get shell access to the running container.
You can mitigate this risk in a variety of ways:
First, by removing the shell from the container image.
Second, by adding the USER directive to your Dockerfile or running the containers in the pod as a non-root user. The Kubernetes podSpec includes a set of fields under spec.securityContext that let you specify the user and/or group to run your application as. These fields are runAsUser and runAsGroup respectively. You can mandate the use of these fields by creating a pod security policy.
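As an illustration of those fields, the pod sketch below runs its container as a non-root user and group. The pod name and the UID/GID values are arbitrary examples.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo              # hypothetical name
spec:
  securityContext:
    runAsUser: 1000               # UID for all containers in the pod
    runAsGroup: 3000              # primary GID for all containers in the pod
  containers:
  - name: app
    image: busybox:1.36
    # Print the effective user and group, then stay alive for inspection.
    command: ['sh', '-c', 'id && sleep 3600']
```

On the policy side, a runAsUser rule of MustRunAsNonRoot (or MustRunAs with a UID range) is what makes these settings mandatory.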
Never run Docker in Docker or mount the socket in the container
While this conveniently lets you build/run images in Docker containers, you’re basically relinquishing complete control of the node to the process running in the container.
Restrict the use of hostPath
hostPath is a volume that mounts a directory from the host directly into the container. Pods rarely need this type of access, but if they do, you need to be aware of the risks. By default, pods that run as root will have write access to the file system exposed by hostPath. This could allow an attacker to modify the kubelet settings, create symbolic links to directories or files not directly exposed by the hostPath (e.g. /etc/shadow), install SSH keys, read secrets mounted to the host, and do other malicious things. To mitigate the risks from hostPath, configure the spec.containers.volumeMounts as readOnly.
You should also use a pod security policy to restrict the directories that can be used by hostPath volumes. You can see examples in this repo.
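A sketch of such a restriction using the allowedHostPaths field. The policy name and the /var/log prefix are illustrative assumptions, chosen to match a log-collection use case.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-hostpath-logs     # hypothetical name
spec:
  privileged: false
  volumes:
  - hostPath
  - configMap
  - secret
  - emptyDir
  allowedHostPaths:
  - pathPrefix: /var/log          # only paths under this prefix may be mounted
    readOnly: true                # and only as read-only mounts
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

With readOnly set in the policy, a pod that mounts a matching hostPath volume also needs readOnly: true on the corresponding volumeMounts entry to be admitted.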
Do not allow privilege escalation
Privilege escalation allows a process to change the security context under which it's running. Sudo is a good example of this, as are binaries with the SUID or SGID bit. Privilege escalation is basically a way for users to execute a file with the permissions of another user or group. You can prevent a container from using privilege escalation with PodSecurityPolicy as well.
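A sketch of the two PodSecurityPolicy fields involved; the policy name is hypothetical and the other rules are left permissive only to keep the example short.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-no-escalation              # hypothetical name
spec:
  privileged: false
  allowPrivilegeEscalation: false          # reject pods that request escalation
  defaultAllowPrivilegeEscalation: false   # default applied when a pod does not set it
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
```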
Summary
To make all of this easier to use out of the box, I created a GitHub repo with the necessary policies and role bindings, including ones for the EKS-related pods and for ingress controllers.
Detailed steps with the YAML files are documented here: https://github.com/IronCore864/ekspsp, and you can also follow the steps in the repo to set up everything you need.