A Detailed Guide to Kubernetes PodSecurityPolicy in AWS EKS

Pod Security

Pods have a variety of settings that can strengthen or weaken your overall security posture. As a Kubernetes practitioner, your chief concern should be preventing a process that’s running in a container from escaping the isolation boundaries of Docker and gaining access to the underlying host.

Processes within a container run as the Linux root user by default. Although the actions of root within a container are partially constrained by the set of Linux capabilities that Docker assigns to containers, these default privileges could allow an attacker to escalate their privileges and/or gain access to sensitive information bound to the host, including Secrets and ConfigMaps.

If you want to know more about Docker, Linux capabilities, and why pod security matters, please read my other articles before continuing, so that you fully understand what PodSecurityPolicy is and why we need it:

Docker under the Hood — 1. Diving into the Image

We start from the image — on which the created container is based.

Docker under the Hood — 2. Container from Scratch, and Image Storage

In this post we will create a container from scratch without any docker related stuff, and we will take a look at how…

Docker under the Hood — User space, Kernel, Syscalls, Permissions, setuid, setgid, and Capabilities

Many Linux security concepts explained, and why they are related to Docker.

Why We Need It

In most clusters today, by default, all resources (e.g. Deployments and ReplicaSets) and authenticated users have permission to create pods, including privileged pods that run as root and access files and paths on the host machine. This makes the attack surface much bigger.

If an attacker compromises a pod that is already running as root and manages to escape the container boundary by exploiting a vulnerability, they could potentially become root on the host machine.

There are many third-party run-time security tools that strengthen security inside Kubernetes, but if you don’t want to buy something else, you can already strengthen security with minimal effort by applying PodSecurityPolicy, which comes native with Kubernetes.

What is PodSecurityPolicy

A PodSecurityPolicy is an admission controller resource, which enables fine-grained authorization of pod creation and updates.

It is a cluster-level resource that controls security-sensitive aspects of the pod specification and defines a set of conditions that a pod must meet in order to be accepted into the system.

When a request to create or update a Pod does not meet the conditions in the PodSecurityPolicy, that request is rejected and an error is returned.

Before Starting

In this article, we are going to focus mainly on AWS EKS. As of today, the latest version of EKS in AWS is 1.16 which already enables the admission controller with a default privileged policy. So, in short, if you have already upgraded to the latest version of EKS, there is nothing you need to do before you can continue.

A little more on this topic:

Under the hood, in order to use PodSecurityPolicy, you must first create and define policies that new and updated Pods must meet, then enable the PodSecurityPolicy admission controller, which validates requests to create and update Pods against the defined policies.

PodSecurityPolicy became available as early as Kubernetes 1.5/1.6.

In Google Cloud Platform, GKE clusters running Kubernetes version 1.8.6 or later support it.

In AWS, the pod security policy admission controller is only enabled on Amazon EKS clusters running Kubernetes version 1.13 or later.

Note that when multiple PodSecurityPolicies are available, the admission controller prefers non-mutating policies (policies that allow the Pod without changing it). If the Pod has to be defaulted or mutated, the controller uses the first policy, ordered alphabetically by name, that successfully validates it.

Another note before you continue: PodSecurityPolicies are enforced by enabling the admission controller, but doing so without authorizing any policies will prevent any pods from being created in the cluster.

What PodSecurityPolicy Can Do

With PodSecurityPolicy, you can control the following:

Running of privileged containers
Usage of host namespaces
Usage of host networking and ports
Usage of volume types
Usage of the host filesystem
Allow specific FlexVolume drivers
Allocating an FSGroup that owns the pod’s volumes
Requiring the use of a read-only root file system
The user and group IDs of the container
Restricting escalation to root privileges
Linux capabilities
The SELinux context of the container
The Allowed Proc Mount types for the container
The AppArmor profile used by containers
The seccomp profile used by containers
The sysctl profile used by containers

While this seems to be an overwhelmingly long list, chances are you have already used a few of these when working with Kubernetes, for example:

You need to mount some storage to the pod, like a PVC
When you don’t need/want to run a pod with a root user, you use security context to run as a user/group
For some of your applications, you need to mount some volumes to it
For some logging applications, you need to access the logs from a path that lives on the host

You can do this, because either your cluster does not enable the PodSecurityPolicy admission controller, or it is enabled but there is a default policy that allows everything.

In the case of AWS EKS, clusters with Kubernetes version 1.13 and higher have a default pod security policy named eks.privileged. This policy places no restriction on what kind of pod can be accepted into the system, which is equivalent to running Kubernetes with the PodSecurityPolicy controller disabled.

What PodSecurityPolicy Can Not Do

However, PodSecurityPolicy can’t do everything.

If it’s not in the list above, it can’t do it.

Also, due to the nature of the admission controller, the policy only works when you are creating or updating the pod. If the pod violates the policy, it won’t be created.

However, if you modify the policy after pods are already up and running so that they violate the new policy, those pods won’t be shut down.

PodSecurityPolicy, as the name suggests, is only a set of policies that are enforced when creating or updating a pod. It is not a container run-time security platform that can detect violations and shut down pods. This is important to know, and it explains why you might want to consider tools like Aqua, Sysdig, or Falco when you want more control over Kubernetes run-time security.

Preparing Your Environment

For the latest AWS EKS version, 1.16 (or any version from 1.13 onward), the admission controller is already enabled, so there is nothing you need to do. To verify the default policy is there, run:

kubectl get psp eks.privileged

Example output:

If you don’t feel like ruining your running cluster, testing on minikube might be a good idea.

Creating PodSecurityPolicy

A default, privileged PodSecurityPolicy looks like this:
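A sketch of such an unrestricted policy follows. It is reconstructed from the publicly documented EKS default policy, so treat the exact metadata as an approximation rather than a verbatim copy of what your cluster returns.

```yaml
# Sketch of an unrestricted policy, modeled on the EKS default (eks.privileged).
# Assumption: metadata is abbreviated; a real cluster may carry extra labels
# and annotations.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: eks.privileged
  annotations:
    # Allow any seccomp profile
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  privileged: true                  # privileged containers allowed
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'                             # every Linux capability may be added
  volumes:
  - '*'                             # every volume type, including hostPath
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: 'RunAsAny'                # root is allowed
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false
```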

This is the default PodSecurityPolicy from AWS EKS. It uses the standard Kubernetes resource definition format.

As you can see, most entries correspond to the list above, and the values show that this policy does not actually limit anything. Running your cluster with this policy is identical to running it with the PodSecurityPolicy admission controller disabled.

Another example of a slightly limited PodSecurityPolicy:
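A minimal sketch of such a policy, assuming the only goal is to reject privileged pods, could look like this. The policy name is a hypothetical placeholder.

```yaml
# Sketch: forbid privileged pods, leave everything else open.
# Assumption: the name "example-no-privileged" is made up for illustration.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-no-privileged
spec:
  privileged: false        # the single restriction in this policy
  # The rules below are required fields; RunAsAny means "no limit".
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'                    # all volume types allowed
```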

In the example above, the only limit is that privileged pods are not allowed; user ID, volumes, etc. are not limited.

Creating RoleBinding — How PodSecurityPolicy Works

When a PodSecurityPolicy resource is created, it actually does nothing.

In order to use the policy, the requesting user (or target pod’s service account) must have permission to “use” this policy, by allowing the use verb on the policy.

So, besides the PodSecurityPolicy, you also need to create a Role/ClusterRole that allows the use of that policy, and a RoleBinding/ClusterRoleBinding to bind a user, group, or service account to that role.

For example, if you created a super PodSecurityPolicy which allows everything (the first example above), but you want to limit it to a certain namespace, say, kube-system, then you can create a Role in kube-system with permission to use your PodSecurityPolicy, then create a RoleBinding to bind a service account to that Role:
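A sketch of that Role and RoleBinding is shown below. It assumes the unrestricted policy was created under the name "privileged" and that the service account is aws-node; the Role and RoleBinding names are hypothetical.

```yaml
# Sketch: grant the "use" verb on one policy to one service account,
# scoped to the kube-system namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-privileged-user        # hypothetical name
  namespace: kube-system
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['privileged']    # assumed name of the PodSecurityPolicy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-privileged-aws-node    # hypothetical name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-privileged-user
subjects:
- kind: ServiceAccount
  name: aws-node
  namespace: kube-system
```

Because both objects are namespaced, pods in other namespaces get no access to the policy through this binding.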

In the example above, the service account “aws-node” in the kube-system namespace is bound to a role that can “use” the “privileged” policy. So when a pod uses this service account in the kube-system namespace, it can have any privileges.

As another example, if you created a restricted/unprivileged PodSecurityPolicy that only allows basic permissions and you want to make it the default for all authenticated users, you can create a ClusterRole with access to use this policy, then create a ClusterRoleBinding to bind all authenticated users to this role.

The one thing to pay attention to: if you want privileged pods to be disallowed by default, bind the group “system:authenticated” in the ClusterRoleBinding instead of a specific service account.
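A sketch of this cluster-wide default might look as follows. It assumes the restricted policy is named "restricted"; the ClusterRole and ClusterRoleBinding names are hypothetical.

```yaml
# Sketch: make one policy usable by every authenticated user.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted-user          # hypothetical name
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['restricted']      # assumed name of the PodSecurityPolicy
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted-default       # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted-user
subjects:
# A Group subject instead of a ServiceAccount: covers all authenticated users
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
```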

For a fully restricted PodSecurityPolicy and ClusterRole/ClusterRoleBinding, see here.

Testing

Now let’s test pod creation with a restricted policy.

First, delete the default privileged PodSecurityPolicy from AWS EKS:

kubectl delete psp eks.privileged

Then create the restricted policy:

git clone https://github.com/IronCore864/ekspsp.git
cd ekspsp
kubectl apply -f restricted.yaml
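
For orientation, a restricted policy typically looks something like the sketch below. It is modeled on the restricted example in the Kubernetes documentation, not copied from the repo, so the actual restricted.yaml may differ in its details.

```yaml
# Sketch of a restricted policy (assumption: based on the upstream
# Kubernetes documentation example, not on the repo's file).
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  # Only "safe" volume types; hostPath is deliberately absent
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'   # containers may not run as UID 0
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    - min: 1                   # forbid the root group
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
```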

Now let’s run a test:
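
A pod manifest for this test can be as small as the sketch below. The pod name and image reference are hypothetical placeholders; substitute any image whose Dockerfile leaves the user as root.

```yaml
# Sketch: a pod whose image runs as root.
# Assumption: "example.com/hello-world-root" is a made-up image reference.
apiVersion: v1
kind: Pod
metadata:
  name: hello-root
spec:
  containers:
  - name: hello
    image: example.com/hello-world-root:latest   # image runs as UID 0
```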

This is a very simple image with a hello-world app, but its Dockerfile runs it as user ID 0 (root). If you try applying the above pod, you will get an error:

Error: container has runAsNonRoot and image will run as root

For an image that runs as another user ID:
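
The corresponding manifest differs only in the image. Again, the names below are hypothetical; the assumption is that the image’s Dockerfile sets a numeric non-root user such as USER 1000.

```yaml
# Sketch: the same pod, but with an image that runs as a non-root UID.
# Assumption: "example.com/hello-world-nonroot" is a made-up image reference.
apiVersion: v1
kind: Pod
metadata:
  name: hello-nonroot
spec:
  containers:
  - name: hello
    image: example.com/hello-world-nonroot:latest   # image runs as UID 1000
```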

If you apply this one, it works, because this image runs as user ID 1000.

Best Practice

Restrict the containers that can run as privileged

Containers that run as privileged inherit all of the Linux capabilities assigned to root on the host. Containers seldom need these types of privileges to function properly. You can reject pods with containers configured to run as privileged.

However, there are a few pods in EKS that need some type of privilege, be it running as root or extra capabilities. For example, coredns needs NET_BIND_SERVICE but doesn’t need to run as root, while aws-node and kube-proxy require access to different paths on the host machine.

So the recommendation here is to create one policy for each such pod and bind it to that pod’s service account, so that each pod has exactly the minimum set of permissions it requires.
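
As a rough sketch of such a per-workload policy, the example below allows only the NET_BIND_SERVICE capability that coredns is said to need. The policy name, the volume list, and the permissive runAsUser rule are assumptions for illustration; check them against the actual coredns deployment in your cluster before relying on them.

```yaml
# Sketch: a policy tailored to a single workload (coredns).
# Assumptions: hypothetical name; volume types guessed from a typical
# coredns deployment (ConfigMap plus service account token).
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-coredns
spec:
  privileged: false
  allowPrivilegeEscalation: false
  allowedCapabilities:
  - NET_BIND_SERVICE       # the only capability the pod may add
  volumes:
  - 'configMap'
  - 'secret'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'       # tighten to MustRunAsNonRoot if the image allows it
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
```

The policy only takes effect for coredns once a Role and RoleBinding grant the "use" verb on it to the service account that the coredns pods run under.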

Do not run processes in containers as root

By default, containers run as root unless the image or pod spec specifies another user. This could be problematic if an attacker is able to exploit a vulnerability in the application and get shell access to the running container.

You can mitigate this risk in a variety of ways:

First, by removing the shell from the container image.

Second, by adding the USER directive to your Dockerfile or running the containers in the pod as a non-root user. The Kubernetes podSpec includes a set of fields under spec.securityContext that let you specify the user and/or group to run your application as. These fields are runAsUser and runAsGroup respectively. You can mandate the use of these fields by creating a pod security policy.
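
To illustrate the pod side, here is a minimal sketch that sets both fields. The pod name and the UID/GID values are arbitrary examples, and busybox is used only because it is a small public image.

```yaml
# Sketch: run every container in the pod as a non-root user and group.
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo       # hypothetical name
spec:
  securityContext:
    runAsUser: 1000        # arbitrary non-root UID
    runAsGroup: 3000       # arbitrary non-root GID
  containers:
  - name: app
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
```

A policy whose runAsUser rule is MustRunAsNonRoot would accept this pod, while rejecting or failing pods that would end up running as UID 0.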

Never run Docker in Docker or mount the socket in the container

While this conveniently lets you build/run images in Docker containers, you’re basically relinquishing complete control of the node to the process running in the container.

If you think you need to build container images inside Kubernetes this way, don’t. Instead, use a dedicated build service, or build tools that don’t depend on the Docker daemon, like Kaniko.

Restrict the use of hostPath

hostPath is a volume that mounts a directory from the host directly into the container. Rarely will pods need this type of access, but if they do, you need to be aware of the risks. By default, pods that run as root will have write access to the file system exposed by hostPath. This could allow an attacker to modify the kubelet settings, create symbolic links to directories or files not directly exposed by the hostPath (e.g. /etc/shadow), install ssh keys, read secrets mounted to the host, and do other malicious things. To mitigate the risks from hostPath, configure the spec.containers.volumeMounts as readOnly.

You should also use a pod security policy to restrict the directories that can be used by hostPath volumes. You can see examples in this repo.
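
The sketch below combines both mitigations: a policy that only permits hostPath volumes under one directory and only read-only, and a pod that mounts that directory with readOnly set. The names, the image reference, and the /var/log path are assumptions chosen for a log-collection scenario.

```yaml
# Sketch: restrict hostPath to a single read-only prefix.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-hostpath-logs    # hypothetical name
spec:
  privileged: false
  volumes:
  - 'hostPath'
  - 'configMap'
  - 'secret'
  allowedHostPaths:
  - pathPrefix: /var/log         # assumed log directory on the node
    readOnly: true               # mounts under this prefix must be read-only
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
---
# Sketch: a pod that mounts the allowed path read-only.
apiVersion: v1
kind: Pod
metadata:
  name: log-reader               # hypothetical name
spec:
  containers:
  - name: reader
    image: example.com/log-agent:latest   # made-up image reference
    volumeMounts:
    - name: varlog
      mountPath: /var/log
      readOnly: true             # container cannot write to the host path
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
```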

Do not allow privilege escalation

Privilege escalation allows a process to change the security context under which it's running. Sudo is a good example of this, as are binaries with the SUID or SGID bit set. Privilege escalation is basically a way for users to execute a file with the permissions of another user or group. You can prevent a container from using privilege escalation with PodSecurityPolicy as well.
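
A minimal sketch of a policy that blocks this is shown below. The policy name is hypothetical, and the remaining rules are left permissive only to keep the focus on the two escalation fields.

```yaml
# Sketch: forbid setuid/setgid-style privilege escalation.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-no-escalation            # hypothetical name
spec:
  privileged: false
  allowPrivilegeEscalation: false         # pods may not request escalation
  defaultAllowPrivilegeEscalation: false  # value applied when a pod leaves it unset
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  volumes:
  - '*'
```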

Summary

To make this easier to use out of the box, I created a GitHub repo with the necessary policies and role bindings for EKS-related pods and for ingress controllers.

Detailed steps with the YAML files are documented here: https://github.com/IronCore864/ekspsp, and you can also follow the steps in the repo to set up everything you need.