Why and How to set Probes in Kubernetes? Design a robust K8s cluster

It’s hard to design a robust K8s cluster with multiple inter-dependent services. Often, if one of the core services crashes, all services depending on it fail too. That is expected, but we should be able to pinpoint the service causing the failures and restart or roll it back without manual effort. K8s liveness, readiness, and startup probes, together with the minReadySeconds setting, come to the rescue.
Let’s go through these settings one by one and see why, how, and in which combination to use them:
K8s Liveness Probes
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite the bug. Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. Kubernetes provides liveness probes to detect and remedy such situations.
Liveness can be checked as:
- liveness command (exec)

    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

- liveness HTTP GET request (httpGet)

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

- liveness TCP probe (tcpSocket)

    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Restart policy:
Note: Restarting a container in a Pod should not be confused with restarting a Pod. A restarted container starts again from a clean state (data kept in the Pod's volumes is preserved), while the Pod itself stays in place. A Pod is not a process, but an environment for running containers. A Pod persists until it is deleted.
A PodSpec has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always. restartPolicy applies to all containers in the Pod, and it only refers to restarts of the containers by the kubelet on the same node. Exited containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and the delay is reset after ten minutes of successful execution. Once bound to a node, a Pod is never rebound to another node.
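As a minimal sketch of where restartPolicy sits, the manifest below sets it at the Pod level, next to the container list. The Pod name, container name, image, and command are placeholders chosen for illustration, not taken from a real workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo            # hypothetical name
spec:
  # Always (default) | OnFailure | Never; applies to every container in the Pod
  restartPolicy: OnFailure
  containers:
  - name: worker                # hypothetical container
    image: busybox              # placeholder image
    # Exits with a non-zero code after 30s, so the kubelet restarts it
    # on the same node, with a growing back-off delay between attempts
    command: ["sh", "-c", "sleep 30; exit 1"]
```

With OnFailure, the same container would be left alone if it exited with code 0; with Never, it would not be restarted in either case.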
Configure Probes in K8s
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
- initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
- periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
- timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
- successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
- failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
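To see how these fields combine, here is a sketch of a liveness probe with every timing field set explicitly. The /healthz path and port 8080 are reused from the earlier example; the numeric values are illustrative assumptions, not recommendations.

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # endpoint reused from the earlier example
    port: 8080
  initialDelaySeconds: 10   # wait 10s after the container starts before the first probe
  periodSeconds: 5          # then probe every 5s
  timeoutSeconds: 2         # a probe with no answer within 2s counts as a failure
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 3       # restart the container after 3 consecutive failures
```

With these values, a container that stops responding is restarted after roughly failureThreshold × periodSeconds = 3 × 5 = 15 seconds of failed probes. Lowering either number makes recovery faster but also makes a restart more likely during a short hiccup.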
HTTP probes have additional fields that can be set on httpGet:
- host: Host name to connect to, defaults to the pod IP. You probably want to set “Host” in httpHeaders instead.
- scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
- path: Path to access on the HTTP server.
- httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
- port: Name or number of the port to access on the container. Number must be in the range 1 to 65535.
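The httpGet fields above could be combined as in the following sketch. The port number 8443 and the host name app.example.internal are hypothetical values used only to show where each field goes.

```yaml
readinessProbe:
  httpGet:
    scheme: HTTPS                   # defaults to HTTP when omitted
    path: /healthz
    port: 8443                      # hypothetical TLS port exposed by the container
    httpHeaders:
    - name: Host                    # set the Host header here rather than using the host field
      value: app.example.internal   # hypothetical virtual host name
  periodSeconds: 10
```

Leaving host unset keeps the probe pointed at the pod IP, while the Host header still lets the application route the request to the right virtual host.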
For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter, since the kubelet is unable to resolve it.
K8s Readiness Probes
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from the Service load balancers.
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup or depend on external services after startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either.
Readiness probes are configured similarly to liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
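A minimal sketch of one container carrying both probes might look like this. The container name, image, and the /ready endpoint are assumptions made for the example; /healthz and port 8080 follow the earlier snippets.

```yaml
containers:
- name: web                   # hypothetical container
  image: example/web:1.0      # placeholder image
  ports:
  - containerPort: 8080
  readinessProbe:             # failing this removes the Pod from Service backends
    httpGet:
      path: /ready            # hypothetical endpoint that also checks dependencies
      port: 8080
    periodSeconds: 5
    failureThreshold: 2
  livenessProbe:              # failing this restarts the container
    httpGet:
      path: /healthz          # should only report whether the process itself is healthy
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
```

Keeping the liveness check narrower than the readiness check is a design choice worth considering here: if a dependency goes down, the container is taken out of rotation but is not restarted for a fault that a restart cannot fix.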
K8s Startup Probes
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don’t interfere with the application startup. This makes it possible to use liveness checks on slow-starting containers without the kubelet killing them before they are up and running.
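As a sketch of that idea, the fragment below pairs a strict liveness probe with a lenient startup probe on the same endpoint. The endpoint and port follow the earlier snippets; the thresholds are illustrative assumptions.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 1     # strict: one failed check restarts the container
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30    # lenient: allows up to 30 x 10s = 300s for startup
```

Here the application gets up to five minutes to start. Once the startup probe succeeds, the liveness probe takes over and reacts quickly to a hang.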
The minReadySeconds for Pod
The .spec.minReadySeconds field (set on the workload that creates the Pod, for example a Deployment) is an optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available. This defaults to 0 (the Pod is considered available as soon as it is ready).
Once a Pod’s container is started, it is not considered ready (to accept traffic) until it passes its readinessProbe, if one is defined. At the moment all of the Pod’s containers are ready, the Pod is considered ready. But if .spec.minReadySeconds is set, the Pod is not counted as available until it has stayed ready for that many seconds without any container crashing.
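The sketch below shows minReadySeconds next to a readiness probe in a Deployment. The names, labels, image, and the /ready endpoint are placeholders assumed for the example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 3
  minReadySeconds: 10           # a new Pod must stay ready for 10s to count as available
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:1.0  # placeholder image
        readinessProbe:
          httpGet:
            path: /ready        # hypothetical endpoint
            port: 8080
          periodSeconds: 5
```

During a rolling update, this gives each new Pod a short soak period before it is counted as available, so a container that passes its first readiness check and then crashes is less likely to be treated as a successful replacement.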