The provided content offers an introductory guide to Kubernetes for software developers, covering key concepts such as Pods, Deployments, ReplicaSets, and Namespaces, and includes practical examples and tools to facilitate learning and management of Kubernetes clusters.
Abstract
The article "Learn the Fundamentals of Kubernetes for Software Developers" serves as a comprehensive primer for developers looking to understand and utilize Kubernetes in their work. It begins by defining Kubernetes as an open-source container orchestration system that simplifies the deployment, scaling, and management of containerized applications. The article emphasizes the importance of Kubernetes in modern software development practices and its availability across major cloud providers. It introduces essential Kubernetes components, including Pods, the smallest deployable units, and Deployments, which manage Pods and ensure self-healing, automated rollouts, and rollbacks. The concept of ReplicaSets is explained as a means to maintain a stable set of replica Pods. The article also discusses Namespaces, which allow for the division of a single physical cluster into multiple virtual clusters for better resource management and organization. Practical examples are provided throughout, demonstrating how to install and use tools like minikube for local Kubernetes clusters, kubectl for cluster interaction, and k9s for monitoring cluster objects. Additionally, the article offers insights into advanced topics such as resource quotas and limit ranges, and concludes with a mention of related articles for further learning.
Opinions
The author suggests that developers should not install a Kubernetes cluster in their institution for learning purposes but should use tools like minikube to create a local cluster.
The article conveys the opinion that understanding how a Pod is created is crucial for grasping the concept of Deployments, which is a key feature of Kubernetes.
It is implied that ReplicaSets are a superior method for managing Pod replicas compared to direct Pod management due to their ability to automatically replace failed instances.
The author expresses that Deployments are preferred over ReplicaSets for managing containerized applications because they provide additional benefits such as automated rollouts and rollbacks.
The use of Namespaces is presented as a best practice for organizing resources and managing environments within a Kubernetes cluster, suggesting that this helps in maintaining stability and resource control.
The author recommends enabling kubectl autocompletion to enhance efficiency and reduce the likelihood of typos when executing commands.
The tool k9s is highly recommended for its user-friendly interface and real-time monitoring capabilities, indicating that it can significantly improve the experience of managing a Kubernetes cluster.
Learn the Fundamentals of Kubernetes for Software Developers
Let’s get started with the mysterious k8s
Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications. Kubernetes is commonly referred to as k8s for simplicity since there are 8 characters between k and s. Kubernetes is widely used in the modern software development practice and is available in all major cloud providers such as Google Cloud Platform (GCP) and Amazon Web Service (AWS).
Photo by Samuel Sianipar on Unsplash.
Compared to traditional monolithic and virtualized deployment practices, Kubernetes brings revolutionary changes, including self-healing, automated rollouts and rollbacks, automatic resource management, secret and configuration management, service discovery and load balancing, and storage orchestration, etc. If you want to learn more about the introduction of Kubernetes, you can check the official document of Kubernetes. In this post, I will give hands-on examples of Kubernetes and explain the most important concepts for software developers such as Pod, Deployment, Namespace, etc. Command-line examples and YAML configuration files will also be explained with easy-to-understand terms. Besides, there are two bonuses for using Kubernetes in this post which can be very helpful for you as a software developer, don’t miss them :-).
As a software developer, you normally don’t need to install a Kubernetes cluster in your institution by yourself. This should be done by your system administrator or provided by your cloud provider. However, for learning purposes, it’s better not to mess with the production cluster. Luckily, there is a very convenient tool called minikube which can spin up a local Kubernetes cluster on your laptop/desktop quickly. Minikube is perfect for learning, development, or testing purposes.
On Linux, the commands to install minikube are:
Then you can start up your cluster by:
$ minikube start
To interact with Kubernetes, you need to install kubectl. On Linux, the commands are:
To verify that kubectl is installed properly and is connected to your cluster (which should be done automatically) run:
$ kubectl version
We shall see the client and server versions displayed on the screen, which means everything is set up and we can start our journey with Kubernetes.
Pods
Pods are the smallest deployable units that you can create and manage in Kubernetes. A Pod is a group of one or more containers designed to be scheduled and run together. The containers are normally Docker containers which are standard units of software that package up code and all the dependencies. These containers share the same storage and network resources and act as if they are on the same “localhost”. Usually, you don’t need to create Pods directly but would create them indirectly with workload resources such as Deployment. However, as a beginner, it is helpful to know how a Pod is created so you can better understand what a Deployment is, one of the most important concepts in Kubernetes.
To create a Pod, we need to specify the configuration manifest in a YAML file, which has the following content:
Explanations for each field:
apiVersion — which version of the Kubernetes API to be used to create the Pod object. kubectl interacts with the Kubernetes cluster through kube-apiserver. Under the hood, all actions are done by REST API calls. Therefore, you need to specify the apiVersion field for all the Kubernetes resources. A list of apiVersion can be found in this link.
kind — what type of object to create. Here we will create a Pod object, later we see other types of objects.
metadata — the metadata for the object which helps uniquely identify the object, including the name and labels. The labels are commonly used to select objects in Kubernetes and have the key:value format. An object can have multiple labels.
spec — the desired state for the object. The format for spec is different for each Kubernetes object and can contain nested spec fields. The most common scenario is that the spec of a Deployment contains a spec for the Pods contained in it.
spec.containers — the containers to run in the Pod. The value for this field is a list which means there can be multiple containers running in the same Pod. We need to specify the name, image, and ports for each container.
To create a Pod from the configuration YAML file, run:
$ kubectl apply -f pod.yaml
pod/nginx-pod created
The Pod is created asynchronously. If you check the Pod with the get command immediately after it’s created, you will probably see that it’s being created:
$ kubectl get pods
nginx-pod 0/1 ContainerCreating 0 4s
You can use the describe command to check the details of the Pod, which include all the statuses and events for the Pod.
$ kubectl describe pod ngins-pod
To delete a Pod, run:
$ kubectl delete pod nginx-pod
ReplicaSet
As mentioned above, we rarely create Pods directly but would do so with some workload resources. ReplicaSet is one of the workload resources. It is a definition of how many replicas of a Pod should be run at the same time. When a Pod replica fails, a new one will be created by the ReplicaSet to main the same number of replicas for the Pod all the time.
This is the YAML file for creating a ReplicaSet for the Pod demonstrated above:
Note for important fields:
apiVersion and kind need to be updated accordingly for the ReplicaSet.
metadata.labels — these are the labels for the ReplicaSet, don’t get them confused with the labels for the Pod below.
spec.replicas — how many replicas of the Pod to maintain.
spec.selector — the selector selects which Pod to be used in the ReplicaSet. It uses matchLabels to match the labels of Pods and select the matched one to be used in this ReplicaSet. You can select a Pod not defined in the current ReplicaSet, but it’s highly discouraged because it will introduce confusion or even error for the Kubernetes cluster. Check the documentation for labels and selectors for more details.
spec.template — specify the template for the Pod. As you can see, the template field is the same as the Pod definition above, without the apiVersion and kind fields which are handled by the ReplicaSet. Importantly, the Pod labels defined in spec.template.metada.labels are used byspec.selector.matchLabels above. The nested spec is the spec for the Pod, which defines desired state for the Pod.
Let’s create a ReplicaSet with this YAML file:
$ kubectl apply -f replicaset.yaml
replicaset.apps/nginx-replicaset created
As with Pods, the ReplicaSet can also be checked with the get command:
$ kubectl get replicasets.app
NAME DESIRED CURRENT READY AGE
nginx-replicaset 3338s
We can verify that there are three replicas for the same Pod running at the same time, each has a unique identifier:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-replicaset-bzd2q 1/1 Running 012s
nginx-replicaset-kk2l9 1/1 Running 012s
nginx-replicaset-lv5hc 1/1 Running 012s
Let’s now delete a Pod to see what will happen:
$ kubectl delete pod nginx-replicaset-bzd2q
pod "nginx-replicaset-bzd2q" deleted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-replicaset-kk2l9 1/1Running04m34s
nginx-replicaset-lv5hc 1/1Running04m34s
nginx-replicaset-nkmjk 1/1Running03s
We see that a new Pod is created which keeps the total number of replicas as 3.
However, if you change the image for the Pod, the replicas won’t be recreated. Try it yourself if you don’t believe it. For example, try to change nginx:1.19.10-alpine to nginx:1.20.0-alpine. Here is where Deployment shines.
A Deployment is a workload resource in Kubernetes which provides declarative management for Pods and ReplicaSets. Creating a Deployment in Kubernetes automatically creates a ReplicaSet, which then creates the Pod replicas defined by the template. Therefore, we should normally not manage ReplicaSets or Pods directly, as the Deployment will manage them for us automatically. Besides the self-healing feature of ReplicaSet, Deployment also has the benefit of automated rollouts and rollbacks. Rollout means that if the Pod template is updated, for example, the image for a container is updated, a new ReplicaSet will be created by gradually starting new Pods and deleting old ones and thus keeping the total number of running Pods constant. On the other hand, rollback means we can roll back to a previous Deployment if we are unhappy about the current one. The number of Pods will also be kept constant during the rollback process and maintain the stability of the system with no downtime.
This is the YAML file for creating a Deployment based on the ReplicaSet and Pod definitions introduced previously:
You see everything is the same as the YAML file for ReplicaSet except the kind field which defines the resource type. It should be Deployment here since we are creating a Deployment rather than a ReplicaSet.
Let’s create a Deployment based on this YAML file:
$ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment created
Now let’s check the Deployment, ReplicaSet and Pods created:
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3335s
$ kubectl get replicasets.apps
NAME DESIRED CURRENT READY AGE
nginx-deployment-7484cf4875 3338s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-7484cf4875-dmrkj 1/1Running011s
nginx-deployment-7484cf4875-qh9bb 1/1Running011s
nginx-deployment-7484cf4875-zjplx 1/1Running011s
Pay close attention to the names of the Deployment, ReplicaSet, and Pods. There is an obvious pattern that the Deployment comprises the ReplicaSet which then comprises the Pod replicas. The name of the ReplicaSet is in the format of [DEPLOYMENT]-[RANDOM-STRING]. The random string 7484cf4875 is generated using the hash of the Pod template as the seed, which means that when the Pod template is changed, the ReplicaSet will be changed. Let’s see it in practice.
Let’s change the image from nginx:1.19.10-alpine to nginx:1.20.0-alpine.
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/33324m
$ kubectl get replicasets.apps
NAME DESIRED CURRENT READY AGE
nginx-deployment-7484cf4875 0 0 0 25m
nginx-deployment-95d596f7b 3 3 3 6m29s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-95d596f7b-c5xhf 1/1Running0105s
nginx-deployment-95d596f7b-gzhjm 1/1Running03m
nginx-deployment-95d596f7b-kkg69 1/1Running0106s
If you run the commands quickly enough, you will be able to see the transition states of the ReplicaSet and Pods, where the new ReplicaSet and Pods are created and the old ones are still in function, as indicated by the READY, UP-TODATE, AVAILABLE, DESIRED, CURRENT, READY, etc, fields.
The results above show that a new ReplicaSet with a new random string 95d596f7b is created, and a new set of Pod replicas are created for that ReplicaSet. The new random string is generated using the hash of the new Pod template as the feed.
To let you have a better understanding of the random string of the ReplicaSet, let’s change the image back to nginx:1.19.10-alpine. This time let’s change it on the command line:
$ kubectl set image deployment/nginx-deployment nginx=nginx:1.19.10-alpine
deployment.apps/nginx-deployment image updated
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/33333m
$ kubectl get replicasets.apps
NAME DESIRED CURRENT READY AGE
nginx-deployment-7484cf4875 3 3 3 33m
nginx-deployment-95d596f7b 0 0 0 15m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-7484cf4875-26msj 1/1Running011s
nginx-deployment-7484cf4875-snk2t 1/1Running014s
nginx-deployment-7484cf4875-zmxxj 1/1Running012s
Interestingly, the old ReplicaSet with string 7484cf4875 is running now, so are the Pod replicas. This proves that the random string for the ReplicaSet is generated using the Pod template as the feed. When the Pod template is the same, the random string will be the same, which in turn results in the same ReplicaSet.
We can check the rollout status of the deployment by:
$ kubectl rollout status deployment nginx-deployment
deployment "nginx-deployment" successfully rolled out
It shows that the deployment has been successfully rolled out. As we are using a minimal alpine image for demonstration purposes, the rollout is very fast. If you are using a larger image, or have a slower cluster, or just check quickly enough, you can see the intermediate state:
Waiting for rollout to finish: 2 out of 3new replicas have been updated...
We can check the rollout history of the Deployment by:
There is not much information for each rollout. We can check the details of each rollout version by:
$ kubectl rollout history deployment nginx-deployment --revision=2
Let’s add an annotation to the latest revision with the kubernetes.io/change-cause key. Note that the annotation can only be added to the latest version of the deployment. It will be added to the CHANGE-CAUSE field above and make the revisions easier to be identified:
Namespaces are virtual clusters existing in the same physical cluster. Namespaces provide a scope for resource names, which must be unique within a namespace, but not across namespaces. We can use namespaces to subdivide our cluster according to different purposes, for example for production vs staging environments.
Up to now, we haven’t specified any namespace in our commands. However, this doesn’t mean that we are not using a namespace. Actually we have been using the default namespace implicitly. We can check the current namespaces by:
$ kubectl get namespaces
NAME STATUS AGE
default Active 7d17h
kube-node-lease Active 7d17h
kube-public Active 7d17h
kube-system Active 7d17h
default is the default namespace we have been using. The other namespaces with the kube- prefix are systematic namespaces used by Kubernetes. We should avoid using them in our work.
We can explicitly specify the namespace when we access some resource, for example:
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3337h44m
$ kubectl get deployments.apps --namespace default NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3337h44m
Let’s create a new namespace for the production environment:
$ kubectl create namespaceproduction
We can also create namespaces with a YAML file, which can be better for version control of the code:
Then the Namespace resource can be created like any other resource:
Let’s now create a Deployment in the production and staging namespaces, respectively. For simplicity, we will use the command line and specify different names for them:
$ kubectl create deployment production-deployment --image=nginx:1.21.3-alpine --replicas=3 --namespace=production
deployment.apps/production-deployment created
$ kubectl create deployment staging-deployment --image=nginx:1.21.3-alpine --replicas=3 --namespace=staging
deployment.apps/staging-deployment created
Then we can check the Deployments in each namespace:
$ kubectl get deployments.apps --namespace=default NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3337h58m
$ kubectl get deployments.apps --namespace=production NAME READY UP-TO-DATE AVAILABLE AGE
production-deployment 3/333104s
$ kubectl get deployments.apps --namespace=staging NAME READY UP-TO-DATE AVAILABLE AGE
staging-deployment 3/33357s
We can see that the Deployment created in each namespace can only be found in the respective namespaces.
As we have seen, namespaces provide logical separation for the resources. But hey can do much more than that. We can specify different Resource Quotas and Limit Ranges in each namespace which define how much resource in total can be used and how much for each resource in the namespace. However, this is quite advanced usage and is out of the scope of this post. Luckily, as a developer, you normally don’t have to do it yourself and can just use the namespaces assigned by your system administrators.
Bonus — Autocompletion
As you may have already noticed, it’s quite cumbersome to type the commands for kubectl. You have to remember the resource types and options and it’s very easy to have typos. Luckily, kubectl has an autocompletion feature that can save you a lot of typing. To enable autocompletion in Bash, you need to run the following commands:
Now you should have autocompletion enabled for kubectl. Try to press TAB when you enter the following commands, you will have the fancy autocompletion for the resources and options now:
$ kubectl get deployments.apps --namespace=production
Bonus — k9s
There is a very fancy tool for monitoring the objects in a Kubernetes cluster called k9s. With k9s, we can monitor the different types of objects created as well the system resources like CPU and memory in a user-friendly console. We can even ssh to the Pods in k9s and run some commands in them directly.
You can install k9s on your computer according to the instructions specified here. For Linux, you can install k9s with snap, or from the source code. From the source code, you will install the latest version of k9s, but normally an older but stable version with snap.
With snap:
$ sudo snap install k9s
From the source code:
# Get the go repo
$ git clonehttps://github.com/derailed/k9s.git
$ cd k9s$ make build$ sudo cp ./execs/k9s /usr/local/bin/
By default, k9s uses the same credentials as kubectl. In our example here, we can use k9s to access our cluster directly because there is no special authentication for it. To open the console for k9s, run:
# To check a specific namespace.
$ k9s --namespace=production
# Tocheckall the namespecs.
$ k9s --all-namespaces
You will see the charming interface of k9s opened (assuming you have specified --all-namespaces):
K9s
You can press the shortcuts in the red boxes indicated in the picture above. For detailed explanations for each key binding, please check this link. If you need to check your Kubernetes resources in your work, do try out k9s, which is really cool and convenient.
We have covered the fundamentals of Kubernetes for software developers in this post. Now you should have a better understanding of Kubernetes and can start to create deployments in your project. However, this is just an introductory post and didn’t touch on the more advanced topics such as Service, Volume, ConfigMap, Secret, etc. There will be dedicated posts for each of these topics. There will also be a practical post for deploying and configuring Airflow on a Kubernetes cluster, which can be used to schedule and run our Python jobs on the cluster.