In my last post, I covered pods. However, I also mentioned that I was a little bit torn about covering pods, and especially the creation of pods. The main reason for this is that we generally do not create pods on their own. There isn’t really anything wrong with creating pods manually as we did in the previous post. However, in doing so, we are missing out on some of the functionality that Kubernetes offers us.
If we create pods manually, as covered in the last post, and we want more than one instance of a pod, we have to deploy each one of these instances manually. Not only would we need to create multiple, almost identical YAML files, one for each pod, since they need unique names, but any changes or maintenance would also involve a lot of repetition and potential for mistakes.
Seeing that one of the main reasons behind running a Kubernetes cluster is the ability to have a resilient system with multiple load balanced instances of our pods, it makes a lot of sense that K8s would have a better solution built in. And it does! It’s called ReplicaSets. A ReplicaSet is an abstraction put in place specifically to maintain multiple instances of pods. A ReplicaSet continuously monitors the cluster to make sure that the desired number of pods are up and running at any given time. If it can’t see the correct number of pods running, it will make sure to correct this by adding or removing pods. And if we ever want to scale out our pod, all we have to do is to reconfigure the ReplicaSet (RS) and it will automatically reconfigure the cluster to meet the new requirements.
However, before we can start looking at ReplicaSets, we need to understand another Kubernetes feature called labels. Labels are key/value pairs that can be added as metadata to pretty much any resource in the Kubernetes cluster. These labels can then be used to find specific instances or groups of resources in the cluster for different reasons.
Why is this important? Well, in Kubernetes, a ReplicaSet doesn’t keep a fixed list of the pods it created. Instead, the RS uses the labels attached to pods to figure out what pods should be part of its “set”.
This allows for a very flexible configuration that doesn’t tie the user into any particular setup. Instead, it is up to the implementer of the system to figure out what labels and label values make sense for the current system. And to be honest, the label structure can vary wildly between systems, as most systems have very different requirements and needs when it comes to grouping resources.
Labels can be used for a lot of things, not just for ReplicaSets. For example, they can also be used to label nodes, allowing us to target specific nodes when scheduling pods. Some pods might need lots of memory but very little disk, while others might need a lot of fast disk but very little memory. By giving differently configured nodes labels that describe the types of resources they offer, we can make sure that we utilize the resources optimally by scheduling the right types of pods on the right kind of hardware.
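Here is a sketch of the node-label idea. The pod below uses a nodeSelector to ask the scheduler to place it only on nodes that carry a disktype=ssd label. Both that label and the pod name are hypothetical. The node would need to be labeled first, for example with kubectl label nodes <node-name> disktype=ssd:

```yaml
# Hypothetical example: assumes at least one node has been labeled disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-ssd              # hypothetical name
  labels:
    app: hello-world
spec:
  nodeSelector:
    disktype: ssd               # only nodes with this label are eligible
  containers:
  - name: hello-world
    image: zerokoll/helloworld:1.0
```

If no node has a matching label, the pod simply stays in the Pending state until one does.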
Let’s have a look at how we can use labels with our pods. Imagine that you have the following two pod definitions:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-v1
  labels:
    app: hello-world
    version: "1.0"
spec:
  containers:
  - name: hello-world
    image: zerokoll/helloworld:1.0
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-v2
  labels:
    app: hello-world
    version: "2.0"
spec:
  containers:
  - name: hello-world
    image: zerokoll/helloworld:2.0
As you can see, it creates two pods. Both have the label app with the value hello-world. However, they differ in the second label, as they have different version labels based on the fact that the two pods are running two different versions of the image.
Note: Label values are always strings, and can contain alphanumeric characters ([a-z0-9A-Z]) separated by dashes (-), underscores (_) and dots (.). However, since we are using 1.0 and 2.0 as values in this case, they need to be wrapped in quotes so that they are handled as strings instead of numbers. Otherwise, the API complains about the values not being strings.
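To make the quoting rule concrete, here is a small labels fragment. YAML parses an unquoted 1.0 as a number, so only the quoted form is accepted as a label value. Single and double quotes both work, and both styles appear in this post:

```yaml
metadata:
  labels:
    app: hello-world     # plain text is already parsed as a string
    version: "1.0"       # quoted, so YAML keeps it as the string "1.0"
    # version: 1.0       # unquoted, YAML reads this as a number and the API rejects it
```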
With these two pods up and running, we can use the labels to filter the results returned by kubectl by passing the -l flag to our commands. For example, running
kubectl get pods -l app=hello-world,version=1.0
returns only the version 1.0 pod:
NAME        READY   STATUS    RESTARTS   AGE
my-pod-v1   1/1     Running   0          7m8s
When using the -l flag, we can either use the format used above, which is called equality-based, or a more expressive format called set-based. The equality-based format is basically just a set of key/value pairs that are combined with a logical AND, returning only pods that have all the supplied labels with the supplied values. The set-based format, on the other hand, allows us to use more complex queries like
kubectl get pods -l 'app in (hello-world), version in (1.0,2.0)'
which returns all pods with the app label set to the value hello-world and the version label set to either 1.0 or 2.0.
The -l flag can be used with pretty much all kubectl commands. For example, you could use it when deleting pods by running
kubectl delete pods -l app=hello-world,version=1.0
If you ever want to see what labels are set on resources in the cluster, you can include --show-labels in the get command like this:
kubectl get pods --show-labels
NAME             READY   STATUS    RESTARTS   AGE    LABELS
hello-world-v1   1/1     Running   0          7m8s   app=hello-world,version=1.0
Ok, with that little sidetrack into the world of labels completed, it is time to get back to our ReplicaSets, to see how they use labels to do their job!
Let’s have a look at a basic ReplicaSet spec:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-world-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-world
      version: '1.0'
  template:
    metadata:
      labels:
        app: hello-world
        version: '1.0'
    spec:
      containers:
      - name: hello-world
        image: zerokoll/helloworld
As you can see, it uses the apps/v1 API version, the kind is ReplicaSet, and in this case the name is set to hello-world-v1. Besides that, it has a spec that tells it to make sure that there are always 3 replicas of pods in the cluster that match the specified label selector. In this case, the selector matches pods based on the app and version labels, with the values hello-world and 1.0. If fewer than 3 pods matching this selector are up and running, the RS needs to know how to create new ones, so we give it a template element containing the template to use when creating new pods. This template is basically the metadata and spec parts of what you would normally put in a pod specification file.
Just remember that the labels specified in the pod template metadata have to satisfy the label selector of the ReplicaSet. If they don’t, the apps/v1 API rejects the ReplicaSet with an error saying that the selector does not match the template labels. That check exists for a good reason. A RS whose template didn’t match its own selector would keep creating more and more pods, because none of the new pods would match the selector used to count them, so the replica count would never increase.
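The selector above uses matchLabels, which is the equality-based form. A ReplicaSet selector can also use the set-based form through matchExpressions, much like the "in" query we ran with kubectl earlier. The following sketch matches pods whose app label is hello-world and whose version label is either 1.0 or 2.0. The name hello-world-all-versions is hypothetical:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-world-all-versions    # hypothetical name
spec:
  replicas: 3
  selector:
    matchExpressions:               # set-based selector; all expressions must match
    - key: app
      operator: In
      values: ["hello-world"]
    - key: version
      operator: In                  # other operators: NotIn, Exists, DoesNotExist
      values: ["1.0", "2.0"]
  template:
    metadata:
      labels:
        app: hello-world
        version: "2.0"              # satisfies both expressions above
    spec:
      containers:
      - name: hello-world
        image: zerokoll/helloworld:2.0
```

Keep in mind that a selector this broad would also adopt matching pods that aren't already managed by another controller, such as hand-made version 1.0 pods, and count them towards its 3 replicas. Overlapping selectors between ReplicaSets are best avoided.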
After deploying this RS to our cluster using the following command
kubectl apply -f .\rs-hello-world.yml
we can go ahead and get all resources in our cluster to see what has actually happened
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8kh7r   1/1     Running   0          12s
pod/hello-world-v1-gwcdn   1/1     Running   0          12s
pod/hello-world-v1-ht56v   1/1     Running   0          12s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       12s
As you can see in the above output, the creation of the hello-world-v1 RS causes 4 resources to be created in the cluster. First of all, we get the defined ReplicaSet. But since this RS can’t find 3 pods that match the label selector, it automatically creates 3 new pods in the cluster to make sure that the replica count matches the desired state.
To see the RS in action, we can try and execute the following command
kubectl delete pod hello-world-v1-8kh7r
This deletes one of the pods that the ReplicaSet created for us. However, if we run
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running   0          37s
pod/hello-world-v1-gwcdn   1/1     Running   0          2m34s
pod/hello-world-v1-ht56v   1/1     Running   0          2m34s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       2m34s
we can see that there are still 3 pods up and running. But…if you look at the “AGE” column, you can see that one of the pods is much newer than the other ones.
Because the RS continuously monitors the cluster to make sure that the desired state is met, it sees that a pod was deleted and immediately creates a replacement pod to restore the desired state in the cluster.
We can also try to create a new pod using the following spec
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: hello-world
    version: '1.0'
spec:
  containers:
  - name: my-container
    image: zerokoll/helloworld
kubectl apply -f hello-world-pod.yml
As you can see, this pod specification sets the same set of labels that the RS is monitoring. So, if we run
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running   0          7m28s
pod/hello-world-v1-gwcdn   1/1     Running   0          9m25s
pod/hello-world-v1-ht56v   1/1     Running   0          9m25s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       9m25s
we can see that there is no pod called my-pod in the list of pods. The reason for this is that the ReplicaSet once again notices the extra pod. Since there are now more than 3 pods matching its label selector, it deletes the pod straight away.
If you run the apply and get commands quickly enough after each other, you might get lucky and see the following output:
kubectl get all
NAME                       READY   STATUS        RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running       0          9m2s
pod/hello-world-v1-gwcdn   1/1     Running       0          10m
pod/hello-world-v1-ht56v   1/1     Running       0          10m
pod/my-pod                 0/1     Terminating   0          2s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       10m
As you can see in the output above, the my-pod pod is being terminated as soon as it starts.
A pretty cool effect of having ReplicaSets use label selectors instead of “owning” pods is that they can adopt existing pods. Imagine that you are working on some new containers, and you schedule some pods to try them out. Once you are confident that the pods are behaving as they should, you can deploy a RS with a label selector that matches the pods you just created. The new RS will then automatically adopt the existing pods and start managing them, without you having to first remove your pods and then have them re-created by the newly created ReplicaSet.
It also means that if you for example have a misbehaving pod, you can just update the pod’s labels to make sure it doesn’t match the RS label selector. This will cause the RS to schedule a new pod to replace the misbehaving one that was “removed”, while leaving the misbehaving pod up and running for you to debug.
To change a label to make sure it doesn’t match the required label selector, you can run
kubectl label pod hello-world-v1-ht56v --overwrite app=hello-world-removed
If we fetch all resources after running that command
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8p2g2   1/1     Running   0          85s
pod/hello-world-v1-fcp8p   1/1     Running   0          28m
pod/hello-world-v1-gwcdn   1/1     Running   0          30m
pod/hello-world-v1-ht56v   1/1     Running   0          30m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       30m
we can see that a 4th pod has been created to keep the cluster in the desired state. The hello-world-v1-ht56v pod is left up and running, but it is no longer managed by the RS.
Note: You can also delete a label by adding a dash (-) after the label name. For example, to remove a label called mylabel from a pod called hello-world, you can run
kubectl label pods hello-world mylabel-
Finally, if you delete a ReplicaSet, all the pods it manages are deleted by default as well. However, if you want to delete a RS without deleting its pods, you can add --cascade=orphan (older versions of kubectl use --cascade=false instead) like this:
kubectl delete rs hello-world-v1 --cascade=orphan
This removes the RS, but leaves all the pods it was managing up and running
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8p2g2   1/1     Running   0          52s
pod/hello-world-v1-fcp8p   1/1     Running   0          52s
pod/hello-world-v1-gwcdn   1/1     Running   0          52s
pod/hello-world-v1-ht56v   1/1     Running   0          30m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d
Once you have a ReplicaSet up and running, you can scale the number of replicas in 2 ways. The fastest way is to run kubectl scale --replicas=<replica count> rs/<RS name>. For example:
kubectl scale --replicas=1 rs/hello-world-v1
In this case, that scales the replica count down to 1, leaving us with
kubectl get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-gwcdn   1/1     Running   0          37m
pod/hello-world-v1-ht56v   1/1     Running   0          37m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   1         1         1       37m
As you can see, the RS removed all the extra pods, leaving us with 1 pod. The hello-world-v1-ht56v pod doesn't count, since it is the one with the mismatching labels.
A word of caution though! Scaling manually like this is not recommended in most cases. However, it is very fast! So if you are in a pinch, you can run this command to scale up (or down) quickly. Just make sure that you update the corresponding YAML file afterwards.
In most cases, you want to make sure that the YAML files that you store in source control represent the current state of the cluster. Otherwise, someone (or maybe a CD pipeline) might re-deploy the RS spec from source control and overwrite the change you just made.
So a better way to handle scaling changes is to update the spec.replicas entry in the YAML file and re-apply the file.
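Here is a sketch of that workflow. The excerpt below shows the relevant part of the rs-hello-world.yml file from earlier after scaling down to 1 replica. Everything except spec.replicas stays as it was, and the file is then re-applied with kubectl apply -f as before:

```yaml
# Excerpt of rs-hello-world.yml; only spec.replicas has changed
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-world-v1
spec:
  replicas: 1          # was 3; commit this change to source control
  # selector and template are unchanged from the original file
```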
Automatic horizontal scaling
You can also do automatic horizontal scaling, using something called a HorizontalPodAutoscaler. This allows you to get your pods automatically scaled horizontally based on CPU load. However, this is somewhat out of scope for this part of my introduction. But in short, you can run the following command to create a HorizontalPodAutoscaler
kubectl autoscale rs hello-world-v1 --max=5 --cpu-percent=70
NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-gwcdn   1/1     Running   0          49m
pod/hello-world-v1-ht56v   1/1     Running   0          49m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   1         1         1       49m

NAME                                                 REFERENCE                   TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/hello-world-v1   ReplicaSet/hello-world-v1   <unknown>/70%   1         5         0          12s
As you can see, that creates a horizontalpodautoscaler.autoscaling resource that will scale the hello-world-v1 RS up to a maximum of 5 pods if the average CPU utilization of the existing pods goes above 70%.
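If you prefer to keep the autoscaler in source control alongside the ReplicaSet, the same configuration can be written declaratively. Below is a sketch using the autoscaling/v2 API, which is stable from Kubernetes 1.23. It uses the same 1 to 5 replica range and 70% CPU target as the command above:

```yaml
# Sketch of a declarative equivalent of:
#   kubectl autoscale rs hello-world-v1 --max=5 --cpu-percent=70
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-world-v1
spec:
  scaleTargetRef:                # the resource whose replica count the HPA adjusts
    apiVersion: apps/v1
    kind: ReplicaSet
    name: hello-world-v1
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the containers' CPU requests
# Assumption: utilization can only be computed if the pod template sets CPU requests,
# e.g. resources: { requests: { cpu: 100m } } (example value), and the cluster has
# a metrics source such as metrics-server installed.
```

Because utilization is measured relative to CPU requests, and the pod template in this post doesn't set any, that is one likely reason for the <unknown> target in the output above.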
If you are curious about this topic, you can read more about it here
Kubernetes also supports 2 other types of “sets”. And why would we need different kinds of sets? Well, they solve slightly different problems.
The ReplicaSet that we have looked at in this post will always try to make sure that the desired number of pods is running in the cluster, with the pods being scheduled around the cluster based on node selectors and available resources, as we have seen.
But we also have something called a DaemonSet. This type of set makes sure that a copy of a defined pod is always running on each of the nodes in the cluster (or on each node matching a node selector, if one is specified). This can be useful for a few different reasons. For example, it can be used when you need to interact with the actual node, doing things like gathering node-level resource usage. It can also be used to make sure that supporting pods, required by other pods in the cluster, are always available locally on the node where those pods are running. This means that you don’t have to leave the current node when making requests to these supporting pods.
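For comparison, a minimal DaemonSet manifest looks a lot like the ReplicaSet from earlier. The difference is that there is no replicas field, because the number of pods follows the number of nodes. The name node-agent and the image example/node-agent:1.0 are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                    # hypothetical name
spec:
  # no replicas field: one pod is run per (eligible) node
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent               # must satisfy the selector, just like for a ReplicaSet
    spec:
      containers:
      - name: node-agent
        image: example/node-agent:1.0 # hypothetical image
```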
Finally, there is a type called StatefulSet. Pods in a StatefulSet are treated a bit differently when it comes to scheduling, deletion and scaling. It is used specifically for pods that need to run in a stateful manner. However, as we try to run stateless workloads as much as we possibly can, StatefulSets are not used nearly as much as ReplicaSets. Because of this, I won’t cover them in this intro post. But it is worth noting that they exist, and that they can be used for running stateful things like databases in the cluster.
That’s it for part 3. I think I have covered everything I planned on covering in this post…
The fourth part of this series is available here.