
K8s Scheduling: Schedule Pods Based on Resource Needs

As an orchestration engine, it is incumbent upon K8s to ensure Pods are running at all times. Barring a mistake in a Deployment/Pod manifest (or other human-induced errors), there is no excuse for them not to be up and running.

In this regard, the kube-scheduler is a vital part of the K8s ecosystem. It is a Control Plane component that continuously looks for Pods that should be running on a Node but are not. When it finds such Pods (they have no Node name in their object), the scheduler jumps into action and makes it its mission to find a home for them.

Table of Contents

  1. A High Level View of the Scheduling Flow
  2. Demo: Scheduling 3 Pods on a 4 Node Cluster
  3. Demo: Scheduling 6 Pods on a 4 Node Cluster
  4. Demo: Scheduling Pods with Resource Requests

A High Level View of the Scheduling Flow

Figure 1: It all starts with the kube-scheduler.
  • kube-scheduler finds unassigned Pods (i.e. Pods that have no Node name in their object model); a quick way to spot such Pods yourself is shown after this list.
  • kube-scheduler selects the right Node for the unassigned Pod(s).
📢
SELECTION = Filtering out Nodes that are not suited for the Pod + Scoring the Nodes that are + Binding the Pod(s) to the Node with the highest score.
  • kube-scheduler updates the API server with the name of the Node that can handle the Pod's needs.
  • The kubelet running on each Node sees that new Pod(s) have been assigned to its Node.
  • The kubelet tells the container runtime to pull the container images and start the containers.
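
Since unassigned Pods are simply Pods with an empty spec.nodeName, you can list them yourself with a field selector. This is a generic sketch, not specific to this article's cluster:

$ kubectl get pods --all-namespaces --field-selector spec.nodeName=

Pods still waiting in the scheduling queue also show up as Pending:

$ kubectl get pods --all-namespaces --field-selector status.phase=Pending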

For the demos in this article, we will use a cluster with 1 Control Plane Node and 3 Worker Nodes.

Node Type        IP
Control Plane    192.168.0.214
Worker Node 1    192.168.0.96
Worker Node 2    192.168.0.205
Worker Node 3    192.168.0.186

Demo: Scheduling 3 Pods on a 4 Node Cluster

Step 1: Deploy the manifest with 3 replicas
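
The manifest itself is not reproduced in this article, so below is a minimal sketch of what a 3-replica Deployment could look like; the hello-world name and the nginx image are assumptions for illustration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world          # assumed name, for illustration only
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: hello-world
          image: nginx:1.25  # assumed image

Deploy it with $ kubectl apply -f hello-world.yaml.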

Figure 2: One Pod was deployed on each Node (except on the Control Plane)

kube-scheduler filtered out the Node which could NOT host the Pods (in this case, the Control Plane) and placed one Pod on each of the 3 Worker Nodes.

💡
Control Plane Nodes are typically not used for deploying application Pods.
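
The mechanism behind this filtering is usually a taint on the Control Plane Node. On a kubeadm-built cluster (an assumption about this cluster's setup), it can be inspected with:

$ kubectl describe node <control-plane-node> | grep Taints

which typically reports node-role.kubernetes.io/control-plane:NoSchedule.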

Demo: Scheduling 6 Pods on a 4 Node Cluster

Following on from our previous example, if we were to scale our replicas up from the current 3 to 6, what would we observe about Pod deployments per Node?

Step 1: Scale up replicas to 6
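
Assuming the Deployment from the previous demo is named hello-world, scaling it is a one-liner:

$ kubectl scale deployment hello-world --replicas=6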

Figure 3: With 6 replicas to be distributed across 3 Worker Nodes, each Node gets 2 Pods deployed.

Demo: Scheduling Pods with Resource Requests

In this demo, we will add a resource request in our Pod spec and examine the outcomes as far as Pod deployment is concerned.

Step 1: Deploy the manifest provided for this demo

Figure 4: The manifest has asked for each Pod to get 1 whole CPU allocated to it.
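
The full manifest is not reproduced in the text, but based on the hello-world-requests Deployment name used later in this demo, a minimal sketch could look like this (the image and labels are assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world-requests
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-world-requests
  template:
    metadata:
      labels:
        app: hello-world-requests
    spec:
      containers:
        - name: hello-world-requests
          image: nginx:1.25      # assumed image
          resources:
            requests:
              cpu: "1"           # ask for 1 whole CPU (1000m) per Pod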

Step 2: Display Pods using $ kubectl get pods -o wide

Figure 5: Each Pod is neatly allocated to one Node, thereby fulfilling the ask for 1 CPU per Pod.

Step 3: Increase replicas to 6

Increase replicas from 3 to 6 by typing $ kubectl scale deployment hello-world-requests --replicas=6.

Figure 6: We have our first taste of a failed deployment. 3 Pods were deployed and 3 were not. Why?

To answer the 'Why', let's look at Node ip-192-168-0-186 (or either of the other 2 for that matter; the deduction in all cases will be the same).

Observation # 1: The Node has 2 CPUs

Figure 7: The Node has 2 allocatable CPUs.
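
You can verify this yourself; allocatable CPU is part of the Node's status (the jsonpath query below is a sketch using this demo's Node name):

$ kubectl get node ip-192-168-0-186 -o jsonpath='{.status.allocatable.cpu}'

Alternatively, $ kubectl describe node ip-192-168-0-186 shows the Allocatable section along with a per-Pod breakdown of CPU and memory requests.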

Observation # 2: One of the CPUs has been allocated to 1 of the 2 Pods that were scheduled for this Node.

Figure 8: 1 of the CPUs is already allocated to 1 of the 2 Pods that were to be deployed to this Node, leaving 1 CPU for remaining compute needs.

Observation # 3: The calico-node-hntd6 Pod has a request for 250 millicores of CPU.

Figure 9: The calico Pod on this Node also requested 250 millicores of CPU (and this allocation was made WAY before there were any pending Deployments).

The remaining CPU capacity of 750 millicores is not sufficient for any other Pod (from this Deployment, at least). Therefore, after the first 3 Pods are deployed and kube-scheduler comes back to the remaining 3, there is not enough CPU for them.
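
The per-Node arithmetic works out as follows:

Allocatable CPU on the Node:            2000m
CPU requested by the first app Pod:    -1000m
CPU requested by calico-node-hntd6:     -250m
                                       ------
Remaining:                               750m  (< the 1000m each new Pod requests)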

This lack of resources puts the Pods in a Pending state, and until Node resources are increased OR the Pod requests are lowered, they will stay there.
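
The reason is spelled out in the Pod's Events. A sketch, with the Pod name left as a placeholder:

$ kubectl describe pod <pending-pod-name> | grep -A 5 Events

The FailedScheduling event typically cites 'Insufficient cpu' for the Worker Nodes.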


I write to remember, and if in the process I can help someone learn about Containers, Orchestration (Docker Compose, Kubernetes), GitOps, DevSecOps, VR/AR, Architecture, and Data Management, that is just icing on the cake.