Goal: Master Kubernetes workload types and advanced scheduling techniques for production SRE/DBRE scenarios.
Prerequisites: Complete the Kubernetes Foundations Tutorial or equivalent knowledge of Pods, Deployments, and Services.
| Workload Type | Purpose | Use Cases |
|---|---|---|
| ReplicaSet | Maintain N replicas | Base for Deployments |
| DaemonSet | One pod per node | Logging, monitoring agents |
| Job | Run to completion | Batch processing, migrations |
| CronJob | Scheduled jobs | Backups, reports, cleanup |
# Start your cluster with multiple nodes (for DaemonSet testing)
# Option 1: Using kind for multi-node setup
cat <<EOF > kind-multi-node-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF
# Create the cluster
kind create cluster --name workloads-lab --config kind-multi-node-config.yaml
# Verify nodes
kubectl get nodes
Alternative: Using minikube for multi-node setup
# Option A: Start fresh cluster with multiple nodes
# If you have an existing cluster, delete it first:
minikube delete
# Start minikube with multiple nodes (works only for new clusters)
# This creates 1 control-plane + 2 worker nodes (3 total)
minikube start --nodes 3
# Option B: Add nodes to existing cluster
# If you already have a running cluster, add additional nodes:
# This adds 2 worker nodes to existing control-plane (3 total nodes)
minikube node add
minikube node add
# Verify nodes
kubectl get nodes
Expected Output (kind):
NAME STATUS ROLES AGE VERSION
workloads-lab-control-plane Ready control-plane 1m v1.28.0
workloads-lab-worker Ready <none> 1m v1.28.0
workloads-lab-worker2 Ready <none> 1m v1.28.0
workloads-lab-worker3 Ready <none> 1m v1.28.0
Expected Output (minikube):
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane 2m47s v1.34.0
minikube-m02 Ready <none> 2m26s v1.34.0
minikube-m03 Ready <none> 2m9s v1.34.0
A ReplicaSet ensures that a specified number of pod replicas are running at any time. While you typically use Deployments (which manage ReplicaSets), understanding ReplicaSets is crucial for troubleshooting.
Create nginx-replicaset.yaml:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
name: nginx-replicaset
labels:
app: nginx
tier: frontend
spec:
replicas: 3
selector:
matchLabels:
app: nginx
tier: frontend
template:
metadata:
labels:
app: nginx
tier: frontend
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
# Apply the ReplicaSet
kubectl apply -f nginx-replicaset.yaml
# Watch the pods being created
kubectl get pods -l app=nginx --watch
# Check ReplicaSet status
kubectl get replicasets
kubectl describe replicaset nginx-replicaset
Expected Output:
NAME DESIRED CURRENT READY AGE
nginx-replicaset 3 3 3 30s
# Get current pods
kubectl get pods -l app=nginx
# Delete one pod
kubectl delete pod <pod-name>
# Watch ReplicaSet recreate it immediately
kubectl get pods -l app=nginx --watch
What happens: ReplicaSet detects the missing pod and creates a new one to maintain the desired count of 3.
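The replacement is also visible in the ReplicaSet's event stream (assuming everything is in the default namespace):
# Look for SuccessfulCreate events for the replacement pod
kubectl get events --field-selector involvedObject.kind=ReplicaSet,involvedObject.name=nginx-replicaset --sort-by='.lastTimestamp'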
# Scale up to 5 replicas
kubectl scale replicaset nginx-replicaset --replicas=5
# Verify scaling
kubectl get pods -l app=nginx
kubectl get replicaset nginx-replicaset
# Scale down to 2 replicas
kubectl scale replicaset nginx-replicaset --replicas=2
# Watch pods being terminated
kubectl get pods -l app=nginx --watch
# Try to update the image
kubectl set image replicaset/nginx-replicaset nginx=nginx:1.22
# Check the ReplicaSet
kubectl describe replicaset nginx-replicaset
# Notice: Existing pods are NOT updated!
kubectl get pods -l app=nginx -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
Key Insight: ReplicaSets don’t perform rolling updates. You need to manually delete pods for them to pick up the new image. This is why we use Deployments!
# Delete all pods to force recreation with new image
kubectl delete pods -l app=nginx
# Now they'll use the new image
kubectl get pods -l app=nginx -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
Create advanced-replicaset.yaml:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
name: advanced-replicaset
spec:
replicas: 3
selector:
matchLabels:
app: myapp
matchExpressions:
- key: environment
operator: In
values:
- production
- staging
- key: tier
operator: NotIn
values:
- backend-legacy
template:
metadata:
labels:
app: myapp
environment: production
tier: frontend
spec:
containers:
- name: app
image: nginx:1.21
ports:
- containerPort: 80
# Apply the advanced ReplicaSet
kubectl apply -f advanced-replicaset.yaml
# Check the pods
kubectl get pods -l app=myapp
# Try to create a pod that matches the selector
kubectl run manual-pod --image=nginx --labels="app=myapp,environment=staging,tier=frontend"
# The ReplicaSet adopts the pod and, since the desired count is now exceeded, terminates a surplus pod
kubectl get pods -l app=myapp
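If the manually created pod is still around after the scale-down, its ownerReferences show that the ReplicaSet took ownership of it:
# Check who owns the manually created pod (if it still exists)
kubectl get pod manual-pod -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'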
kubectl delete replicaset nginx-replicaset advanced-replicaset
kubectl delete pod manual-pod
A DaemonSet ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are automatically added to them.
Create node-info-daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-info
labels:
app: node-info
spec:
selector:
matchLabels:
app: node-info
template:
metadata:
labels:
app: node-info
spec:
containers:
- name: node-info
image: busybox
command:
- sh
- -c
- |
while true; do
echo "Node: $NODE_NAME"
echo "Pod: $POD_NAME"
echo "Namespace: $POD_NAMESPACE"
sleep 30
done
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
requests:
memory: "32Mi"
cpu: "50m"
limits:
memory: "64Mi"
cpu: "100m"
# Apply the DaemonSet
kubectl apply -f node-info-daemonset.yaml
# Check DaemonSet status
kubectl get daemonsets
kubectl get pods -l app=node-info -o wide
# Verify one pod per node
kubectl get nodes
kubectl get pods -l app=node-info -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
Expected Output:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
node-info 3 3 3 3 3 <none> 30s
NAME NODE
node-info-abc12 <your-first-worker-node>
node-info-def34 <your-second-worker-node>
node-info-ghi56 <your-third-worker-node>
Create fluentd-daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
namespace: kube-system
labels:
app: fluentd
tier: logging
spec:
selector:
matchLabels:
app: fluentd
template:
metadata:
labels:
app: fluentd
tier: logging
spec:
# Important: DaemonSets often need elevated permissions
serviceAccountName: fluentd
tolerations:
# Allow running on control-plane nodes
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
containers:
- name: fluentd
image: fluent/fluentd:v1.16-1
resources:
requests:
memory: "200Mi"
cpu: "100m"
limits:
memory: "500Mi"
cpu: "500m"
volumeMounts:
# Mount host logs
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
Create the ServiceAccount first:
# Create ServiceAccount
kubectl create serviceaccount fluentd -n kube-system
# Create ClusterRole and ClusterRoleBinding
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: fluentd
rules:
- apiGroups: [""]
resources:
- pods
- namespaces
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: fluentd
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: fluentd
subjects:
- kind: ServiceAccount
name: fluentd
namespace: kube-system
EOF
# Apply the DaemonSet
kubectl apply -f fluentd-daemonset.yaml
# Check status
kubectl get daemonsets -n kube-system
kubectl get pods -n kube-system -l app=fluentd -o wide
# Check current image
kubectl get daemonset node-info -o jsonpath='{.spec.template.spec.containers[0].image}'
# Update the image
kubectl set image daemonset/node-info node-info=busybox:1.36
# Watch the rolling update
kubectl rollout status daemonset/node-info
# Check rollout history
kubectl rollout history daemonset/node-info
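DaemonSets default to the RollingUpdate strategy; the alternative is OnDelete, where pods only pick up the new template when you delete them yourself. A hedged sketch of tuning the strategy with kubectl patch:
# Allow two pods to be unavailable at once during a rolling update
kubectl patch daemonset node-info -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":2}}}}'
# Or switch to OnDelete so updates only happen when you delete pods manually
kubectl patch daemonset node-info -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'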
Create monitoring-daemonset.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: monitoring-agent
spec:
selector:
matchLabels:
app: monitoring-agent
template:
metadata:
labels:
app: monitoring-agent
spec:
# Only run on nodes with this label
nodeSelector:
monitoring: "enabled"
containers:
- name: agent
image: busybox
command: ["sh", "-c", "echo Monitoring node && sleep 3600"]
resources:
requests:
memory: "50Mi"
cpu: "50m"
# First, get your node names (they differ between kind and minikube)
kubectl get nodes
# Apply the DaemonSet
kubectl apply -f monitoring-daemonset.yaml
# Check pods - should be 0 initially
kubectl get pods -l app=monitoring-agent
# Label a node to enable monitoring (replace with your actual node name)
# For kind clusters: workloads-lab-worker, workloads-lab-worker2, etc.
# For minikube: minikube-m02, minikube-m03, etc.
# Example: kubectl label node <your-first-worker-node> monitoring=enabled
kubectl label node $(kubectl get nodes -o jsonpath='{.items[1].metadata.name}') monitoring=enabled
# Watch pod get created
kubectl get pods -l app=monitoring-agent -o wide
# Label another node (replace with your actual node name)
kubectl label node $(kubectl get nodes -o jsonpath='{.items[2].metadata.name}') monitoring=enabled
# Now you have 2 pods
kubectl get pods -l app=monitoring-agent -o wide
# Remove label (replace with your actual node name)
kubectl label node $(kubectl get nodes -o jsonpath='{.items[1].metadata.name}') monitoring-
kubectl get pods -l app=monitoring-agent -o wide
| Feature | ReplicaSet | DaemonSet |
|---|---|---|
| Pod Count | Fixed number (e.g., 3) | One per (matching) node |
| Scaling | Manual or HPA | Automatic with nodes |
| Use Case | Application replicas | System services |
| Scheduling | Kube-scheduler | Per-node |
| Example | Web servers, APIs | Logs, monitoring |
A Job creates one or more pods and ensures they successfully complete. Unlike Deployments, Jobs are meant to run to completion.
Create simple-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
name: pi-calculation
spec:
template:
spec:
containers:
- name: pi
image: perl:5.34
command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
restartPolicy: Never
backoffLimit: 4
# Create the job
kubectl apply -f simple-job.yaml
# Watch job progress
kubectl get jobs --watch
# Check pods
kubectl get pods -l job-name=pi-calculation
# View output
kubectl logs $(kubectl get pods -l job-name=pi-calculation -o jsonpath='{.items[0].metadata.name}')
# Check job details
kubectl describe job pi-calculation
Expected Output:
NAME COMPLETIONS DURATION AGE
pi-calculation 1/1 5s 10s
Create parallel-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
name: parallel-processing
spec:
completions: 10
parallelism: 3
template:
spec:
containers:
- name: worker
image: busybox
command:
- sh
- -c
- |
echo "Worker starting: $HOSTNAME"
sleep $(( $(date +%s) % 10 + 5 ))  # busybox sh has no $RANDOM, so derive a 5-14s sleep from the clock
echo "Worker completed: $HOSTNAME"
restartPolicy: Never
backoffLimit: 6
# Create the parallel job
kubectl apply -f parallel-job.yaml
# Watch it process (3 pods at a time)
kubectl get pods -l job-name=parallel-processing --watch
# Check job status
kubectl get job parallel-processing
# View logs from different workers
kubectl logs -l job-name=parallel-processing --prefix=true
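The workers above are interchangeable. When each completion needs to know which slice of work it owns, completionMode: Indexed gives every pod a JOB_COMPLETION_INDEX environment variable. A minimal sketch (the name indexed-demo is just for illustration) — create indexed-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        # The index (0..4) is injected automatically for Indexed Jobs
        command: ["sh", "-c", "echo Processing shard $JOB_COMPLETION_INDEX"]
      restartPolicy: Never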
Create db-backup-job.yaml (this assumes a MariaDB instance reachable through a Service named mariadb with root password rootpassword):
apiVersion: batch/v1
kind: Job
metadata:
name: mariadb-backup
spec:
template:
spec:
containers:
- name: backup
image: mariadb:10.11
command:
- sh
- -c
- |
echo "Starting backup at $(date)"
mysqldump -h mariadb -u root --all-databases > /backup/backup-$(date +%Y%m%d-%H%M%S).sql
echo "Backup completed at $(date)"
ls -lh /backup/
volumeMounts:
- name: backup-storage
mountPath: /backup
env:
- name: MYSQL_PWD
value: "rootpassword"
restartPolicy: OnFailure
volumes:
- name: backup-storage
emptyDir: {}
backoffLimit: 3
ttlSecondsAfterFinished: 100
# Apply the backup job
kubectl apply -f db-backup-job.yaml
# Check status
kubectl get jobs
kubectl get pods -l job-name=mariadb-backup
# View logs
kubectl logs -l job-name=mariadb-backup
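For backup-style Jobs it is often worth capping wall-clock runtime so a hung dump doesn't run forever; activeDeadlineSeconds fails the Job after the limit. A hedged sketch of the relevant spec fields (the 600-second value is arbitrary):
spec:
  activeDeadlineSeconds: 600    # fail the Job if it runs longer than 10 minutes
  backoffLimit: 3
  ttlSecondsAfterFinished: 100  # clean up the finished Job after 100 seconds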
Create cleanup-cronjob.yaml:
apiVersion: batch/v1
kind: CronJob
metadata:
name: nightly-cleanup
spec:
schedule: "0 2 * * *" # Run at 2 AM daily
jobTemplate:
spec:
template:
spec:
containers:
- name: cleanup
image: busybox
command:
- sh
- -c
- |
echo "Starting cleanup at $(date)"
echo "Cleaning old files..."
echo "Cleanup completed at $(date)"
restartPolicy: OnFailure
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
# Apply the CronJob
kubectl apply -f cleanup-cronjob.yaml
# Check CronJob
kubectl get cronjobs
kubectl describe cronjob nightly-cleanup
Create frequent-cronjob.yaml:
apiVersion: batch/v1
kind: CronJob
metadata:
name: every-minute-test
spec:
schedule: "*/1 * * * *" # Every minute
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
command:
- sh
- -c
- echo "Hello from CronJob at $(date)"
restartPolicy: OnFailure
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
# Apply the frequent CronJob
kubectl apply -f frequent-cronjob.yaml
# Wait a minute and check
sleep 70
kubectl get cronjobs
# Jobs created by the CronJob are named every-minute-test-<timestamp>
kubectl get jobs
# Check logs (newest job pod)
kubectl logs $(kubectl get pods -l job-name --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | tail -1)
# Manually trigger a CronJob
kubectl create job manual-test --from=cronjob/every-minute-test
# Suspend the CronJob
kubectl patch cronjob every-minute-test -p '{"spec":{"suspend":true}}'
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
# │ │ │ │ │
# * * * * *
Common schedules:
"0 0 * * *" # Daily at midnight
"0 */6 * * *" # Every 6 hours
"*/15 * * * *" # Every 15 minutes
"0 9 * * 1-5" # Weekdays at 9 AM
"0 0 1 * *" # First day of month
Every container should specify resource requests (what the scheduler reserves for it) and limits (the maximum it is allowed to use).
Create resource-demo.yaml:
apiVersion: v1
kind: Pod
metadata:
name: resource-demo
spec:
containers:
- name: app
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
# Apply the pod
kubectl apply -f resource-demo.yaml
# Check resource allocation
kubectl describe node | grep -A 5 "Allocated resources"
# Check pod resources
kubectl describe pod resource-demo | grep -A 10 "Requests\|Limits"
Create memory-stress.yaml:
apiVersion: v1
kind: Pod
metadata:
name: memory-stress
spec:
containers:
- name: stress
image: polinux/stress
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
resources:
requests:
memory: "50Mi"
limits:
memory: "100Mi" # Will be OOMKilled!
# Apply and watch it fail
kubectl apply -f memory-stress.yaml
# Watch pod status
kubectl get pod memory-stress --watch
# Check why it failed
kubectl describe pod memory-stress | grep -A 5 "Last State"
Expected: Pod will be OOMKilled (Out Of Memory) because it tries to use 150Mi but limit is 100Mi.
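The kill is recorded in the container status as reason OOMKilled with exit code 137; once the container has restarted at least once, you can read it straight from the pod:
# Termination reason of the previous container instance
kubectl get pod memory-stress -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'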
Create cpu-stress.yaml:
apiVersion: v1
kind: Pod
metadata:
name: cpu-stress
spec:
containers:
- name: stress
image: polinux/stress
command: ["stress"]
args: ["--cpu", "2"]
resources:
requests:
cpu: "100m"
limits:
cpu: "200m" # Will be throttled
# Apply the pod
kubectl apply -f cpu-stress.yaml
# Watch CPU usage (if metrics-server is installed)
kubectl top pod cpu-stress
# Check throttling counters inside the container (the path differs between cgroup v1 and v2)
kubectl exec -it cpu-stress -- sh -c "cat /sys/fs/cgroup/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu/cpu.stat"
Create namespace-with-limits.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: limited-namespace
---
apiVersion: v1
kind: LimitRange
metadata:
name: resource-limits
namespace: limited-namespace
spec:
limits:
- default:
memory: "512Mi"
cpu: "500m"
defaultRequest:
memory: "256Mi"
cpu: "250m"
max:
memory: "1Gi"
cpu: "1000m"
min:
memory: "128Mi"
cpu: "100m"
type: Container
# Create namespace with limits
kubectl apply -f namespace-with-limits.yaml
# Check LimitRange
kubectl describe limitrange -n limited-namespace
# Create pod without specifying resources
kubectl run test-pod --image=nginx -n limited-namespace
# Check that defaults were applied
kubectl describe pod test-pod -n limited-namespace | grep -A 10 "Requests\|Limits"
Create resource-quota.yaml:
apiVersion: v1
kind: ResourceQuota
metadata:
name: namespace-quota
namespace: limited-namespace
spec:
hard:
requests.cpu: "2"
requests.memory: "4Gi"
limits.cpu: "4"
limits.memory: "8Gi"
pods: "10"
services: "5"
persistentvolumeclaims: "3"
# Apply quota
kubectl apply -f resource-quota.yaml
# Check quota
kubectl describe quota -n limited-namespace
# Try to exceed quota
kubectl create deployment nginx --image=nginx --replicas=15 -n limited-namespace
# Check why some pods didn't start
kubectl get events -n limited-namespace --sort-by='.lastTimestamp'
# Good: Requests = Limits (Guaranteed QoS)
resources:
requests:
memory: "256Mi"
cpu: "500m"
limits:
memory: "256Mi"
cpu: "500m"
# Good: Requests < Limits (Burstable QoS)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "1000m"
# Bad: No requests (BestEffort QoS, first to be evicted)
# No resources specified
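The resulting class is recorded in the pod status, so you can verify which bucket a pod landed in. For example, resource-demo from earlier (requests lower than limits) should report Burstable:
kubectl get pod resource-demo -o jsonpath='{.status.qosClass}{"\n"}'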
# Check current nodes and their names
kubectl get nodes --show-labels
# Note: Node names differ between kind and minikube:
# - kind: workloads-lab-worker, workloads-lab-worker2, workloads-lab-worker3
# - minikube: minikube, minikube-m02, minikube-m03
# Replace <node-name> with your actual node names from 'kubectl get nodes'
# Get worker node names (skip control-plane node which is typically index 0)
# Store them in variables for reuse
WORKER_NODES=($(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'))
FIRST_WORKER=${WORKER_NODES[1]}
SECOND_WORKER=${WORKER_NODES[2]}
THIRD_WORKER=${WORKER_NODES[3]:-$SECOND_WORKER} # Use second worker if only 2 exist
# Label nodes by tier (replace with your actual node names)
kubectl label node $FIRST_WORKER tier=frontend
kubectl label node $SECOND_WORKER tier=backend
kubectl label node $THIRD_WORKER tier=database
# Label nodes by disk type
kubectl label node $SECOND_WORKER disk=ssd
kubectl label node $THIRD_WORKER disk=ssd
# Check labels
kubectl get nodes -L tier,disk
Create frontend-nodeSelector.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
nodeSelector:
tier: frontend
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
# Apply deployment
kubectl apply -f frontend-nodeSelector.yaml
# Check where pods landed
kubectl get pods -l app=frontend -o wide
# They should all be on the node with tier=frontend label
Create database-affinity.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: database
spec:
replicas: 2
selector:
matchLabels:
app: database
template:
metadata:
labels:
app: database
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: tier
operator: In
values:
- database
- key: disk
operator: In
values:
- ssd
containers:
- name: mariadb
image: mariadb:10.11
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "256Mi"
cpu: "250m"
# Apply deployment
kubectl apply -f database-affinity.yaml
# Check placement
kubectl get pods -l app=database -o wide
# Pods should only be on the node with tier=database AND disk=ssd labels
Create backend-preferred-affinity.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
spec:
replicas: 5
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
preference:
matchExpressions:
- key: disk
operator: In
values:
- ssd
- weight: 20
preference:
matchExpressions:
- key: tier
operator: In
values:
- backend
containers:
- name: app
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
# Apply deployment
kubectl apply -f backend-preferred-affinity.yaml
# Check distribution
kubectl get pods -l app=backend -o wide
# Pods will prefer SSD nodes but can go elsewhere if needed
Create cache-with-affinity.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cache
spec:
replicas: 3
selector:
matchLabels:
app: cache
template:
metadata:
labels:
app: cache
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- backend
topologyKey: kubernetes.io/hostname
containers:
- name: redis
image: redis:7
resources:
requests:
memory: "128Mi"
cpu: "100m"
# Apply deployment
kubectl apply -f cache-with-affinity.yaml
# Check co-location
kubectl get pods -l app=cache -o wide
kubectl get pods -l app=backend -o wide
# Cache pods should be on same nodes as backend pods
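To compare the placement of both tiers in one view (assuming both deployments are in the default namespace):
kubectl get pods -l 'app in (backend,cache)' -o custom-columns=NAME:.metadata.name,APP:.metadata.labels.app,NODE:.spec.nodeName --sort-by=.spec.nodeName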
Create web-anti-affinity.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-spread
spec:
replicas: 3
selector:
matchLabels:
app: web-spread
template:
metadata:
labels:
app: web-spread
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-spread
topologyKey: kubernetes.io/hostname
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
# Apply deployment
kubectl apply -f web-anti-affinity.yaml
# Check spread
kubectl get pods -l app=web-spread -o wide
# Each pod should be on a different node
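Required anti-affinity effectively caps the replica count at the number of schedulable nodes; a fourth replica on this lab cluster would stay Pending. For softer spreading, topologySpreadConstraints is usually a better fit. A hedged sketch of the equivalent pod-spec fragment:
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway   # use DoNotSchedule for a hard requirement
  labelSelector:
    matchLabels:
      app: web-spread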
| Operator | Description |
|---|---|
| `In` | Label value in list |
| `NotIn` | Label value not in list |
| `Exists` | Label key exists |
| `DoesNotExist` | Label key doesn’t exist |
| `Gt` | Greater than (numeric) |
| `Lt` | Less than (numeric) |
| Effect | Description |
|---|---|
| `NoSchedule` | Don’t schedule new pods |
| `PreferNoSchedule` | Avoid scheduling if possible |
| `NoExecute` | Evict existing pods |
# Get your node names first
kubectl get nodes
# Taint worker nodes (replace with your actual node names)
# Get worker node names (skip control-plane node)
WORKER_NODES=($(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'))
FIRST_WORKER=${WORKER_NODES[1]}
SECOND_WORKER=${WORKER_NODES[2]}
THIRD_WORKER=${WORKER_NODES[3]:-$SECOND_WORKER} # Use second worker if only 2 exist
# Taint worker node for production only
kubectl taint nodes $FIRST_WORKER environment=production:NoSchedule
# Taint worker2 for databases only
kubectl taint nodes $SECOND_WORKER workload=database:NoSchedule
# Taint worker3 with NoExecute (will evict pods)
kubectl taint nodes $THIRD_WORKER maintenance=true:NoExecute
# Check taints
kubectl describe node $FIRST_WORKER | grep Taints
kubectl describe node $SECOND_WORKER | grep Taints
kubectl describe node $THIRD_WORKER | grep Taints
# Try to create a pod (it will likely remain Pending without tolerations)
kubectl run test-pod --image=nginx
# Check where it landed
kubectl get pod test-pod -o wide
# Describe events to see taint-related scheduling messages
kubectl describe pod test-pod | sed -n '/Events:/,$p'
# It won't schedule on tainted worker nodes!
Create production-app.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: production-app
spec:
replicas: 3
selector:
matchLabels:
app: production-app
template:
metadata:
labels:
app: production-app
spec:
tolerations:
- key: "environment"
operator: "Equal"
value: "production"
effect: "NoSchedule"
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
# Apply deployment
kubectl apply -f production-app.yaml
# Check placement
kubectl get pods -l app=production-app -o wide
# Pods can now schedule on the node with environment=production taint
Create database-toleration.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: database-app
spec:
replicas: 2
selector:
matchLabels:
app: database-app
template:
metadata:
labels:
app: database-app
spec:
tolerations:
- key: "workload"
operator: "Equal"
value: "database"
effect: "NoSchedule"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: workload
operator: In
values:
- database
containers:
- name: mariadb
image: mariadb:10.11
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "256Mi"
cpu: "250m"
# Apply deployment
kubectl apply -f database-toleration.yaml
# Check placement
kubectl get pods -l app=database-app -o wide
# Pods should be on the node with workload=database taint
Create tolerate-all.yaml:
apiVersion: v1
kind: Pod
metadata:
name: tolerate-everything
spec:
tolerations:
- operator: "Exists"
containers:
- name: nginx
image: nginx
# Apply pod
kubectl apply -f tolerate-all.yaml
# It can schedule on any node
kubectl get pod tolerate-everything -o wide
Create eviction-test.yaml:
apiVersion: v1
kind: Pod
metadata:
name: eviction-test
spec:
tolerations:
- key: "maintenance"
operator: "Equal"
value: "true"
effect: "NoExecute"
tolerationSeconds: 30 # Evict after 30 seconds
containers:
- name: nginx
image: nginx
# Get the node that has the maintenance taint (use the variable from Step 1)
NODE_WITH_MAINTENANCE=$THIRD_WORKER
# Remove previous NoExecute taint
kubectl taint nodes $NODE_WITH_MAINTENANCE maintenance=true:NoExecute-
# Apply pod on the node
kubectl apply -f eviction-test.yaml
# Add NoExecute taint again
kubectl taint nodes $NODE_WITH_MAINTENANCE maintenance=true:NoExecute
# Watch pod get evicted after 30 seconds
kubectl get pod eviction-test --watch
# Remove all taints (use the same variables from Step 1)
kubectl taint nodes $FIRST_WORKER environment=production:NoSchedule-
kubectl taint nodes $SECOND_WORKER workload=database:NoSchedule-
kubectl taint nodes $THIRD_WORKER maintenance=true:NoExecute-
# Verify
kubectl describe nodes | grep Taints
# Dedicated GPU node
kubectl taint nodes gpu-node hardware=gpu:NoSchedule
# Maintenance mode
kubectl taint nodes node1 maintenance=true:NoExecute
# Production isolation
kubectl taint nodes prod-node environment=production:NoSchedule
# Spot instances (might be evicted)
kubectl taint nodes spot-node instance-type=spot:PreferNoSchedule
Let’s combine everything we’ve learned into production-ready patterns.
Create ha-application.yaml:
# Frontend: prefer spreading replicas across nodes (soft anti-affinity)
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend-ha
spec:
replicas: 4
selector:
matchLabels:
app: frontend-ha
tier: frontend
template:
metadata:
labels:
app: frontend-ha
tier: frontend
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- frontend-ha
topologyKey: kubernetes.io/hostname
containers:
- name: nginx
image: nginx:1.21
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
---
# Backend: hard anti-affinity, one replica per node (on this 3-worker lab cluster only 3 of the 6 replicas will schedule; the rest stay Pending)
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-ha
spec:
replicas: 6
selector:
matchLabels:
app: backend-ha
tier: backend
template:
metadata:
labels:
app: backend-ha
tier: backend
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- backend-ha
topologyKey: kubernetes.io/hostname
containers:
- name: app
image: nginx:1.21
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "512Mi"
cpu: "1000m"
---
# Cache: Co-locate with backend
apiVersion: apps/v1
kind: Deployment
metadata:
name: cache-ha
spec:
replicas: 3
selector:
matchLabels:
app: cache-ha
tier: cache
template:
metadata:
labels:
app: cache-ha
tier: cache
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: tier
operator: In
values:
- backend
topologyKey: kubernetes.io/hostname
containers:
- name: redis
image: redis:7
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
# Deploy the stack
kubectl apply -f ha-application.yaml
# Visualize the deployment
kubectl get pods -o wide | sort -k7
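Column 7 of the wide output is the node name; an alternative view that does not depend on column positions:
kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --sort-by=.spec.nodeName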
Create monitoring-complete.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
hostNetwork: true
hostPID: true
tolerations:
# Tolerate all taints
- operator: Exists
containers:
- name: node-exporter
image: prom/node-exporter:latest
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
- --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
- --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ports:
- containerPort: 9100
hostPort: 9100
name: metrics
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "200Mi"
cpu: "500m"
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
- name: root
mountPath: /host/root
readOnly: true
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
# Create namespace
kubectl create namespace monitoring
# Deploy monitoring
kubectl apply -f monitoring-complete.yaml
# Check deployment
kubectl get daemonsets -n monitoring
kubectl get pods -n monitoring -o wide
Create backup-cronjob-priority.yaml:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority-backup
value: 1000
globalDefault: false
description: "High priority for backup jobs"
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: database-backup-priority
spec:
schedule: "0 2 * * *"
jobTemplate:
spec:
template:
spec:
priorityClassName: high-priority-backup
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: disk
operator: In
values:
- ssd
containers:
- name: backup
image: mariadb:10.11
command:
- sh
- -c
- |
echo "High-priority backup starting"
sleep 10
echo "Backup complete"
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
restartPolicy: OnFailure
backoffLimit: 3
# Deploy with priority
kubectl apply -f backup-cronjob-priority.yaml
# Check priority class
kubectl get priorityclasses
# Trigger backup manually
kubectl create job test-backup --from=cronjob/database-backup-priority
# Check job
kubectl get jobs
kubectl describe job test-backup
Create pdb-example.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: critical-app
spec:
replicas: 5
selector:
matchLabels:
app: critical-app
template:
metadata:
labels:
app: critical-app
spec:
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: critical-app-pdb
spec:
minAvailable: 3 # Always keep at least 3 pods running
selector:
matchLabels:
app: critical-app
# Deploy with PDB
kubectl apply -f pdb-example.yaml
# Check PDB
kubectl get pdb
kubectl describe pdb critical-app-pdb
# Try to drain a node (will respect PDB)
# Replace <node-name> with your actual node name
# kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
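A PodDisruptionBudget can instead express the budget as maxUnavailable, and either field may be a percentage (but not both in one object). A hedged variant for the same app, meant as an alternative to the PDB above rather than alongside it, since overlapping PDBs block evictions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb-alt
spec:
  maxUnavailable: 40%   # with 5 replicas, allow at most 2 to be disrupted
  selector:
    matchLabels:
      app: critical-app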
# Complete production-ready pod specification
apiVersion: v1
kind: Pod
metadata:
name: production-ready-pod
labels:
app: myapp
version: v1.0
environment: production
spec:
# Resource management
containers:
- name: app
image: myapp:1.0
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "1000m"
# Health checks
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 15
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
# Lifecycle hooks
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
# Scheduling
priorityClassName: high-priority-backup
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- myapp
topologyKey: kubernetes.io/hostname
tolerations:
- key: "production"
operator: "Equal"
value: "true"
effect: "NoSchedule"
# Security
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
# DNS config
dnsPolicy: ClusterFirst
# Restart policy
restartPolicy: Always
| Feature | ReplicaSet | DaemonSet | Job | CronJob | Deployment | StatefulSet |
|---|---|---|---|---|---|---|
| Purpose | Maintain N replicas | One per node | Run to completion | Scheduled jobs | Manage ReplicaSets | Ordered, stable apps |
| Replica Count | Fixed | Per node | As needed | Per schedule | Fixed | Fixed with order |
| Updates | Manual | Rolling | N/A | N/A | Rolling | Rolling (ordered) |
| Scaling | Manual | Automatic | N/A | N/A | Manual/Auto | Manual |
| Restart | Always | Always | Never/OnFailure | Never/OnFailure | Always | Always |
| Use Case | Base primitive | System services | Migrations | Backups | Apps | Databases |
| Mechanism | Scope | Complexity | Use Case |
|---|---|---|---|
| nodeSelector | Node | Simple | Basic node selection |
| Node Affinity | Node | Medium | Complex node rules |
| Pod Affinity | Pod | High | Co-locate pods |
| Pod Anti-Affinity | Pod | High | Spread pods |
| Taints | Node | Medium | Repel pods |
| Tolerations | Pod | Medium | Allow on tainted nodes |
# Check why pod is pending
kubectl describe pod <pod-name>
# Look for:
# - "0/N nodes are available: N node(s) didn't match node selector"
# - "0/N nodes are available: N node(s) didn't match pod affinity rules"
# Solutions:
# 1. Check node labels
kubectl get nodes --show-labels
# 2. Relax affinity rules (change required to preferred)
# 3. Add more nodes with required labels
# Check DaemonSet status
kubectl describe daemonset <daemonset-name>
# Common causes:
# - Node taints (DaemonSet needs tolerations)
# - Node selectors excluding nodes
# - Resource constraints
# Check node taints
kubectl describe nodes | grep -A 3 Taints
# Add tolerations to DaemonSet
# Check job status
kubectl describe job <job-name>
# Check pod logs
kubectl logs -l job-name=<job-name>
# Common causes:
# - Wrong restartPolicy (should be Never or OnFailure)
# - Container doesn't exit
# - backoffLimit reached
# Fix: Update job spec and recreate
# Check quota usage
kubectl describe quota -n <namespace>
# Check resource usage
kubectl top pods -n <namespace>
# Solutions:
# 1. Increase quota
# 2. Reduce resource requests
# 3. Delete unused pods
# Check scheduling decisions
kubectl get events --sort-by='.lastTimestamp' | grep <pod-name>
# Check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check pod scheduling details
kubectl get pod <pod-name> -o yaml | grep -A 20 "affinity\|nodeSelector\|tolerations"
# Force delete stuck pod
kubectl delete pod <pod-name> --grace-period=0 --force
# Check scheduler logs
kubectl logs -n kube-system -l component=kube-scheduler
# ReplicaSet
kubectl scale rs <name> --replicas=N
# DaemonSet
kubectl rollout status daemonset <name>
# Job
kubectl create job <name> --image=<image>
# CronJob
kubectl create cronjob <name> --schedule="* * * * *" --image=<image>
# Node labels
kubectl label node <node> key=value
# Taints
kubectl taint nodes <node> key=value:Effect
# Affinity debugging
kubectl describe pod <pod> | grep -A 20 "Node-Selectors\|Tolerations\|Events"
Practice Repository: Create your own examples and scenarios to solidify these concepts. The best way to learn Kubernetes is by breaking things and fixing them!