Kubernetes, an open-source container orchestration platform, allows you to automate the deployment, scaling, and management of containerized applications. One of its powerful features is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a deployment or replication controller based on observed CPU utilization (or other select metrics).
In this blog, we will delve into:
- What is Horizontal Pod Autoscaler?
- Why Use HPA?
- How Does HPA Work?
- Setting Up Horizontal Pod Autoscaler in Kubernetes
- Best Practices for Using Horizontal Pod Autoscaler
- Conclusion
What is Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically adjusts the number of pod replicas in a deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU or memory usage. This ensures that your applications can handle varying levels of load by scaling out (adding more pods) or scaling in (removing pods) as needed.
Why Use HPA?
- Improved Performance: HPA ensures that your applications have enough resources to handle the incoming traffic by dynamically adjusting the number of pods.
- Cost Efficiency: By scaling down the number of pods during low demand, HPA helps in reducing resource usage and costs.
- Resilience and Reliability: Automatic scaling can help maintain application availability and performance during unexpected spikes in demand.
How Does HPA Work?
HPA periodically checks the metrics server for the current resource usage of pods. Based on the specified target resource utilization, it calculates the desired number of pods and adjusts the deployment accordingly. Here’s a simplified flow of how HPA works:
- Metrics Collection: HPA queries the metrics server (like Prometheus or Kubernetes Metrics Server) to gather current CPU/memory usage.
- Comparison: It compares the current metrics against the target thresholds defined in the HPA configuration.
- Scaling Decision: If the usage is above or below the target, HPA calculates the required number of replicas and updates the deployment accordingly.
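The scaling decision in the last step follows a simple formula (documented in the Kubernetes HPA reference):

```
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
```

For example, with 2 replicas averaging 80% CPU against a 50% target, HPA computes ceil(2 × 80/50) = ceil(3.2) = 4 replicas.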
Setting Up Horizontal Pod Autoscaler in Kubernetes
Prerequisites
- A running Kubernetes cluster.
- The kubectl command-line tool configured to communicate with your cluster.
- Metrics Server installed in the cluster (or an equivalent metrics provider like Prometheus).
Step 1: Install Metrics Server
The Metrics Server collects resource metrics from Kubelet and exposes them via the Kubernetes API. If it’s not already installed, follow these steps:
- Download the Metrics Server components:
wget https://raw.githubusercontent.com/k21academyuk/Kubernetes/master/metrics-server.yaml
kubectl create -f metrics-server.yaml

- Verify the Metrics Server installation:
kubectl get pods -n kube-system

Step 2: Deploy a Sample Application
We’ll deploy a simple application to demonstrate HPA. For this example, we’ll use a basic Nginx deployment.
- Create a deployment file (nginx-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment        # Name of the deployment
spec:
  replicas: 1                   # Number of initial replicas (pods)
  selector:
    matchLabels:
      app: nginx                # Label selector for the pods
  template:
    metadata:
      labels:
        app: nginx              # Labels for the pods
    spec:
      containers:
        - name: nginx           # Name of the container
          image: nginx:latest   # Docker image to use for this container
          ports:
            - containerPort: 80 # Port to expose on the container
          resources:
            requests:
              cpu: "100m"       # CPU request for the container
            limits:
              cpu: "200m"       # CPU limit for the container
- Apply the deployment:
kubectl apply -f nginx-deployment.yaml

- Verify the deployment:
kubectl get deployments

Step 3: Create a Horizontal Pod Autoscaler
Now that we have a running application, let’s create an HPA that scales the Nginx deployment based on CPU utilization.
- Create the HPA resource:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10

This command sets up an HPA for the nginx-deployment to maintain an average CPU utilization of 50%. The number of pods will scale between 1 and 10 based on the CPU load.
- Verify the HPA creation:
kubectl get hpa

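As an alternative to the imperative kubectl autoscale command, the same HPA can be defined declaratively. Here is a minimal sketch using the autoscaling/v2 API (the resource name nginx-hpa is just an example; apply it with kubectl apply -f):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa              # Example name for the HPA resource
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment     # The deployment created earlier
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target 50% average CPU utilization
```

Keeping the HPA in a YAML file makes it easy to version-control alongside the deployment manifest.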
Step 4: Simulate Load and Observe Autoscaling
To see the HPA in action, we’ll simulate CPU load on the Nginx pods.
- Deploy a CPU load generator (busybox):
- Open a separate SSH terminal and execute the following commands to create a new pod:
kubectl run -it --rm load-generator --image=busybox /bin/sh
- Execute an infinite loop inside the pod to generate load against the nginx-deployment pod (replace 10.40.0.2 with your pod's IP address, which you can find with kubectl get pods -o wide):
while true; do wget -q -O- http://10.40.0.2; done

- Observe the HPA scaling:
kubectl get hpa
kubectl get pods

- You should see the number of Nginx pods increase as the CPU load goes up.
Best Practices for Using Horizontal Pod Autoscaler
- Monitor Your Metrics: Regularly monitor the metrics and logs to ensure the HPA is working as expected.
- Set Realistic Targets: Define realistic and achievable target utilization values to avoid frequent scaling actions.
- Combine with Cluster Autoscaler: For large-scale applications, use HPA in conjunction with Cluster Autoscaler to dynamically adjust the number of nodes in the cluster.
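To avoid the frequent scaling actions mentioned above when load oscillates around the target, the autoscaling/v2 API also lets you tune scaling behavior. A sketch under the autoscaling/v2 spec (the 300-second window and one-pod-per-minute policy are illustrative values, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa                       # Example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Pods
          value: 1                      # Remove at most one pod ...
          periodSeconds: 60             # ... per 60-second period
```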
Conclusion
The Horizontal Pod Autoscaler (HPA) is an essential feature in Kubernetes that helps maintain application performance and cost efficiency by automatically scaling the number of pod replicas based on resource utilization. By understanding and configuring HPA properly, you can ensure that your applications are resilient, scalable, and ready to handle varying loads.
Frequently Asked Questions
What is HPA in Kubernetes?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes resource that automatically adjusts the number of pod replicas in a deployment, replication controller, or stateful set based on observed CPU utilization (or other select metrics).
How does HPA work?
HPA monitors the metrics of your pods, such as CPU utilization or custom metrics, and scales the number of pod replicas up or down to maintain the desired performance. It uses the Metrics Server to gather these metrics.
What are the benefits of using HPA?
HPA helps in automatically adjusting the number of pods based on the load, improving resource utilization, maintaining application performance, and reducing costs by scaling down when the demand is low.
How do I enable HPA in my Kubernetes cluster?
To enable HPA, you need to have the Metrics Server installed and running in your cluster. Then, you can define an HPA resource using a YAML file or the kubectl autoscale command, specifying the target deployment, desired metrics, and scaling policies.
How can I monitor HPA activity?
You can monitor HPA activity using the kubectl get hpa command to see the current status, metrics, and scaling decisions. Additionally, you can check the Kubernetes dashboard or use monitoring tools like Prometheus and Grafana for more detailed insights.
Related Post
- Subscribe to our YouTube channel on “Docker & Kubernetes”
- Docker Architecture: A Complete Docker Introduction
- Docker Compose Overview & Steps to Install Docker Compose
- Kubernetes for Beginners
- Kubernetes Architecture | An Introduction to Kubernetes Components
- What is Kubernetes Cluster? Components, Benefits & Working