Kubernetes, an open-source container orchestration platform, allows you to automate the deployment, scaling, and management of containerized applications. One of its powerful features is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a deployment or replication controller based on observed CPU utilization (or other select metrics).
In this blog, we will delve into:
- What is Horizontal Pod Autoscaler?
- Why Use HPA?
- How Does HPA Work?
- Setting Up Horizontal Pod Autoscaler in Kubernetes
- Best Practices for Using Horizontal Pod Autoscaler
- Conclusion
What is Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically adjusts the number of pod replicas in a deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU or memory usage. This ensures that your applications can handle varying levels of load by scaling out (adding more pods) or scaling in (removing pods) as needed.
Why Use HPA?
- Improved Performance: HPA ensures that your applications have enough resources to handle the incoming traffic by dynamically adjusting the number of pods.
- Cost Efficiency: By scaling down the number of pods during low demand, HPA helps in reducing resource usage and costs.
- Resilience and Reliability: Automatic scaling can help maintain application availability and performance during unexpected spikes in demand.
How Does HPA Work?
HPA periodically checks the metrics server for the current resource usage of pods. Based on the specified target resource utilization, it calculates the desired number of pods and adjusts the deployment accordingly. Here’s a simplified flow of how HPA works:
- Metrics Collection: HPA queries the metrics server (like Prometheus or Kubernetes Metrics Server) to gather current CPU/memory usage.
- Comparison: It compares the current metrics against the target thresholds defined in the HPA configuration.
- Scaling Decision: If the usage is above or below the target, HPA calculates the required number of replicas and updates the deployment accordingly.
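The scaling decision in the last step follows a simple formula (documented in the Kubernetes HPA reference):

```
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
```

For example, with 2 replicas averaging 80% CPU against a 50% target, HPA computes ceil(2 × 80/50) = ceil(3.2) = 4 replicas.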
Setting Up Horizontal Pod Autoscaler in Kubernetes
Prerequisites
- A running Kubernetes cluster.
- The kubectl command-line tool configured to communicate with your cluster.
- Metrics Server installed in the cluster (or an equivalent metrics provider like Prometheus).
Step 1: Install Metrics Server
The Metrics Server collects resource metrics from Kubelet and exposes them via the Kubernetes API. If it’s not already installed, follow these steps:
- Download the Metrics Server components:
wget https://raw.githubusercontent.com/k21academyuk/Kubernetes/master/metrics-server.yaml
kubectl create -f metrics-server.yaml

- Verify the Metrics Server installation:
kubectl get pods -n kube-system

Step 2: Deploy a Sample Application
We’ll deploy a simple application to demonstrate HPA. For this example, we’ll use a basic Nginx deployment.
- Create a deployment file (nginx-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment        # Name of the deployment
spec:
  replicas: 1                   # Number of initial replicas (pods)
  selector:
    matchLabels:
      app: nginx                # Label selector for the pods
  template:
    metadata:
      labels:
        app: nginx              # Labels for the pods
    spec:
      containers:
        - name: nginx           # Name of the container
          image: nginx:latest   # Docker image to use for this container
          ports:
            - containerPort: 80 # Port to expose on the container
          resources:
            requests:
              cpu: "100m"       # CPU request for the container
            limits:
              cpu: "200m"       # CPU limit for the container
- Apply the deployment:
kubectl apply -f nginx-deployment.yaml

- Verify the deployment:
kubectl get deployments

Step 3: Create a Horizontal Pod Autoscaler
Now that we have a running application, let’s create an HPA that scales the Nginx deployment based on CPU utilization.
- Create the HPA resource:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10

This command sets up an HPA for the nginx-deployment to maintain an average CPU utilization of 50%. The number of pods will scale between 1 and 10 based on the CPU load.
- Verify the HPA creation:
kubectl get hpa

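As an alternative to the imperative kubectl autoscale command, the same HPA can be defined declaratively. Here is a minimal sketch using the autoscaling/v2 API (the resource name nginx-hpa is just an example; apply it with kubectl apply -f):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa              # Example name for the HPA resource
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment     # The deployment created earlier
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target 50% average CPU utilization
```

Keeping the HPA in a YAML file makes it easy to version-control alongside the deployment manifest.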
Step 4: Simulate Load and Observe Autoscaling
To see the HPA in action, we’ll simulate CPU load on the Nginx pods.
- Deploy a CPU load generator (busybox):
- Open a separate SSH terminal and execute the following commands to create a new pod:
kubectl run -it --rm load-generator --image=busybox /bin/sh
- Execute an infinite loop inside the pod to generate load against the nginx-deployment pod (replace 10.40.0.2 with your pod's IP address, which you can find with kubectl get pods -o wide):
while true; do wget -q -O- http://10.40.0.2; done

- Observe the HPA scaling:
kubectl get hpa
kubectl get pods

- You should see the number of Nginx pods increase as the CPU load goes up.
Best Practices for Using Horizontal Pod Autoscaler
- Monitor Your Metrics: Regularly monitor the metrics and logs to ensure the HPA is working as expected.
- Set Realistic Targets: Define realistic and achievable target utilization values to avoid frequent scaling actions.
- Combine with Cluster Autoscaler: For large-scale applications, use HPA in conjunction with Cluster Autoscaler to dynamically adjust the number of nodes in the cluster.
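To avoid the frequent scaling actions mentioned above when load oscillates around the target, the autoscaling/v2 API also lets you tune scaling behavior. A sketch under the autoscaling/v2 spec (the 300-second window and one-pod-per-minute policy are illustrative values, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa                       # Example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Pods
          value: 1                      # Remove at most one pod ...
          periodSeconds: 60             # ... per 60-second period
```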
Conclusion
The Horizontal Pod Autoscaler (HPA) is an essential feature in Kubernetes that helps maintain application performance and cost efficiency by automatically scaling the number of pod replicas based on resource utilization. By understanding and configuring HPA properly, you can ensure that your applications are resilient, scalable, and ready to handle varying loads.
Frequently Asked Questions
What is HPA in Kubernetes?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes resource that automatically adjusts the number of pod replicas in a deployment, replication controller, or stateful set based on observed CPU utilization (or other select metrics).
How does HPA work?
HPA monitors the metrics of your pods, such as CPU utilization or custom metrics, and scales the number of pod replicas up or down to maintain the desired performance. It uses the Metrics Server to gather these metrics.
What are the benefits of using HPA?
HPA helps in automatically adjusting the number of pods based on the load, improving resource utilization, maintaining application performance, and reducing costs by scaling down when the demand is low.
How do I enable HPA in my Kubernetes cluster?
To enable HPA, you need to have the Metrics Server installed and running in your cluster. Then, you can define an HPA resource using a YAML file or the kubectl autoscale command, specifying the target deployment, desired metrics, and scaling policies.
How can I monitor HPA activity?
You can monitor HPA activity using the kubectl get hpa command to see the current status, metrics, and scaling decisions. Additionally, you can check the Kubernetes dashboard or use monitoring tools like Prometheus and Grafana for more detailed insights.
Related Post
- Subscribe to our YouTube channel on “Docker & Kubernetes”
- Docker Architecture: A Complete Docker Introduction
- Docker Compose Overview & Steps to Install Docker Compose
- Kubernetes for Beginners
- Kubernetes Architecture | An Introduction to Kubernetes Components
- What is Kubernetes Cluster? Components, Benefits & Working