Understanding Horizontal Pod Autoscaler (HPA) in Kubernetes: A Simple Guide


Kubernetes, an open-source container orchestration platform, allows you to automate the deployment, scaling, and management of containerized applications. One of its powerful features is the Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a deployment or replication controller based on observed CPU utilization (or other select metrics).

In this blog, we will delve into what the Horizontal Pod Autoscaler is, why you should use it, how it works, and how to set it up step by step.

What is Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically adjusts the number of pod replicas in a deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU or memory usage. This ensures that your applications can handle varying levels of load by scaling out (adding more pods) or scaling in (removing pods) as needed.

Why Use HPA?

  1. Improved Performance: HPA ensures that your applications have enough resources to handle the incoming traffic by dynamically adjusting the number of pods.
  2. Cost Efficiency: By scaling down the number of pods during low demand, HPA helps in reducing resource usage and costs.
  3. Resilience and Reliability: Automatic scaling can help maintain application availability and performance during unexpected spikes in demand.

How Does HPA Work?

HPA periodically checks the metrics server for the current resource usage of pods. Based on the specified target resource utilization, it calculates the desired number of pods and adjusts the deployment accordingly. Here’s a simplified flow of how HPA works:

  1. Metrics Collection: HPA queries the metrics API, typically backed by the Kubernetes Metrics Server (or an adapter for a system such as Prometheus), to gather current CPU/memory usage.
  2. Comparison: It compares the current metrics against the target thresholds defined in the HPA configuration.
  3. Scaling Decision: If the usage is above or below the target, HPA calculates the required number of replicas and updates the deployment accordingly.
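
Under the hood, the HPA controller turns this comparison into a replica count using the formula documented for the Kubernetes HPA; the numbers in the example below are ours, purely for illustration:

    desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

    Example: 2 replicas averaging 100% CPU utilization against a 50% target
    desiredReplicas = ceil( 2 * 100 / 50 ) = 4 replicas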

Setting Up Horizontal Pod Autoscaler in Kubernetes

Prerequisites

  • A running Kubernetes cluster.
  • kubectl command-line tool configured to communicate with your cluster.
  • Metrics Server installed in the cluster (or an equivalent metrics provider like Prometheus).

Note: Check out our guide to quickly install a three-node Kubernetes cluster.

Step 1: Install Metrics Server

The Metrics Server collects resource metrics from the Kubelet on each node and exposes them via the Kubernetes API. If it’s not already installed, follow these steps:

  1. Download the Metrics Server components:
    wget https://raw.githubusercontent.com/k21academyuk/Kubernetes/master/metrics-server.yaml
    kubectl create -f metrics-server.yaml

  2. Verify the Metrics Server installation:
    kubectl get pods -n kube-system
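
You should see a metrics-server pod in the Running state. As an optional extra check (not part of the original steps), you can confirm that metrics are actually being served by querying node and pod usage:

    kubectl top nodes
    kubectl top pods -A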
    

Step 2: Deploy a Sample Application

We’ll deploy a simple application to demonstrate HPA. For this example, we’ll use a basic Nginx deployment.

  1. Create a deployment file (nginx-deployment.yaml):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment # Name of the deployment
    spec:
      replicas: 1 # Number of initial replicas (pods)
      selector:
        matchLabels:
          app: nginx # Label selector for the pods
      template:
        metadata:
          labels:
            app: nginx # Labels for the pods
        spec:
          containers:
          - name: nginx # Name of the container
            image: nginx:latest # Docker image to use for this container
            ports:
            - containerPort: 80 # Port to expose on the container
            resources:
              requests:
                cpu: "100m" # CPU request for the container
              limits:
                cpu: "200m" # CPU limit for the container
    
  2. Apply the deployment:
    kubectl apply -f nginx-deployment.yaml

  3. Verify the deployment:
    kubectl get deployments

To know more about Kubernetes Deployments, check out our dedicated guide.
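
Optionally, you can also expose the deployment through a ClusterIP Service so that the load generator in Step 4 can target a stable name instead of a hard-coded pod IP. The Service below (and the name nginx-service) is our own addition to the walkthrough, sketched here for convenience:

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-service # Hypothetical name chosen for this example
    spec:
      selector:
        app: nginx # Must match the pod labels from the deployment above
      ports:
      - port: 80 # Port the Service listens on
        targetPort: 80 # Container port to forward traffic to

With this in place, the wget loop in Step 4 could use http://nginx-service instead of the pod IP (assuming the load generator runs in the same namespace).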

Step 3: Create a Horizontal Pod Autoscaler

Now that we have a running application, let’s create an HPA that scales the Nginx deployment based on CPU utilization.

  1. Create the HPA resource:

    kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10
    


    This command sets up an HPA for the nginx-deployment to maintain an average CPU utilization of 50%. The number of pods will scale between 1 and 10 based on the CPU load. (An equivalent declarative manifest is shown after the verification step below.)

  2. Verify the HPA creation:
    kubectl get hpa
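
If you prefer the declarative approach, the same autoscaler can be written as a manifest using the autoscaling/v2 API. The sketch below is equivalent to the kubectl autoscale command above; the name nginx-hpa is our choice for this example:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa # Hypothetical name for the HPA object
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-deployment # Deployment to scale
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50 # Target average CPU utilization

Save it as nginx-hpa.yaml and apply it with kubectl apply -f nginx-hpa.yaml.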
    

Step 4: Simulate Load and Observe Autoscaling

To see the HPA in action, we’ll simulate CPU load on the Nginx pods.

  1. Deploy a CPU load generator (busybox): open a separate terminal and run the following command to start an interactive pod:
    kubectl run -it --rm load-generator --image=busybox -- /bin/sh

  2. Inside the pod, run an infinite loop that sends requests to the nginx pod (replace 10.40.0.2 with your own nginx pod's IP, which you can find with kubectl get pods -o wide):
    while true; do wget -q -O- http://10.40.0.2; done

  3. In another terminal, observe the HPA scaling:
    kubectl get hpa
    kubectl get pods

  4. You should see the number of Nginx pods increase as the CPU load goes up.
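
When you stop the load generator (press Ctrl+C to end the loop, then exit the shell; the --rm flag deletes the pod), the HPA scales the deployment back down. By default this takes a few minutes because of the downscale stabilization window. You can watch the scale-down with:

    kubectl get hpa nginx-deployment --watch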

Best Practices for Using Horizontal Pod Autoscaler

  • Monitor Your Metrics: Regularly monitor the metrics and logs to ensure the HPA is working as expected.
  • Set Realistic Targets: Define realistic and achievable target utilization values to avoid frequent scaling actions (see the scaling-behavior snippet after this list).
  • Combine with Cluster Autoscaler: For large-scale applications, use HPA in conjunction with Cluster Autoscaler to dynamically adjust the number of nodes in the cluster.
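
To reduce the frequent scaling actions mentioned above, the autoscaling/v2 API also lets you tune scaling behavior. The stanza below could be added under spec in the HPA manifest sketched in Step 3; the values are illustrative, not a recommendation from this guide:

    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300 # Wait 5 minutes of stable metrics before scaling down
        policies:
        - type: Pods
          value: 1 # Remove at most 1 pod...
          periodSeconds: 60 # ...per 60-second window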

Conclusion

The Horizontal Pod Autoscaler (HPA) is an essential feature in Kubernetes that helps maintain application performance and cost efficiency by automatically scaling the number of pod replicas based on resource utilization. By understanding and configuring HPA properly, you can ensure that your applications are resilient, scalable, and ready to handle varying loads.

Frequently Asked Questions

What is HPA in Kubernetes?

The Horizontal Pod Autoscaler (HPA) is a Kubernetes resource that automatically adjusts the number of pod replicas in a deployment, replication controller, or StatefulSet based on observed CPU utilization (or other select metrics).

How does HPA work?

HPA monitors the metrics of your pods, such as CPU utilization or custom metrics, and scales the number of pod replicas up or down to maintain the desired performance. It uses the Metrics Server to gather these metrics.

What are the benefits of using HPA?

HPA helps in automatically adjusting the number of pods based on the load, improving resource utilization, maintaining application performance, and reducing costs by scaling down when the demand is low.

How do I enable HPA in my Kubernetes cluster?

To enable HPA, you need to have the Metrics Server installed and running in your cluster. Then, you can define an HPA resource using a YAML manifest or the kubectl autoscale command, specifying the target deployment, desired metrics, and scaling policies.

How can I monitor HPA activity?

You can monitor HPA activity using the kubectl get hpa command to see the current status, metrics, and scaling decisions. Additionally, you can check the Kubernetes dashboard or use monitoring tools like Prometheus and Grafana for more detailed insights.
