Reducing Kubernetes Costs With Autoscaling

Ruchita Varma
Published: November 9, 2022

Kubernetes comes with three built-in autoscaling mechanisms. This article explains how each one works and how it can help reduce your cloud bill.

In theory, containerization should be more cost-effective by default, but Kubernetes comes with cost traps that can push enterprises over budget. Fortunately, there are a few tactics to keep cloud costs in check, and autoscaling is one of them.

Let’s Begin!

The three built-in autoscaling mechanisms in Kubernetes can help reduce costs. The more closely they are tuned to actual demand, the lower the cost of running business applications.

Here, we’ll cover these three autoscaling mechanisms:

  • Horizontal Pod Autoscaling in Kubernetes
  • Vertical Pod Autoscaling in Kubernetes
  • Cluster Autoscaling in Kubernetes

1. Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas of a workload to match the computational demands of an application. As demand varies, you may want to add or remove pod replicas, and HPA does this for you automatically. It determines the number of pods needed from the metrics and target values set by the user, then creates or deletes pods accordingly. In most cases, these metrics are CPU and memory usage, but it is also possible to specify custom metrics.

When to Use HPA?

HPA works best for scaling stateless applications, but it can also be used with stateful applications. To get the highest cost savings for workloads where demand changes regularly, use HPA together with cluster autoscaling. This reduces the number of active nodes when the number of pods decreases.

How Does HPA Work?

HPA periodically checks the pods of a workload to decide whether the number of replicas needs to change. To determine this, it takes the mean of a per-pod metric value and checks whether adding or removing replicas would bring that value closer to the target.
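
The calculation described above can be sketched in a few lines of Python. This is a simplified model rather than the controller's actual code: the function name and the sample numbers are made up for illustration, and the 10% tolerance is an assumption about a typical default.

```python
import math

def desired_replicas(current_replicas, pod_metric_values, target_value, tolerance=0.1):
    """Return the replica count that moves the mean per-pod metric toward the target."""
    mean_value = sum(pod_metric_values) / len(pod_metric_values)
    ratio = mean_value / target_value
    # Leave the replica count alone when the mean is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Round up so the workload is never left short of capacity.
    return math.ceil(current_replicas * ratio)

# Four pods averaging 87.5% CPU against a 50% target: scale out.
print(desired_replicas(4, [90, 80, 85, 95], target_value=50))
# Four pods averaging 20% CPU against a 50% target: scale in.
print(desired_replicas(4, [20, 20, 20, 20], target_value=50))
```

In this sketch, the first call asks for more replicas and the second for fewer, which is the "closer to the target" behavior in miniature.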

Best Practices for Using a Horizontal Pod Autoscaler

Here are some of the best practices for efficiently using Horizontal Pod Autoscaling in Kubernetes (HPA):

  • Configure Values for every Container: HPA bases its scaling decisions on the observed CPU utilization of pods, calculated as a percentage of the resource requests of each pod. If teams fail to set request values for some containers, the calculations will be inaccurate and will lead to flawed scaling decisions. It’s essential to configure these values for every container in every pod that belongs to the workload HPA is scaling.
  • Choose Custom Metrics over External Metrics when possible: To mitigate security threats, prefer custom metrics over external metrics. The external metrics API can expose clusters to greater security risk because it can provide access to a large number of metrics. A custom metrics API poses less risk if security is compromised because it only holds specific metrics.
  • Use HPA together with Cluster Autoscaler: Doing this enables teams to coordinate the scalability of pods with the behavior of nodes in the cluster. For instance, when there is a need to scale up, the Cluster Autoscaler can add eligible nodes, and when it’s scaling down, it can shut down unwanted nodes to conserve resources. 
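
To see why a missing request value breaks the first practice above, here is a minimal sketch of utilization computed as a percentage of requests. The data layout and the names (`usage_millicores`, `request_millicores`) are hypothetical, and raising an error on a missing request is a choice made for this sketch, not a description of how HPA itself reacts.

```python
def cpu_utilization_percent(containers):
    """Return pod CPU utilization as a percentage of the summed container requests."""
    missing = [c["name"] for c in containers if c.get("request_millicores") is None]
    if missing:
        # Without a request there is no denominator for the percentage.
        raise ValueError(f"containers without a CPU request: {missing}")
    total_usage = sum(c["usage_millicores"] for c in containers)
    total_request = sum(c["request_millicores"] for c in containers)
    return 100.0 * total_usage / total_request

pod = [
    {"name": "app", "usage_millicores": 300, "request_millicores": 500},
    {"name": "sidecar", "usage_millicores": 50, "request_millicores": 200},
]
print(cpu_utilization_percent(pod))
```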

2. Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaling lets you analyze, monitor, and set the CPU and memory resources required by pods. The Vertical Pod Autoscaler (VPA) is a Kubernetes autoscaling mechanism that increases and decreases the CPU and memory resource requests of pod containers to match the allocated cluster resources to actual usage. VPA replaces only pods that are managed by a controller, such as a replication controller, and it requires the Kubernetes metrics server to work.

When to Use the Vertical Pod Autoscaler? 

Some workloads need high resource utilization only temporarily. Raising their resource requests permanently would waste CPU or memory and would limit the nodes that can run them. Spreading the workload across multiple instances of the application is not always easy either. This is where a Vertical Pod Autoscaler can help.

How Does the Vertical Pod Autoscaler Work?

A VPA deployment includes three components: 

  1. Recommender: It monitors the current and past resource consumption and provides recommended CPU and memory request values for a container. 
  2. Updater: It checks for pods whose resource requests differ from the recommended values and deletes them so that the pods can be recreated with the new request values.
  3. Admission Plugin: It sets the correct resource requests on new pods, i.e., the pods that are created or recreated by their controller due to changes made by the updater.
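
The division of labor between the three components can be illustrated with a toy model. Everything here is an assumption made for the sketch: the function names, the percentile-plus-margin rule in the recommender, and the 10% threshold in the updater are illustrative stand-ins, not the logic the real VPA components use.

```python
import math

def recommend_request(usage_samples, percentile=90, margin_percent=15):
    """Recommender: derive a request from past usage (percentile plus a safety margin)."""
    ordered = sorted(usage_samples)
    index = max(0, math.ceil(percentile * len(ordered) / 100) - 1)
    return math.ceil(ordered[index] * (100 + margin_percent) / 100)

def needs_update(current_request, recommended, tolerance_percent=10):
    """Updater: flag a pod whose request is too far from the recommendation."""
    return abs(current_request - recommended) * 100 > tolerance_percent * recommended

def apply_recommendation(pod_spec, recommended):
    """Admission step: return a copy of the pod spec with the recommended request set."""
    return {**pod_spec, "cpu_request_millicores": recommended}

samples = [100, 120, 110, 300, 130, 125, 115, 140, 135, 128]  # CPU usage in millicores
recommended = recommend_request(samples)
pod = {"name": "my-app-pod", "cpu_request_millicores": 500}
if needs_update(pod["cpu_request_millicores"], recommended):
    # In a cluster the pod would be deleted and recreated; here we only rewrite the spec.
    pod = apply_recommendation(pod, recommended)
print(pod)
```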

Best Practices for Using Vertical Pod Autoscaler

Consider these best practices for Vertical Pod Autoscaling.

  • Use it with the correct Kubernetes Version: Version 0.4 and later of the Vertical Pod Autoscaler need custom resource definition capabilities, so these versions can’t be used with Kubernetes versions older than 1.11. If you’re using an earlier Kubernetes version, it’s better to use version 0.3 of the VPA.

  • Run VPA with updateMode: "Off" at first: To configure VPA effectively and make full use of it, teams need to understand the resource usage of the pods they want to autoscale. Configuring VPA with updateMode: "Off" provides the recommended CPU and memory requests without applying them to running pods.

  • Understand your workload’s seasonality: For workloads whose resource usage constantly swings between high and low, VPA might not be the right choice because it may end up replacing pods over and over again. In such a scenario, HPA can be a better solution. It’s essential to understand the type of workload when choosing an appropriate autoscaler.
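
As an illustration of the updateMode: "Off" practice above, the following sketch builds a VPA object as a Python dictionary and prints it as JSON. It assumes a VPA release that serves the `autoscaling.k8s.io/v1` API, and the names `my-app` and `my-app-vpa` are placeholders for your own Deployment and VPA object.

```python
import json

# Recommendation-only VPA: it computes recommendations but does not evict pods.
vpa_manifest = {
    "apiVersion": "autoscaling.k8s.io/v1",  # assumed API version, check your VPA release
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "my-app-vpa"},  # hypothetical name
    "spec": {
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "my-app",  # hypothetical Deployment to observe
        },
        "updatePolicy": {"updateMode": "Off"},
    },
}

print(json.dumps(vpa_manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest. Once VPA has gathered enough data, the recommendations should appear in the status of the VPA object, where teams can review them before switching to a mode that applies changes.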

3. Cluster Autoscaler

A Cluster Autoscaler automatically resizes a cluster’s node pools based on application workload demands, which helps teams ensure application availability and optimize costs. It increases or decreases the size of a node pool based on the resource requests of pods rather than on the actual resource utilization of the nodes in the pool.

When to Use Cluster Autoscaler?

This autoscaling mechanism works well if you’re looking to optimize costs by dynamically scaling the number of nodes to match the current state of cluster utilization. It’s a great mechanism for workloads designed to scale rapidly and meet dynamic demands.

How Does Cluster Autoscaler Work?

The Cluster Autoscaler scans for pods that could not be scheduled and adds nodes so that they can run. It also calculates whether the pods currently deployed could be consolidated onto a smaller number of nodes. If it identifies a node whose pods can be rescheduled to other nodes in the Kubernetes cluster, it evicts them and removes the spare node.
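
The scale-down check can be sketched as a simple packing problem: can every pod on a candidate node fit into the spare capacity of the other nodes? The sketch below considers CPU requests only and uses a first-fit placement; the data layout and names are hypothetical, and the real Cluster Autoscaler takes many more constraints into account.

```python
def can_remove_node(candidate, other_nodes):
    """Return True if every pod on `candidate` fits into spare capacity elsewhere."""
    # Spare capacity is based on pod requests, not on measured utilization.
    spare = {
        node["name"]: node["capacity_millicores"] - sum(node["pod_requests"])
        for node in other_nodes
    }
    # Place the largest pods first; they are the hardest to fit.
    for request in sorted(candidate["pod_requests"], reverse=True):
        target = next((name for name, free in spare.items() if free >= request), None)
        if target is None:
            return False
        spare[target] -= request
    return True

others = [
    {"name": "node-a", "capacity_millicores": 2000, "pod_requests": [500, 700]},
    {"name": "node-b", "capacity_millicores": 2000, "pod_requests": [1500]},
]
lightly_loaded = {"name": "node-c", "capacity_millicores": 2000, "pod_requests": [400, 300]}
busy = {"name": "node-d", "capacity_millicores": 2000, "pod_requests": [900]}

print(can_remove_node(lightly_loaded, others))
print(can_remove_node(busy, others))
```

Here node-c is a removal candidate because its pods fit on node-a, while node-d is not because no other node has 900 millicores to spare.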

Cluster Autoscaler Best Practices

  • Make sure to use the Correct Version: When deploying a Cluster Autoscaler, use it with the recommended Kubernetes version.
  • Double-check cluster nodes for the same capacity: Check whether the nodes in each node group have the same CPU and memory capacity. Otherwise, Cluster Autoscaler won’t work properly because it assumes that every node in the group has the same capacity.
  • Define resource requests for every pod: When using a Cluster Autoscaler, make sure that all pods scheduled on the autoscaled nodes specify resource requests.
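
A quick way to act on the last practice is to audit pod definitions for missing requests. The sketch below walks pod manifests loaded as Python dictionaries, following the standard `spec.containers[].resources.requests` layout; the function name and the sample pods are made up for illustration.

```python
def containers_missing_requests(pods, required=("cpu", "memory")):
    """List (pod, container, missing resources) for containers lacking requests."""
    findings = []
    for pod in pods:
        for container in pod["spec"]["containers"]:
            requests = container.get("resources", {}).get("requests", {})
            absent = [name for name in required if name not in requests]
            if absent:
                findings.append((pod["metadata"]["name"], container["name"], absent))
    return findings

pods = [
    {
        "metadata": {"name": "web"},
        "spec": {"containers": [
            {"name": "app", "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}}},
            {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
        ]},
    },
    {"metadata": {"name": "worker"}, "spec": {"containers": [{"name": "job"}]}},
]

for pod_name, container_name, absent in containers_missing_requests(pods):
    print(f"{pod_name}/{container_name} is missing requests for: {', '.join(absent)}")
```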

Save and Manage Your Kubernetes Costs

By now, you should have a clear idea of why automating the scaling of your Kubernetes workloads is a smart move. Kubernetes management platforms can help teams gain a comprehensive view of cluster resources. With complete visibility into resource usage, you can scale nodes up and down quickly to reduce waste.