Horizontal Pod Autoscaler (HPA)

So, I’m a Kubernetes administrator

and I’m sitting on my machine looking at a cluster,

and I’m tasked to make sure

that there is always sufficient capacity

to support demand for this application.

From a deployment configuration perspective, I see this pod requests 250 millicores of CPU

and has a limit of 500 millicores of CPU.

This means that 500 millicores is the maximum CPU it can get,

after which it doesn't get any more.

So, the maximum capacity that a single pod can handle

is 500 millicores of CPU.
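
For reference, the relevant part of such a pod spec might look like this (the container name is illustrative):

    containers:
    - name: my-app          # illustrative container name
      image: nginx
      resources:
        requests:
          cpu: 250m         # guaranteed amount, used for scheduling
        limits:
          cpu: 500m         # hard cap; usage is throttled beyond this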

So, I would run the kubectl top pod command

and monitor the resource consumption of the pod

if I had to do it manually. Now remember that you must have the metrics server

running on the cluster to be able to monitor

the resource usage like this.
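
For example, a manual check might look like this (the pod name and numbers are illustrative):

    kubectl top pod
    NAME                      CPU(cores)   MEMORY(bytes)
    my-app-7c5ddbdf54-8xkqz   320m         64Mi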

Now, when it reaches the threshold of 450 millicores

or whatever I've defined as the threshold, or close to it, I would run the kubectl scale command

to scale the deployment to add additional pods.
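
For example, to scale the deployment out to three replicas (the replica count here is just an example):

    kubectl scale deployment my-app --replicas=3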

So, that’s the manual way to scale a workload.

The problem with this approach

is that I have to sit in front of my computer

and continuously monitor resource usage.

I need to manually run commands to scale up and down.

And if there’s a sudden traffic spike

and I'm on a break or something,

I may not be able to react fast enough

to support the spike in traffic to the application.

So to solve this, we use the Horizontal Pod Autoscaler.

So, the Horizontal Pod Autoscaler continuously monitors

the metrics as we did manually using the top command.

It then automatically increases or decreases

the number of pods in a Deployment, StatefulSet,

or ReplicaSet based on CPU, memory, or custom metrics.

And if the CPU or memory usage goes too high,

HPA creates more pods to handle that.

And if it drops,

it removes the extra pods to save resources.

And this keeps the usage balanced around the threshold.
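
Roughly speaking, per the Kubernetes documentation, the HPA derives the desired replica count from the ratio of the current metric value to the target:

    desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))

So at exactly the target, nothing changes, and at double the target, the replica count doubles.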

And note that it can also track

multiple different types of metrics,

which we’ll refer to in a few minutes.

So, let’s see this in action.

For the given NGINX deployment,

we can configure a Horizontal Pod Autoscaler

by running the kubectl autoscale command targeting the deployment my-app,

and specifying a CPU threshold of 50%

with a minimum of 1 and maximum of 10 pods.
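
The command looks like this:

    kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10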

So when this command is run,

Kubernetes creates a Horizontal Pod Autoscaler

for this deployment that first reads the resource requests

configured on the pod,

and then learns that it's set to 250 millicores (the utilization target is measured against the request, not the limit).

It then continuously polls the metrics server

to monitor the usage, and when the usage goes beyond 50%,

it modifies the number of replicas to scale up or down

depending on the usage. So, to view the status of the created HPA,

run the kubectl get hpa command,

and it lists the current HPA.
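
The output looks something like this (the numbers are illustrative):

    kubectl get hpa
    NAME     REFERENCE           TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
    my-app   Deployment/my-app   cpu: 32%/50%   1         10        4          5m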

The TARGETS column shows the current CPU usage

versus the threshold we have set,

and the other columns show the minimum and maximum

and the current replica count.

So, it would never go beyond the maximum

that we have specified when scaling up

and it would never go below the minimum

that we have specified when scaling down.

And when you no longer need the HPA, you can delete it using the kubectl delete hpa command.
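
Since kubectl autoscale names the HPA after its target by default, that would be:

    kubectl delete hpa my-app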

Now, that was the imperative approach to creating an HPA.

There’s also a declarative approach. So, create an HPA definition file with the API version

set to autoscaling/v2.
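
Here's what the file might look like; the minReplicas and maxReplicas values below are assumed to match the earlier imperative example:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app-hpa
    spec:
      scaleTargetRef:            # the resource the HPA scales
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

You would then apply it with kubectl apply -f followed by the file name.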

Kind is set to HorizontalPodAutoscaler.

The name is set to my-app-hpa.

And then we have the scaleTargetRef.

This is the target resource we want the HPA to monitor.

That’s the deployment named my-app.

We also have the min and maxReplicas defined.

And then we have configured the metrics

and resources to monitor.

In this case, the resource being CPU

and target utilization being 50%. Now, note that HPA comes built into Kubernetes,

and the autoscaling/v2 API has been stable since version 1.23.

So, there are no separate installation procedures required.

Note that it relies on the metrics server,

so that is a prerequisite. So, talking about the metrics server,

we spoke about the metrics server that HPA depends on

to get current resource utilization numbers.
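
To confirm it's running, a quick check might be this (assuming the conventional name and namespace used by the standard metrics-server installation):

    kubectl get deployment metrics-server -n kube-system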

Now what we have been referring to

is the internal metrics server,

but there are also other metric sources that we can refer to,

such as a Custom Metrics Adapter

that can retrieve information from other internal sources

like a workload deployed in a cluster.

However, these are still internal sources. We can also refer to external sources

such as tools or other instances

that are outside of the Kubernetes cluster,

such as a Datadog or Dynatrace instance

using an external adapter.

However, these are beyond the scope of this course.

So, more details and labs about these are available

in our Kubernetes Autoscaling course.

To keep the scope just enough for the exam,

this is all that we’ll discuss about HPA for now.

Well, thank you so much for watching.

Head over to the labs and I’ll see you in the next one.