One of the benefits of a Kubernetes cluster is easy scaling. In this post we configure some rules for Kubernetes autoscaling.

We start off with the example application from an earlier post. The complete autoscaling code can be found on GitHub.


Suppose we have an API in our cluster. When this API is under heavy load, we want an extra container to be started to share the work. So we define a HorizontalPodAutoscaler rule that specifies the threshold.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: java-spring-api
spec:
  # The Deployment whose replica count is managed
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-spring-api
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50

The most interesting information is at the bottom: the autoscaler adds replicas when the average CPU utilization across the pods rises above 50%.
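The utilization percentage is measured against the CPU that each container requests, so the autoscaler can only compute it when the Deployment declares a CPU request. Below is a minimal sketch of the relevant fragment of the Deployment. The container name and the 200m request are assumptions for illustration, not values taken from the example application.

```yaml
# Fragment of the Deployment's pod template (name and value are assumptions)
spec:
  template:
    spec:
      containers:
      - name: java-spring-api   # assumed container name
        resources:
          requests:
            cpu: 200m           # a 50% target then means 100m average use per pod
```

The autoscaler picks the replica count as ceil(currentReplicas × currentUtilization / targetUtilization). With a target of 50, a single pod running at 90% gives ceil(1 × 90 / 50) = 2 pods.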

Of course we want to limit how many containers get created so that resource usage stays bounded, especially in a test environment. We can do this in the overlays of our kustomize setup, in overlays/test/autoscaling.yaml.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: java-spring-api
spec:
  minReplicas: 1
  maxReplicas: 2

With this configuration the autoscaler keeps at least 1 replica running and stops creating new containers once there are 2 replicas.
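For the override to take effect, the test overlay has to reference it as a patch. The following is a rough sketch of what overlays/test/kustomization.yaml could look like. The base path ../../base is an assumption about the directory layout, and older kustomize versions use the bases and patchesStrategicMerge fields instead of resources and patches.

```yaml
# overlays/test/kustomization.yaml (sketch; the base path is an assumption)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                # assumed location of the base manifests
patches:
- path: autoscaling.yaml    # the minReplicas/maxReplicas override for test
```

The overlay can then be applied with kubectl apply -k overlays/test.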

Restart: autoscaling to 0

What if you want Kubernetes to scale to 0 and then back to 1 under certain conditions, i.e. you want a container to restart? Then you can configure a livenessProbe. If this probe fails, the kubelet restarts the container.

Spring Boot Actuator is an easy way to create a livenessProbe. We just add the actuator starter dependency (spring-boot-starter-actuator).


And we have a health endpoint.

[Image: actuator health endpoint output]

Now we can add the following configuration to the container section of the deployment.

livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 5
  failureThreshold: 6

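Here we see the endpoint configured and some other thresholds that define when the container is marked unhealthy. With these values the first probe runs 60 seconds after the container starts, and the container is restarted after 6 consecutive failures at 5-second intervals, so after roughly 30 seconds of failing.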

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
  • periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
  • failureThreshold: When a Pod starts and the probe fails, the kubelet will try failureThreshold times before giving up. For a liveness probe, giving up means restarting the container. Defaults to 3. Minimum value is 1.

If you'd like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe in the same way we defined the liveness probe. It can use the same endpoint, and it is most useful when the container loads a lot of data on startup.
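A readiness probe could look like the sketch below, placed next to the livenessProbe in the container section. It reuses the Actuator health endpoint from this post. The timing values are illustrative assumptions and should be tuned to how long the application really needs to start.

```yaml
# Sits next to livenessProbe in the container section (timing values are assumptions)
readinessProbe:
  httpGet:
    path: /actuator/health   # same Actuator endpoint as the liveness probe
    port: 8080
  initialDelaySeconds: 10    # illustrative value
  periodSeconds: 5
  failureThreshold: 3
```

Unlike a failing liveness probe, a failing readiness probe does not restart the container. The Pod is only taken out of the Service endpoints until the probe succeeds again.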

Hopefully you can now configure your own rules for Kubernetes autoscaling. Happy scaling!