One of the benefits of a Kubernetes cluster is easy scaling. In this post we configure some rules for Kubernetes autoscaling.
We start off with the example application from an earlier post. The complete autoscaling code can be found on GitHub.
Autoscaling
Suppose we have an API in our cluster. If this API is under heavy load, we want a new container to be started to share the work. So we define a rule that specifies the threshold at which this happens.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: java-spring-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-spring-api
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
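The most interesting information is at the bottom: the threshold is an average CPU utilization of 50% across the pods, measured against the CPU each pod requests. Above that, Kubernetes adds replicas.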
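The autoscaling/v2beta1 API used above is no longer served as of Kubernetes 1.25. On newer clusters the same rule can be written against autoscaling/v2, where the target moves into its own block. A minimal sketch, assuming the same Deployment name as above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: java-spring-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-spring-api
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # percentage of the requested CPU
        averageUtilization: 50   # same 50% threshold as the v2beta1 rule
```

If you switch the API version, any overlay that patches this resource should use the same apiVersion, and maxReplicas still has to be supplied somewhere before the resource is applied.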
Of course we want to limit how many containers get created, so that heavy resource use shows up as a signal instead of as ever more containers. This is especially true on a test environment. We can set the limit in the overlays of our kustomize setup, in overlays/test/autoscaling.yaml.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: java-spring-api
spec:
  minReplicas: 1
  maxReplicas: 2
With this configuration, no new containers are created once there are 2 replicas.
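For the patch to take effect, the overlay's kustomization has to reference it. A minimal sketch of what that file could look like; the ../../base path is an assumption about the directory layout, and older kustomize versions use patchesStrategicMerge instead of patches:

```yaml
# overlays/test/kustomization.yaml (sketch; the base path is assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base           # directory containing the base HorizontalPodAutoscaler
patches:
  - path: autoscaling.yaml   # merges minReplicas/maxReplicas into the base
```

Applying the overlay with kubectl apply -k overlays/test should then create the autoscaler with the test limits.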
Restart: autoscaling 0
What if you want Kubernetes to scale to 0 and then back to 1 under certain conditions, i.e. you want a container to restart? Then you can configure a livenessProbe. If this probe fails, the kubelet restarts the container.
Spring Boot Actuator is an easy way to create a livenessProbe. We just add the actuator starter dependency.
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
And we have a health endpoint.

Now we can add the following configuration to the container section of the deployment.
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 5
  failureThreshold: 6
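Here we see the endpoint configured, along with some thresholds that define when the container is marked unhealthy. With these values the first probe runs 60 seconds after the container starts, and the container is restarted after 6 consecutive failures at 5-second intervals, so after roughly half a minute of failing checks.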
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
failureThreshold: When a Pod starts and the probe fails, kubelet will try failureThreshold times before giving up. Defaults to 3. Minimum value is 1.
If you’d like to start sending traffic to a Pod only when a probe succeeds, specify a readiness probe in the same way we defined the liveness probe. It can use the same endpoint, and it is most useful when the container loads a lot of data on startup.
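As a rough sketch, a readiness probe next to the liveness probe in the container section could look like this. It reuses the same Actuator endpoint and port; the timing values are assumptions for illustration, not settings from the example application:

```yaml
readinessProbe:
  httpGet:
    path: /actuator/health   # same endpoint as the liveness probe
    port: 8080
  initialDelaySeconds: 30    # assumed value; tune to your startup time
  periodSeconds: 5
  failureThreshold: 3
```

Unlike a failing liveness probe, a failing readiness probe does not restart the container. The Pod is only taken out of the Service endpoints until the probe succeeds again.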
Hopefully you can now configure your own rules for Kubernetes autoscaling. Happy scaling!