In an earlier post we saw how to set up a Prometheus instance in our cluster. Here we create alerts on top of that Prometheus monitoring, so that we are notified when our pods need attention. With Prometheus alerting there is no need to keep watching the metrics ourselves.

For in-depth documentation on Prometheus go here. More on alert management with CoreOS is here.

Alert manager

Here is an overview of the Prometheus architecture in a diagram.

Prometheus architecture diagram
Prometheus Operator Architecture. Source: www.nicktriller.com

We already saw several Custom Resource Definitions (CRDs) that the Prometheus Operator gives us. Alertmanager is another one. The definition of an instance is very simple. The complete code for this setup is found on this branch on GitHub.

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: sybrenbolandit
spec:
  replicas: 1

We also need to tell Prometheus which Alertmanager to use:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 1
  alerting:
    alertmanagers:
      - name: alertmanager-sybrenbolandit
        namespace: sybrenbolandit
        port: web
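
The name under alertmanagers is resolved as a Kubernetes Service in the given namespace, and port refers to a named port on that Service. If no such Service exists yet, a minimal sketch could look like the one below. It assumes that the Alertmanager pods carry the label alertmanager: sybrenbolandit and listen on the default port 9093; the setup on the linked branch may differ.

```yaml
# Sketch only: a Service in front of the Alertmanager pods.
# The selector label and the port number are assumptions, not taken from the original setup.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-sybrenbolandit   # must match alerting.alertmanagers[].name
  namespace: sybrenbolandit           # must match alerting.alertmanagers[].namespace
spec:
  selector:
    alertmanager: sybrenbolandit      # assumed pod label for the Alertmanager named "sybrenbolandit"
  ports:
    - name: web                       # must match alerting.alertmanagers[].port
      port: 9093
      targetPort: web
```

If Prometheus cannot reach the Alertmanager, checking that these three names line up is a good first step.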

The most interesting part is the configuration of the receiver, the place where the alerts are sent. Here we send alerts to a Slack channel. We need the api_url of our Slack workspace, which you can find by following this. Here is the config file:

apiVersion: v1
kind: Secret
metadata:
  name: "alertmanager-sybrenbolandit"
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 1m
    route:
      group_by: ['env', 'job', 'alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1m
      receiver: 'slack'
    receivers:
    - name: slack
      slack_configs:
      - api_url: '##Your slack api url##'
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        send_resolved: true
        channel: '##Your slack channel name##'
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.prometheus }} {{ .GroupLabels.job }}'
        text: "<!channel> \nsummary: {{ .CommonAnnotations.summary }}\ndocumentation: {{ .CommonAnnotations.documentation }}\n"

Note that much more can be configured, but that is out of scope for now.

Alert – Application down!

Now the architecture is in place. We only need to add a Prometheus rule for our application that triggers an alert. We build on the Spring Boot application that we deployed in the cluster in an earlier post. The rule is configured as another custom resource and is found on this branch on GitHub.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  name: java-spring-api-rules
  labels:
    prometheus: sybrenbolandit
    role: alert-rules
spec:
  groups:
  - name: java-spring-api
    rules:
      - alert: NoHealthyHosts
        expr: sum(up{job="java-spring-api"}) < 1 or absent(up{job="java-spring-api"})
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "No healthy hosts - java-spring-api"

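The important part is under expr, which defines when the alert fires. The up metric is 1 for every target that Prometheus scraped successfully, so the first half of the expression is true when no instance of our application is healthy. The absent(...) half covers the case where no up series exists at all, for example when there are no pods left to scrape. With for: 30s the condition has to hold for 30 seconds before the alert fires.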
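
A rule only takes effect if the Prometheus resource selects it. With the Prometheus Operator this is done through spec.ruleSelector, which matches the labels on PrometheusRule objects. The sketch below assumes that the Prometheus resource shown earlier should pick up rules by the two labels used on this rule; the actual selector on the branch is not shown in this post and may differ.

```yaml
# Sketch only: let the Prometheus resource load the rule defined above.
# The selector labels are an assumption based on the labels of java-spring-api-rules.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 1
  ruleSelector:
    matchLabels:
      prometheus: sybrenbolandit
      role: alert-rules
  # Without a ruleNamespaceSelector, only rules in the same namespace
  # as this Prometheus object are considered.
```

If the alert never shows up in Prometheus, a mismatch between these labels and the labels on the PrometheusRule is a likely cause.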

When we now scale our application down to 0 replicas (to force an alert) and scale it up again some time later, this is what we see in Slack:

Slack alerts example

Hopefully you can now start with your own Prometheus Alerting. Happy alerting!