Prometheus Metric Alerting Rule with Multiple Conditions

Mar 27, 2022 · 1 min read

Alerts require fine-tuning and continuous optimisation to improve their accuracy. One way to do this is to add more conditions to the alerting rule.

As an example, we want to be alerted if a Kubernetes Pod has been unhealthy for longer than 15 minutes. The alerting rule could be written like this:

min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0

We could add another condition to the rule to avoid a false positive, ensuring that the pod is at least 15 minutes old before triggering an alert.

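To evaluate multiple Prometheus metrics in a single alerting rule, the "and on(...)" or "or on(...)" operators can be used. With "and", the rule only returns series that satisfy both conditions and agree on the labels listed in on(...).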
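As a rough sketch of how the two forms differ, the rule group below combines two placeholder metrics. The names metric_a and metric_b, the thresholds, the group name, and the alert names are all hypothetical and only serve to show the shape of the expressions.

```yaml
groups:
  - name: multi-condition-sketch     # hypothetical group name
    rules:
      # "and": keeps a left-hand series only if a right-hand series
      # with the same pod label also satisfies its own condition.
      - alert: BothConditionsHold    # hypothetical alert name
        expr: metric_a > 0 and on(pod) metric_b > 10
      # "or": keeps every left-hand series, plus right-hand series
      # whose pod label has no match on the left-hand side.
      - alert: EitherConditionHolds  # hypothetical alert name
        expr: metric_a > 0 or on(pod) metric_b > 10
```

For the pod-age check in this example, the extra condition is: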

and on(namespace, pod) time() - kube_pod_created > 900

Matching on namespace as well as pod keeps pods that share a name in different namespaces apart. Using both conditions, our final rule looks like this:

min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0 and on(namespace, pod) time() - kube_pod_created > 900
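To load the expression into Prometheus, it has to be wrapped in a rule file. The following is a minimal sketch, not a definitive setup: the file name, group name, alert name, severity label, and annotation text are assumptions for illustration. It matches on namespace as well as pod, on the assumption that pod names are only unique within a namespace.

```yaml
# pod-alerts.rules.yml (hypothetical file name)
groups:
  - name: pod-health              # hypothetical group name
    rules:
      - alert: PodUnhealthy       # hypothetical alert name
        # Condition 1: the pod was in phase Pending, Unknown or Failed
        # at every 1-minute step of the last 15 minutes.
        # Condition 2: the pod was created more than 900 seconds ago.
        expr: |
          min_over_time(
            sum by(namespace, pod) (
              kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}
            )[15m:1m]
          ) > 0
          and on(namespace, pod)
          time() - kube_pod_created > 900
        labels:
          severity: warning       # assumed label, adapt to your alert routing
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been unhealthy for more than 15 minutes"
```

The file can be checked for syntax errors with promtool check rules pod-alerts.rules.yml before it is referenced under rule_files in the Prometheus configuration.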

A great list of Prometheus alerting rules to start monitoring your services with can be found at Awesome Prometheus alerts.
