Prometheus Metric Alerting Rule with Multiple Conditions

Alerts need fine-tuning and continuous optimisation to improve their accuracy. One way to do this is to add more conditions to an alerting rule.

As an example, suppose we want to be alerted when a Kubernetes Pod has been unhealthy for longer than 15 minutes. The alerting rule could be written like this:

min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0
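The following simplified Python model shows what the expression evaluates for a single pod. It is a hypothetical sketch, not Prometheus code. It assumes that the subquery yields roughly one sample per minute across the 15-minute window. It also assumes that kube_pod_status_phase reports 1 for the pod's current phase and 0 for the others, so the filtered sum is 1 while the pod is Pending, Unknown or Failed. Under those assumptions, min_over_time(...) > 0 holds only if every sample in the window is non-zero. The helper names unhealthy_phase_value and fires are invented for illustration.

```python
# Hypothetical sketch: models, in plain Python, what
# min_over_time(<unhealthy sum>[15m:1m]) > 0 evaluates for a single pod.

UNHEALTHY_PHASES = {"Pending", "Unknown", "Failed"}


def unhealthy_phase_value(phase: str) -> int:
    """Model sum by(namespace, pod)(kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}).

    Assumes kube_pod_status_phase is 1 for the current phase and 0 otherwise,
    so the filtered sum is 1 while the pod is in an unhealthy phase.
    """
    return 1 if phase in UNHEALTHY_PHASES else 0


def fires(phases_per_minute: list[str]) -> bool:
    """Return True if min_over_time(...) > 0 for this window of 1m samples.

    phases_per_minute is an assumed, simplified input: the phase observed at
    each 1-minute subquery step inside the window.
    """
    samples = [unhealthy_phase_value(p) for p in phases_per_minute]
    if not samples:
        # No samples in the window: PromQL returns no series, so nothing fires.
        return False
    return min(samples) > 0


# Pending at every step of the window: the condition holds.
print(fires(["Pending"] * 15))  # True
# A single Running sample pulls the minimum down to 0.
print(fires(["Pending"] * 7 + ["Running"] + ["Pending"] * 7))  # False
```

Using the minimum rather than, say, the average means one healthy sample anywhere in the window is enough to suppress the alert.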

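We could add another condition to the rule to avoid a false positive by requiring the pod to be at least 15 minutes old before an alert is triggered. Without it, a pod that was created a few minutes ago and is still Pending already satisfies the first expression, because min_over_time only considers the samples that exist within the window.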
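The false positive can be reproduced with a similar simplified model. In this hypothetical sketch, a pod created three minutes ago contributes only three samples to the window, all of them unhealthy. That is enough for the first condition on its own. The age check, modelled as time() - kube_pod_created > 900 with both values in Unix seconds, filters the pod out. The helper names and the example timestamp are invented for illustration.

```python
# Hypothetical sketch: why a freshly created Pending pod satisfies
# min_over_time(...) > 0, and how an age check suppresses it.

WINDOW_MINUTES = 15      # matches the [15m:1m] subquery range
MIN_AGE_SECONDS = 900    # the "> 900" threshold of the age condition


def window_samples(pod_age_minutes: int) -> list[int]:
    """Samples present in the window for a pod that has been Pending since creation.

    A series has no samples before the pod exists, so a young pod contributes
    fewer samples than the window can hold (simplified model; 1 = unhealthy).
    """
    return [1] * min(pod_age_minutes, WINDOW_MINUTES)


def first_condition(samples: list[int]) -> bool:
    # min_over_time(...) > 0; an empty window yields no result in PromQL.
    return bool(samples) and min(samples) > 0


def age_condition(now: float, created: float) -> bool:
    # time() - kube_pod_created > 900, both as Unix timestamps in seconds.
    return now - created > MIN_AGE_SECONDS


now = 1_700_000_000.0     # arbitrary example timestamp
created = now - 3 * 60    # pod created three minutes ago

samples = window_samples(3)
print(first_condition(samples))                                  # True
print(first_condition(samples) and age_condition(now, created))  # False
```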

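To combine conditions on several Prometheus metrics in a single alerting rule, join the expressions with the set operator and (or its counterpart, or). You can optionally add on(...), which restricts label matching to the listed labels. The age check can be appended with and on: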

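and on(namespace, pod) time() - kube_pod_created > 900

kube_pod_created is the pod's creation time as a Unix timestamp in seconds, so this condition keeps only pods older than 900 seconds (15 minutes). Matching on both namespace and pod prevents a pod from being paired with a same-named pod in another namespace.

Using both conditions, our final rule looks like this:

min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0 and on(namespace, pod) time() - kube_pod_created > 900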
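The and on(...) step can be modelled as a filter over label sets. The following Python sketch is a simplified, hypothetical model, not Prometheus code. The and_on helper and the series data are invented for illustration. The result keeps the left-hand values, and a left-hand series survives only if some right-hand series has the same values for the labels listed in on(...). As in PromQL, a missing label counts as empty. Running it with only pod shows how matching on pod alone can pair a young pod with an older, same-named pod in another namespace.

```python
# Hypothetical sketch of PromQL's "and on(<labels>)" set operator.
# Each series is modelled as (labels, value).

Labels = dict[str, str]
Series = tuple[Labels, float]


def match_key(labels: Labels, on: tuple[str, ...]) -> tuple[str, ...]:
    # Only the labels listed in on(...) take part in matching; missing = "".
    return tuple(labels.get(name, "") for name in on)


def and_on(left: list[Series], right: list[Series], on: tuple[str, ...]) -> list[Series]:
    """Keep left-hand series whose on(...) label values appear on the right side.

    Values always come from the left side; the right side only acts as a filter.
    """
    right_keys = {match_key(labels, on) for labels, _ in right}
    return [(labels, value) for labels, value in left if match_key(labels, on) in right_keys]


# Left side: pods unhealthy throughout the window (min_over_time(...) > 0).
left = [
    ({"namespace": "shop", "pod": "api-1"}, 1.0),
    ({"namespace": "batch", "pod": "worker-0"}, 1.0),  # young pod, should be dropped
]
# Right side after "time() - kube_pod_created > 900": only pods older than 15 minutes.
right = [
    ({"namespace": "shop", "pod": "api-1", "uid": "a1"}, 4200.0),
    ({"namespace": "dev", "pod": "worker-0", "uid": "b2"}, 3600.0),
]

print([labels["pod"] for labels, _ in and_on(left, right, on=("namespace", "pod"))])
# ['api-1']
print([labels["pod"] for labels, _ in and_on(left, right, on=("pod",))])
# ['api-1', 'worker-0']  (worker-0 matched a pod in another namespace)
```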

For a larger collection of ready-made Prometheus alerting rules to start monitoring your services with, see Awesome Prometheus alerts.