Prometheus Metric Alerting Rule with Multiple Conditions
Alerts require fine-tuning and continuous optimisation to increase their accuracy, and one way to achieve this is to add more conditions to the alerting rule.
As an example, suppose we want to be alerted when a Kubernetes Pod has been unhealthy (Pending, Unknown or Failed) for longer than 15 minutes. The alerting rule could be written like this:
min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0
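To make the subquery semantics concrete, here is a simplified Python model of min_over_time(...[15m:1m]) > 0. It assumes one sample per minute of the summed phase metric for a single pod (1 while the pod is Pending, Unknown or Failed, 0 otherwise). The function name and sample lists are hypothetical, and the model ignores details such as step alignment and staleness handling.

```python
from typing import Sequence


def unhealthy_for_window(samples: Sequence[float], window: int = 15) -> bool:
    """Rough model of `min_over_time(expr[15m:1m]) > 0` (hypothetical helper).

    `samples` holds one value per minute (oldest first) of the sum of
    kube_pod_status_phase{phase=~"Pending|Unknown|Failed"} for one pod.
    Step alignment and staleness are ignored in this sketch.
    """
    recent = samples[-window:]  # only the last `window` one-minute samples count
    if not recent:
        return False  # no samples: the query returns no series, so nothing alerts
    return min(recent) > 0  # true only if every sample in the window was unhealthy


# Hypothetical per-minute samples (1 = unhealthy phase, 0 = healthy).
stuck = [1] * 15                       # Pending for the whole 15 minutes
flapping = [1] * 10 + [0] + [1] * 4    # briefly healthy once inside the window
new_pod = [1] * 5                      # created 5 minutes ago, Pending since

print(unhealthy_for_window(stuck))     # True
print(unhealthy_for_window(flapping))  # False
print(unhealthy_for_window(new_pod))   # True
```

The last case is worth noting: a pod younger than the window only has a few samples, and if all of them are unhealthy the expression already matches, even though the pod has not been unhealthy for a full 15 minutes.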
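To avoid false positives, we can add a second condition that requires the pod to be at least 15 minutes old before an alert is triggered. Without it, a pod that was created a few minutes ago and has been Pending ever since already matches the rule, because min_over_time only considers the samples that exist in the window.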
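To evaluate multiple Prometheus metrics in a single alerting rule, the set operators "and on" and "or on" can be used. "and" keeps only the left-hand series that have a matching series on the right, "or" keeps all left-hand series plus the right-hand series without a match on the left, and "on(...)" lists the labels used for matching. The pod-age condition is appended with "and on":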
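and on(namespace, pod) time() - kube_pod_created > 900
Here, time() - kube_pod_created is the pod's age in seconds, so "> 900" requires the pod to be older than 15 minutes. Matching on both namespace and pod is safer than matching on pod alone, because pod names are only unique within a namespace. Using both conditions, our final rule looks like this:
min_over_time(sum by(namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0 and on(namespace, pod) time() - kube_pod_created > 900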
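How the two conditions are joined can be sketched with a small Python model of "and on(...)", assuming each series is reduced to a dictionary of labels. The helper and_on, the namespaces, pod names and timestamps are all hypothetical, and the model skips sample values and everything else Prometheus does during evaluation. It also illustrates why the matching labels matter: with on(pod) alone, a five-minute-old pod in one namespace is matched by an older pod with the same name in another namespace.

```python
from typing import Dict, Iterable, List, Tuple

Labels = Dict[str, str]


def and_on(left: Iterable[Labels], right: Iterable[Labels], on: Tuple[str, ...]) -> List[Labels]:
    """Rough model of PromQL `left and on(<on>) right` (hypothetical helper).

    Keeps each left-hand series whose values for the `on` labels match at least
    one right-hand series. A missing label is treated as the empty string.
    """
    right_keys = {tuple(s.get(name, "") for name in on) for s in right}
    return [s for s in left if tuple(s.get(name, "") for name in on) in right_keys]


NOW = 10_000.0  # hypothetical value of time(), in seconds

# Left side: pods unhealthy for the whole 15-minute window (hypothetical data).
unhealthy = [
    {"namespace": "shop", "pod": "api-0"},
    {"namespace": "batch", "pod": "api-0"},
]

# Hypothetical kube_pod_created values: creation time in Unix seconds.
created = {
    ("shop", "api-0"): NOW - 3600,  # one hour old
    ("batch", "api-0"): NOW - 300,  # five minutes old
}

# Right side: time() - kube_pod_created > 900 keeps only pods older than 15 minutes.
old_enough = [
    {"namespace": ns, "pod": pod}
    for (ns, pod), created_at in created.items()
    if NOW - created_at > 900
]

print(and_on(unhealthy, old_enough, on=("namespace", "pod")))
# [{'namespace': 'shop', 'pod': 'api-0'}]
print(and_on(unhealthy, old_enough, on=("pod",)))
# both series: the young pod in "batch" matches the old pod in "shop" by name alone
```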
Awesome Prometheus alerts offers a great collection of ready-made alerting rules to start monitoring your services with.