How to Write Alerting Rules in Prometheus
If you are trying out Prometheus in your infrastructure, you have probably come across two kinds of rules: recording rules and alerting rules.
This article covers alerting rules.
What are Alerting Rules?
Alerting rules let you define conditions, written as PromQL expressions, under which you want to be alerted. When a condition holds, Prometheus fires an alert, which can then be delivered to different notification channels (typically through Alertmanager).
For example, if you want an alert whenever the available system memory drops below 10%, an alerting rule lets you express exactly that condition.
Let’s define the alerting rule.
Defining Alert Rule
- Edit the main Prometheus config to include a rules configuration file.
    $ vim prometheus.yml
    rule_files:
      - "prometheus_alerting_rules.yml"
- Write the alert rule in the “prometheus_alerting_rules.yml” config file.
    $ cat prometheus_alerting_rules.yml
    groups:
      - name: low_memory_alert
        rules:
          - alert: LowMemory
            expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Host is low on memory. Only {{ $value }}% left"
- Reload or restart Prometheus so it picks up the new configuration, then view the alert in the Alerts tab of the Prometheus web UI.
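The steps above depend on two settings in prometheus.yml that are easy to overlook: how often rules are evaluated, and where fired alerts are sent. The following is a minimal sketch, not a complete configuration. It assumes an Alertmanager instance is reachable at localhost:9093 (a placeholder address, adjust it to your setup) and uses a one-minute evaluation interval purely as an example.

```yaml
# prometheus.yml (sketch; scrape_configs omitted for brevity)
global:
  # How often Prometheus evaluates recording and alerting rules.
  evaluation_interval: 1m

# Rule files to load; the same file referenced in the steps above.
rule_files:
  - "prometheus_alerting_rules.yml"

# Where fired alerts are sent. Alertmanager then routes them to
# notification channels such as email or chat.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"  # assumed address, not from the article
```

Without an alerting section, the alert still shows up in the Alerts tab, but no notification is sent anywhere.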
Understanding Alert Rule Definition
    groups:
      - name: low_memory_alert
        rules:
          - alert: LowMemory
            expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Host is low on memory. Only {{ $value }}% left"
- Rules are defined within a group, called a rule group.
- low_memory_alert is the name of the group. You can name it anything you like :).
- Multiple rules can be defined within a group. Here we have defined a single rule.
- alert is the name of the alert, here LowMemory.
- expr is the PromQL expression that defines the alert condition. It is evaluated at regular intervals.
- for is the duration for which the alert condition must stay true before the alert fires and triggers a notification. Until then, the alert is shown as pending.
- labels are short key-value pairs attached to the alert, such as severity.
- annotations carry further, generally longer, information attached to the alert, such as a summary.
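To illustrate the points above about multiple rules per group, labels, and annotations, here is a sketch of the same group extended with a second rule. The VeryLowMemory rule, its 5% threshold, and its 2m duration are hypothetical values chosen for illustration. The {{ $labels.instance }} template assumes the metrics carry an instance label, as they normally do when scraped from node_exporter.

```yaml
groups:
  - name: low_memory_alert
    rules:
      # Original rule, with the affected host added to the summary.
      - alert: LowMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host {{ $labels.instance }} is low on memory. Only {{ $value }}% left"

      # Hypothetical second rule: stricter threshold, higher severity.
      - alert: VeryLowMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is critically low on memory"
          description: "Available memory has stayed below 5% for 2 minutes. Current value: {{ $value }}%"
```

Using different severity labels on the two rules makes it possible to route them to different notification channels later on.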