Alertmanager

Alertmanager handles alerts generated by Prometheus. It manages the routing, grouping, and notification of alerts to various integrations such as email and webhooks.

What is Alertmanager?

Alertmanager receives alerts from client applications such as Prometheus, then deduplicates, groups, and routes them to receiver integrations such as email, webhooks, and PagerTree.

[Figure: Prometheus Alertmanager architecture]

Features

Key features of Alertmanager include:

  1. Grouping: Similar alerts can be grouped together to avoid overwhelming the users with redundant notifications.

  2. Inhibition: Suppresses notifications for certain alerts while another specific alert is already firing. The suppressed alerts still fire; only their notifications are held back, which avoids flooding recipients with symptoms of the same problem.

  3. Silencing: Administrators can silence certain alerts during maintenance or in response to known issues, preventing unnecessary notifications.

  4. Routing: Alerts can be routed to different destinations based on certain criteria, such as severity level, alert type, or specific attributes.

  5. Integration: Supports integration with various notification systems and channels like email, webhook, PagerTree, etc.

Grouping

Grouping combines alerts that share the same values for a chosen set of labels into a single notification. Grouping is configured per route in the routing tree of the configuration file, using the group_by setting.

Example: Your database goes down, and none of your services can reach it. Prometheus' alerting rules are configured to send an alert for each service that cannot communicate with the database, so many alerts arrive at Alertmanager at once. Alertmanager groups these alerts and sends a single notification.
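A minimal sketch of how such grouping might be configured. The receiver name team-db and the cluster label are illustrative assumptions, not part of the scenario above; the timing values shown are Alertmanager's documented defaults.

```yaml
route:
  # Hypothetical default receiver; it must also be defined under `receivers`.
  receiver: 'team-db'
  # Alerts with the same alertname and cluster values share one notification.
  group_by: ['alertname', 'cluster']
  # Wait before sending the first notification so related alerts can join the group.
  group_wait: 30s
  # Minimum wait before notifying about new alerts added to an existing group.
  group_interval: 5m
  # Wait before re-sending a notification for a group that is still firing.
  repeat_interval: 4h
```

With these settings, the per-service alerts raised within the first 30 seconds of the database outage would be delivered together rather than one by one.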

Inhibition

Inhibition suppresses notifications for certain alerts if certain other alerts are already firing. Inhibitions are configured through the Alertmanager configuration file.

Example: An alert is firing because an entire cluster is unreachable. Alertmanager is configured to inhibit all other alerts concerning that cluster while this alert is firing. This prevents a flood of notifications for problems that are only downstream symptoms of the actual issue.
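The cluster scenario above could be sketched roughly as follows. The alert name ClusterUnreachable and the cluster label are hypothetical, and the matchers syntax assumes Alertmanager 0.22 or later (older versions use source_match and target_match instead).

```yaml
inhibit_rules:
  # Source: the alert whose presence mutes the others (hypothetical name).
  - source_matchers:
      - 'alertname="ClusterUnreachable"'
    # Targets: alerts whose notifications are suppressed while the source fires.
    target_matchers:
      - 'severity=~"warning|critical"'
    # Only inhibit targets that carry the same cluster label value as the source.
    equal: ['cluster']
```

The equal list is what scopes the rule: an unreachable cluster mutes only the alerts from that cluster, not alerts from healthy ones.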

Silences

Silences are a way to mute alerts for a given time. Silences are configured in the web interface of Alertmanager.

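Example: During a planned database upgrade, an administrator creates a silence that matches the database alerts for the length of the maintenance window. Incoming alerts are checked against the matchers of every active silence. If an alert matches all the equality or regular expression matchers of an active silence, no notifications are sent for that alert.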

Config File

Alertmanager is configured via command-line flags and a configuration file (YAML format). The full YAML schema can be found in the official docs, and the visual editor can be used to help build routing trees. Pass the configuration file to Alertmanager with the --config.file flag:

./alertmanager --config.file=alertmanager.yml

The following is an example configuration file:

alertmanager.yml
global:
  # How long to wait before declaring an alert resolved if it has not been updated.
  resolve_timeout: 5m

route:
  # Default receiver (required on the root route): alerts that match
  # no child route are sent to the email receiver
  receiver: 'email'

  # Group alerts by severity level
  group_by: ['severity']

  # Send critical alerts to the pagertree receiver
  routes:
    - match:
        severity: critical
      receiver: 'pagertree'

receivers:
  - name: 'email'
    email_configs:
      - to: 'oncall@example.com'  # placeholder address
        from: 'alertmanager@example.com'  # placeholder address
        smarthost: 'smtp.example.com:587'
        auth_username: 'username'
        auth_password: 'password'
        require_tls: true

  - name: 'pagertree'
    webhook_configs:
      - url: 'https://api.pagertree.com/integration/int_xxx'

inhibit_rules:
  # Suppress warning alerts while a critical alert with the same
  # alertname and service is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'service']

Notification Templates

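Notifications sent to receivers are constructed via templates. Alertmanager comes with default templates, but they can also be customized. The following example overrides the text of a Slack notification with a link to an internal wiki page built from the group labels: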

global:
  slack_api_url: '<slack_webhook_url>'

route:
  receiver: 'slack-notifications'
  group_by: [alertname, datacenter, app]

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
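Templates can also iterate over the alerts in a group. The following is a sketch only: it assumes every alerting rule sets a summary annotation, which is a convention and not something Alertmanager enforces.

```yaml
global:
  slack_api_url: '<slack_webhook_url>'

route:
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        # Also notify when the alerts in the group resolve.
        send_resolved: true
        # Title built from the group status and the alertname label common to all alerts.
        title: '[{{ .Status }}] {{ .CommonLabels.alertname }}'
        # One line per alert; assumes each alert has a `summary` annotation.
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
```

The text value is double-quoted so that YAML turns \n into a real line break between alerts.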

High Availability

By default, Alertmanager starts in high availability mode. To configure an Alertmanager cluster, use the --cluster.* command-line flags (for example, --cluster.listen-address and --cluster.peer).

Do not load balance traffic between Prometheus and Alertmanager. Instead, point Prometheus to a list of all Alertmanagers.
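On the Prometheus side, this might look like the fragment below. The hostnames are placeholders; 9093 is Alertmanager's default listening port.

```yaml
# prometheus.yml (fragment)
alerting:
  alertmanagers:
    - static_configs:
        # List every Alertmanager instance directly, not a load balancer address.
        - targets:
            - 'alertmanager-1.example.com:9093'
            - 'alertmanager-2.example.com:9093'
            - 'alertmanager-3.example.com:9093'
```

Prometheus then sends each alert to every listed instance, and the Alertmanager cluster takes care of deduplicating the resulting notifications.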
