How to customize Alertmanager alert templates
Configuring an Alertmanager alert template
Overview of the template feature:
https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
Alertmanager's default template:
https://github.com/prometheus/alertmanager/blob/main/template/default.tmpl
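Configuring the template: load the template file through templates and reference it from the WeChat receiver's message field (send_resolved: true makes Alertmanager also send a notification when the alert recovers):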
templates:
  - '/etc/alertmanager/alert.template'

route:
  group_by: ['instance']
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 120m
  receiver: 'internal'
  routes:
    - receiver: 'internal'
      group_by: ['alertname', 'instance', 'group', 'job']

receivers:
  - name: 'internal'
    wechat_configs:
      - corp_id: 'xxxxx'
        api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
        send_resolved: true
        to_user: 'zejia.lu'
        agent_id: 'xxxxxx'
        api_secret: 'xxxxxxxx'
        message: '{{ template "wechat.default.message" . }}'
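The message field does not hold the message text itself; it calls a named template. Alertmanager's default.tmpl already defines wechat.default.message, and the custom template file reuses that exact name. The sketch below is a simplified model that uses only Go's text/template to illustrate why the custom definition wins: sources parsed later into the same template set replace an earlier define of the same name. It assumes, as a simplification, that Alertmanager parses its built-in templates before the files listed under templates; builtinTmpl and customTmpl are hypothetical placeholders, not the real template bodies.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Hypothetical stand-in for the built-in definition in default.tmpl
// (the real body is much longer).
const builtinTmpl = `{{ define "wechat.default.message" }}built-in body{{ end }}`

// Hypothetical stand-in for /etc/alertmanager/alert.template, which
// redefines the same template name.
const customTmpl = `{{ define "wechat.default.message" }}custom body, status={{ .Status }}{{ end }}`

// The value of the message field in the receiver's wechat_configs entry.
const messageField = `{{ template "wechat.default.message" . }}`

// renderMessage parses the sources in order into one template set, so a
// later define of an existing name replaces the earlier one, and then
// executes the message field against data.
func renderMessage(data any, sources ...string) (string, error) {
	set := template.New("set")
	for _, src := range sources {
		if _, err := set.Parse(src); err != nil {
			return "", err
		}
	}
	msg, err := set.New("message").Parse(messageField)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := msg.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]string{"Status": "firing"}

	onlyBuiltin, err := renderMessage(data, builtinTmpl)
	if err != nil {
		panic(err)
	}
	fmt.Println(onlyBuiltin) // built-in body

	overridden, err := renderMessage(data, builtinTmpl, customTmpl)
	if err != nil {
		panic(err)
	}
	fmt.Println(overridden) // custom body, status=firing
}
```

Giving the custom definition a new name (and pointing message at that name) would avoid relying on override order at all; reusing the default name, as this setup does, means no other receiver configuration has to change.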
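Custom template, saved at /etc/alertmanager/alert.template (the path listed under templates). It reuses the built-in name wechat.default.message, so the message field in the receiver picks it up: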
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- /* Range over firing alerts only; $index 0 renders just the first one in the group. */ -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
========= Monitoring Alert =========
Status: {{ $alert.Status }}
Severity: {{ $alert.Labels.severity }}
Alert name: {{ $alert.Labels.alertname }}
Host: {{ $alert.Labels.instance }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description }};
Threshold: {{ $alert.Annotations.value }}
{{- /* StartsAt/EndsAt are in UTC; adding 28800e9 ns (8 h) shifts them to UTC+8. */}}
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========== end ==========
{{- end }}
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- /* Same pattern for resolved alerts (sent only with send_resolved: true). */ -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
========= Resolved =========
Alert name: {{ $alert.Labels.alertname }}
Status: {{ $alert.Status }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description }};
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- if gt (len $alert.Labels.instance) 0 }}
Instance: {{ $alert.Labels.instance }}
{{- end }}
========== end ==========
{{- end }}
{{- end }}
{{- end }}
{{- end }}
Result (the Threshold line is empty because this alert carries no value annotation):
Alert firing:
========= Monitoring Alert =========
Status: firing
Severity: critical
Alert name: 目标采集失败
Host: 1.1.1.1:10001
Summary: Control Plane Instance metrics collect failed
Details: 1.1.1.1:10001 is unavailable for 15 seconds.;
Threshold:
Started at: 2023-08-07 11:43:02
========== end ==========
Alert resolved:
========= Resolved =========
Alert name: 目标采集失败
Status: resolved
Summary: Control Plane Instance metrics collect failed
Details: 1.1.1.1:10001 is unavailable for 15 seconds.;
Started at: 2023-08-07 11:43:02
Resolved at: 2023-08-07 11:49:02
Instance: 1.1.1.1:10001
========== end ==========
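To check the output without waiting for a real alert, the template can be rendered locally with Go's text/template, the engine Alertmanager templates are written for. This is a minimal sketch under stated assumptions: KV, Alert, Alerts, Data and sampleData are hypothetical, simplified stand-ins for the data Alertmanager passes to templates, and the template string is a trimmed-down copy of the custom template with fewer fields. Two behaviours of the template show up in the output: only the first firing alert (1.1.1.1:10001) is printed because of eq $index 0, and the UTC start time 03:43:02 comes out as 11:43:02 after the 28800e9 (8 hours in nanoseconds) shift.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"time"
)

// KV, Alert, Alerts and Data are hypothetical, simplified stand-ins for the
// data Alertmanager hands to notification templates. Only the fields and
// methods used by the template below are modelled.
type KV map[string]string

type Alert struct {
	Status      string
	Labels      KV
	Annotations KV
	StartsAt    time.Time
	EndsAt      time.Time
}

type Alerts []Alert

// withStatus returns the alerts whose Status equals status, in order.
func (as Alerts) withStatus(status string) Alerts {
	var out Alerts
	for _, a := range as {
		if a.Status == status {
			out = append(out, a)
		}
	}
	return out
}

// Firing and Resolved back the template calls .Alerts.Firing and .Alerts.Resolved.
func (as Alerts) Firing() Alerts   { return as.withStatus("firing") }
func (as Alerts) Resolved() Alerts { return as.withStatus("resolved") }

type Data struct {
	Alerts Alerts
}

// Trimmed-down copy of the custom template: same structure and the same
// 28800e9 (8 h) shift, but fewer fields.
const message = `{{ define "wechat.default.message" }}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
Status: {{ $alert.Status }}
Host: {{ $alert.Labels.instance }}
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- end }}
{{- end }}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
Status: {{ $alert.Status }}
Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- end }}
{{- end }}
{{- end }}`

// render executes the wechat.default.message definition against data.
func render(data Data) (string, error) {
	t, err := template.New("preview").Parse(message)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.ExecuteTemplate(&buf, "wechat.default.message", data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// sampleData builds two firing alerts and one resolved alert with UTC times.
func sampleData() Data {
	start := time.Date(2023, 8, 7, 3, 43, 2, 0, time.UTC)
	return Data{Alerts: Alerts{
		{Status: "firing", Labels: KV{"instance": "1.1.1.1:10001"}, StartsAt: start},
		{Status: "firing", Labels: KV{"instance": "2.2.2.2:9100"}, StartsAt: start},
		{Status: "resolved", Labels: KV{"instance": "1.1.1.1:10001"},
			StartsAt: start, EndsAt: start.Add(6 * time.Minute)},
	}}
}

func main() {
	out, err := render(sampleData())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The fixed 8-hour offset only suits a UTC+8 audience; readers in another zone would need a different constant.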