Installing Prometheus Server
Create the directory:
mkdir /apps
Extract the archive (run from /apps so the paths below match):
tar xvf prometheus-2.55.0.linux-amd64.tar.gz
Create a symlink:
ln -sv /apps/prometheus-2.55.0.linux-amd64 /apps/prometheus
‘/apps/prometheus’ -> ‘/apps/prometheus-2.55.0.linux-amd64’
cd /apps/prometheus
Validate the Prometheus configuration:
./promtool check config prometheus.yml
Create the systemd unit file:
vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload && systemctl restart prometheus
Verify that the service started successfully by opening http://localhost:9090/targets
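The same check can be scripted instead of opening the targets page by hand. The sketch below is a minimal example, assuming the Prometheus HTTP API at /api/v1/targets is reachable on localhost:9090; the function names and the sample payload (including the `rabbitmq` job) are hypothetical and only mirror the shape of that API's response.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape_url, health) for every active target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", ""), t.get("scrapeUrl", ""), t.get("health", "unknown"))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch /api/v1/targets from a running Prometheus server (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample in the shape of the API response, not real server output.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"}, "scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"labels": {"job": "rabbitmq"}, "scrapeUrl": "http://localhost:15692/metrics", "health": "down"},
        ]
    },
}

print(unhealthy_targets(sample))
```

Against a live server, `unhealthy_targets(fetch_targets())` returning an empty list means every configured target is being scraped successfully.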
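Installing rabbitmq-exporter (not needed on newer RabbitMQ versions, which already expose metrics)
Note for newer versions: check that RabbitMQ has Prometheus support enabled, i.e. that the rabbitmq_prometheus plugin is on (http://localhost:15692/metrics)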
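Before writing alert rules it helps to confirm that the endpoint actually exposes the metrics those rules query. The following is a rough sketch that scans Prometheus text-format output for metric names; the helper names are hypothetical, the parser handles only simple sample lines, and the sample text is hand-written rather than captured from a real broker.

```python
def metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text, required):
    """Return the required metric names that do not appear in the output, sorted."""
    return sorted(set(required) - metric_names(exposition_text))


# Metric names used by the alert rules in this guide.
REQUIRED = [
    "rabbitmq_queue_messages_ready",
    "rabbitmq_queue_messages_unacked",
    "rabbitmq_disk_space_available_bytes",
    "rabbitmq_disk_space_available_limit_bytes",
    "rabbitmq_process_open_fds",
    "rabbitmq_process_max_fds",
    "rabbitmq_process_open_tcp_sockets",
    "rabbitmq_process_max_tcp_sockets",
]

# Hand-written sample, deliberately incomplete.
sample_text = """\
# TYPE rabbitmq_process_open_fds gauge
rabbitmq_process_open_fds 42
rabbitmq_process_max_fds 1048576
rabbitmq_queue_messages_ready{queue="orders"} 7
"""

print(missing_metrics(sample_text, REQUIRED))
```

In practice the text would come from fetching http://localhost:15692/metrics; an empty result means all metrics referenced by the rules are present.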
Configure the scrape job for node data:
vim /apps/prometheus/prometheus.yml
Add the job under scrape_configs:
- job_name: 'prometheus-node'
  static_configs:
    - targets: ['172.31.7.111:9100']
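Adding the Prometheus data source in Grafana
Importing the RabbitMQ dashboard template into Grafana
Configuring RabbitMQ alert rules
Register the rule files under rule_files in prometheus.yml:
rule_files:
  - 'rules/*.yml'
Create the rules directory (in /apps/prometheus):
mkdir rules
Create rabbitmq_rules.yml in the rules directory, then check the rule syntax:
/apps/prometheus/promtool check rules /apps/prometheus/rules/rabbitmq_rules.yml
Reload Prometheus (hot reload must be enabled, i.e. Prometheus must be started with --web.enable-lifecycle):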
curl -X POST http://localhost:9090/-/reload
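Checking and reloading can be chained so that a broken rule file never reaches the running server. This is only a sketch under a few assumptions: the promtool and rule file paths are the ones used in this guide, the function names are hypothetical, and the reload endpoint answers only when lifecycle endpoints are enabled.

```python
import subprocess
import urllib.request

PROMTOOL = "/apps/prometheus/promtool"
RULES_FILE = "/apps/prometheus/rules/rabbitmq_rules.yml"


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    return urllib.request.Request(f"{base_url}/-/reload", data=b"", method="POST")


def check_and_reload(base_url="http://localhost:9090"):
    """Run `promtool check rules`; reload Prometheus only if the check passes."""
    check = subprocess.run(
        [PROMTOOL, "check", "rules", RULES_FILE],
        capture_output=True,
        text=True,
    )
    if check.returncode != 0:
        # promtool reports problems on its output streams; surface them to the caller.
        raise RuntimeError(check.stderr or check.stdout)
    # urlopen raises HTTPError on a non-2xx response, e.g. when reload is disabled.
    with urllib.request.urlopen(build_reload_request(base_url), timeout=10) as resp:
        return resp.status


req = build_reload_request()
print(req.get_method(), req.full_url)
```

Calling `check_and_reload()` on the Prometheus host performs the check and the reload in one step.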
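Current alert rules:
- Ready messages in a queue exceed 500
- Delivered but unacknowledged messages in a queue exceed 500
- RabbitMQ file descriptor usage above 60%
- RabbitMQ disk space: available space is predicted to fall below the configured limit (50MB by default) within the next 10 days
- TCP socket usage above 60%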
groups:
  - name: rabbitmq-alert-rules
    rules:
      - alert: RabbitMQTooManyReadyMessages
        expr: avg_over_time(rabbitmq_queue_messages_ready[5m]) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: too many ready messages in RabbitMQ queues'
          description: 'The average number of messages ready for consumption on {{ $labels.instance }} exceeded 500; current average: {{ $value }}.'
      - alert: RabbitMQTooManyUnackedMessages
        expr: avg_over_time(rabbitmq_queue_messages_unacked[5m]) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: message acknowledgements are lagging in RabbitMQ queues'
          description: 'The average number of messages delivered but not yet acknowledged on {{ $labels.instance }} exceeded 500; current average: {{ $value }}.'
      - alert: RabbitMQDiskSpacePredictedLow
        expr: predict_linear(rabbitmq_disk_space_available_bytes[24h], 60*60*24*10) < rabbitmq_disk_space_available_limit_bytes
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: '{{ $labels.instance }}: RabbitMQ disk space is predicted to run low'
          description: 'Based on the last 24 hours of free disk space data, available disk space may fall below the default configured limit of 50MB within the next 10 days.'
      - alert: RabbitMQHighFileDescriptorUsage
        expr: max_over_time(rabbitmq_process_open_fds[5m]) / rabbitmq_process_max_fds * 100 > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: RabbitMQ file descriptor usage is too high'
          description: 'The peak number of open file descriptors on {{ $labels.instance }} exceeded 60% of the file descriptor limit; current ratio: {{ $value }}%.'
      - alert: RabbitMQHighTCPSocketUsage
        expr: max_over_time(rabbitmq_process_open_tcp_sockets[5m]) / rabbitmq_process_max_tcp_sockets * 100 > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}: RabbitMQ TCP socket usage is too high'
          description: 'The peak number of open TCP sockets on {{ $labels.instance }} exceeded 60% of the TCP connection limit allowed by the operating system; current ratio: {{ $value }}%.'
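The disk space rule relies on predict_linear, which fits a straight line to the samples in the range and extrapolates it forward. The sketch below illustrates that idea with an ordinary least-squares fit. It is an approximation for intuition, not Prometheus's implementation: it extrapolates from the last sample's timestamp rather than the rule evaluation time, and the hourly samples are invented.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) pairs, extrapolated
    seconds_ahead past the last sample's timestamp."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a line")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = max(t for t, _ in samples)
    return slope * (last_t + seconds_ahead) + intercept


# Invented data: one sample per hour for 24 hours, losing 10 MB of free space per hour.
samples = [(h * 3600, 2_000_000_000 - 10_000_000 * h) for h in range(25)]
limit_bytes = 50_000_000  # the 50MB default limit mentioned in the rule description

predicted = predict_linear(samples, 60 * 60 * 24 * 10)
print(predicted < limit_bytes)
```

With this downward trend the extrapolated value falls below the limit, which is the condition the alert expression tests; the `for: 1h` clause then requires it to hold for an hour before the alert fires.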
Installing and configuring Alertmanager
Extract the archive:
tar xvf alertmanager-0.27.0.linux-amd64.tar.gz
Create a symlink:
ln -sv /apps/alertmanager-0.27.0.linux-amd64 /apps/alertmanager
'/apps/alertmanager' -> '/apps/alertmanager-0.27.0.linux-amd64'
Create the systemd unit file:
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus alertmanager
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/alertmanager/
ExecStart=/apps/alertmanager/alertmanager --config.file=/apps/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
Enable and start the service:
systemctl daemon-reload && systemctl restart alertmanager && systemctl enable alertmanager
Restart the service:
systemctl restart alertmanager
Verify at http://localhost:9093/#/status
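Beyond the status page, posting a hand-made alert is a quick way to confirm that Alertmanager accepts alerts. The sketch below assumes the Alertmanager v2 API at /api/v2/alerts on localhost:9093; the function names, the alert name `ManualTestAlert` and its labels are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(now=None, minutes=5):
    """Build a one-element alert list that expires `minutes` after `now`."""
    now = now or datetime.now(timezone.utc)
    return [
        {
            "labels": {"alertname": "ManualTestAlert", "severity": "warning", "instance": "test"},
            "annotations": {"summary": "Manual test alert"},
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
        }
    ]


def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager (needs a running instance)."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


alerts = build_test_alert(now=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(json.dumps(alerts, indent=2))
```

Running `send_alerts(build_test_alert())` on the Alertmanager host should make the test alert appear in the web UI until its `endsAt` time passes.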
Pointing Prometheus at Alertmanager
Add the Alertmanager target to prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.15.70:9093
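Installing PrometheusAlert
Extract the archive:
unzip prometheus-alert-linux.zip
Rename the directory:
mv linux prometheusalert
Edit the configuration file (account, password, port, Feishu notification settings):
vim /apps/prometheusalert/conf/app.conf
Create the systemd unit file:
vim /etc/systemd/system/prometheusalert.service
[Unit]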
Description=Prometheus alert
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheusalert/
ExecStart=/apps/prometheusalert/PrometheusAlert --config.file=/apps/prometheusalert/conf/app.conf
[Install]
WantedBy=multi-user.target
Start the service:
systemctl daemon-reload && systemctl restart prometheusalert
Known issue: the binary lacks execute permission (add it):
cd /apps/prometheusalert
chmod +x PrometheusAlert
Configuring the alert channel
open-feishu=1
#Default Feishu bot URL
#fsurl=https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx
fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/839fd0b3-51de-49df-9db0-a014c95c179d
# Content type for webhook HTTP requests, e.g. application/json or application/x-www-form-urlencoded; defaults to application/json when not set
wh_contenttype=application/json
Configuring Alertmanager to forward to PrometheusAlert
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:8088/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/839fd0b3-51de-49df-9db0-a014c95c179d'
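The webhook URL carries the Feishu bot address as a query parameter, so one URL is nested inside another. Percent-encoding the inner URL is the standard way to do that and is assumed here to be accepted by PrometheusAlert. The helper name is hypothetical and the bot address is a placeholder, not a real hook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def prometheusalert_webhook(fsurl, base="http://127.0.0.1:8088/prometheusalert", tpl="prometheus-fs"):
    """Build the Alertmanager webhook URL with the Feishu bot URL percent-encoded."""
    query = urlencode({"type": "fs", "tpl": tpl, "fsurl": fsurl})
    return f"{base}?{query}"


# Placeholder bot address; substitute the real hook URL.
bot_url = "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxx"
webhook = prometheusalert_webhook(bot_url)
print(webhook)
```

Encoding matters most when the inner URL itself contains `&` or `?`, which would otherwise be read as part of the outer query string.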
Download links:
PrometheusAlert package
Alertmanager package
RabbitMQ dashboard template for Grafana
Prometheus package