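Installing Kafka Cluster Monitoring
The group's Kafka cluster underpins message production and consumption across the group, so the operations team needs to monitor it in real time. The setup described here is currently usable in production.
Installation steps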
[root@gtcq-gt-resource2-db-01 opt]# cd /opt/ ;scp root@10.152.17.11:/opt/kafka* /opt
root@10.152.17.11's password:
kafka_exporter-1.2.0.linux-amd64.tar.gz 100% 4129KB 84.1MB/s 00:00
[root@gtcq-gt-resource2-db-01 opt]# tar -zxvf kafka_exporter-1.2.0.linux-amd64.tar.gz -C /usr/local/
[root@gtcq-gt-resource2-db-01 local]# cd kafka_exporter-1.2.0.linux-amd64/
[root@gtcq-gt-resource2-db-01 kafka_exporter-1.2.0.linux-amd64]# ./kafka_exporter --kafka.server=10.152.17.50:9092 &
[13] 46697
[root@gtcq-gt-resource2-db-01 kafka_exporter-1.2.0.linux-amd64]# INFO[0000] Starting kafka_exporter (version=1.2.0, branch=HEAD, revision=830660212e6c109e69dcb1cb58f5159fe3b38903) source="kafka_exporter.go:474"
INFO[0000] Build context (go=go1.10.3, user=root@981cde178ac4, date=20180707-14:34:48) source="kafka_exporter.go:475"
INFO[0000] Done Init Clients source="kafka_exporter.go:213"
INFO[0000] Listening on :9308 source="kafka_exporter.go:499"
[root@gtcq-gt-resource2-db-01 kafka_exporter-1.2.0.linux-amd64]#
The exporter listens on port 9308.
Modify the Prometheus configuration file
## Kafka monitoring
  - job_name: 'job-gtcq-gt-kafka'
    static_configs:
      - targets: ['10.152.17.50:9308']
        labels:
          service: gtcq-gt-kafka-service
Display
Kafka monitoring metrics
1. Kafka monitoring agent liveness
PromQL:
up{job="job-gtcq-gt-kafka"} == 0
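Description: checks whether the Kafka monitoring agent (kafka_exporter) is alive. The expression returns a result when Prometheus cannot scrape the exporter, which may also indicate that the Kafka service itself is down.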
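The liveness check above could be turned into a Prometheus alerting rule along the following lines. This is a minimal sketch, not the author's actual rule: the group name, alert name, `for` duration, and `severity` label are all assumptions chosen for illustration; only the expression comes from the text.

```yaml
groups:
  - name: kafka-exporter-liveness          # hypothetical group name
    rules:
      - alert: KafkaExporterDown           # hypothetical alert name
        # Expression taken from the document: the scrape of the exporter failed.
        expr: up{job="job-gtcq-gt-kafka"} == 0
        for: 1m                            # assumed hold time before firing
        labels:
          severity: critical               # assumed severity label
        annotations:
          summary: "kafka_exporter on {{ $labels.instance }} is not responding to scrapes"
```

A short `for` window is used here so that a single missed scrape does not page anyone; the right value depends on the scrape interval in use.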
2. Kafka cluster broker count
PromQL:
kafka_brokers{job="job-gtcq-gt-kafka",service="gtcq-gt-kafka-service"}
Description: reports the number of brokers in the Kafka cluster; alert if it differs from the actual number of brokers deployed.
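One way to express "alert if the count differs from the real cluster size" is to compare `kafka_brokers` against a fixed expected value. The sketch below assumes a three-broker cluster purely for illustration, since the document does not state the actual count; the group name, alert name, `for` duration, and `severity` label are likewise assumptions.

```yaml
groups:
  - name: kafka-broker-count               # hypothetical group name
    rules:
      - alert: KafkaBrokerCountMismatch    # hypothetical alert name
        # 3 is a placeholder: replace it with the real number of brokers.
        expr: kafka_brokers{job="job-gtcq-gt-kafka",service="gtcq-gt-kafka-service"} != 3
        for: 2m                            # assumed hold time before firing
        labels:
          severity: warning                # assumed severity label
        annotations:
          summary: "Kafka reports {{ $value }} brokers, which differs from the expected count"
```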
3. Kafka consumer lag
PromQL:
sum(kafka_consumergroup_lag{instance="10.152.17.50:9308",job="job-gtcq-gt-kafka",service="gtcq-gt-kafka-service"}) by(topic,job) > 50000
Description: measures Kafka consumer lag summed per topic; alert if the lag exceeds 50000 messages.
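The lag threshold above could be wired into an alerting rule roughly as follows. The expression and the 50000 threshold are taken from the text; the group name, alert name, `for` duration, and `severity` label are assumptions for illustration. Because the expression aggregates `by(topic,job)`, only those two labels are available in the annotation.

```yaml
groups:
  - name: kafka-consumer-lag               # hypothetical group name
    rules:
      - alert: KafkaConsumerLagHigh        # hypothetical alert name
        # Expression and threshold taken from the document.
        expr: >
          sum(kafka_consumergroup_lag{instance="10.152.17.50:9308",job="job-gtcq-gt-kafka",service="gtcq-gt-kafka-service"})
          by (topic, job) > 50000
        for: 5m                            # assumed hold time, to ignore short bursts
        labels:
          severity: warning                # assumed severity label
        annotations:
          summary: "Consumer lag on topic {{ $labels.topic }} is {{ $value }} messages"
```

Note that the sum runs over every consumer group on a topic; adding `consumergroup` to the `by` clause would give per-group alerts instead, if that granularity is preferred.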