Kafka Monitoring: Tiered Storage Monitoring and KRaft Monitoring Metrics

ops/2024/11/21 0:37:34/

Contents

1. Preface

2. Tiered Storage Monitoring

3. KRaft Monitoring Metrics

3.1. KRaft Quorum Monitoring Metrics

3.2. KRaft Controller Monitoring Metrics

3.3. KRaft Broker Monitoring Metrics

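1. Preface

    As with any distributed system, Kafka's storage and network usage need to be watched closely. Only with thorough monitoring of storage and network state can problems be detected early and risks avoided.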

2. Tiered Storage Monitoring

Quoted from the original documentation: The following set of metrics are available for monitoring of the tiered storage feature:

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
| --- | --- | --- |
| Remote Fetch Bytes Per Sec | Rate of bytes read from remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteFetchBytesPerSec,topic=([-.\w]+) |
| Remote Fetch Requests Per Sec | Rate of read requests from remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteFetchRequestsPerSec,topic=([-.\w]+) |
| Remote Fetch Errors Per Sec | Rate of read errors from remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteFetchErrorsPerSec,topic=([-.\w]+) |
| Remote Copy Bytes Per Sec | Rate of bytes copied to remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec,topic=([-.\w]+) |
| Remote Copy Requests Per Sec | Rate of write requests to remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteCopyRequestsPerSec,topic=([-.\w]+) |
| Remote Copy Errors Per Sec | Rate of write errors from remote storage per topic. Omitting 'topic=(...)' will yield the all-topic rate | kafka.server:type=BrokerTopicMetrics,name=RemoteCopyErrorsPerSec,topic=([-.\w]+) |
| RemoteLogReader Task Queue Size | Size of the queue holding remote storage read tasks | org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderTaskQueueSize |
| RemoteLogReader Avg Idle Percent | Average idle percent of thread pool for processing remote storage read tasks | org.apache.kafka.storage.internals.log:type=RemoteStorageThreadPool,name=RemoteLogReaderAvgIdlePercent |
| RemoteLogManager Tasks Avg Idle Percent | Average idle percent of thread pool for copying data to remote storage | kafka.log.remote:type=RemoteLogManager,name=RemoteLogManagerTasksAvgIdlePercent |
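The per-topic MBean names above follow one pattern: a fixed prefix, the metric name, and an optional `topic=` key. As a minimal sketch of that pattern (the helper name `remote_storage_mbean` is hypothetical and not part of Kafka), the following Python builds the object name for either the all-topic or the per-topic variant. It only produces the string; how you then query JMX depends on your monitoring agent.

```python
import re

# Topic pattern copied from the MBean names in the table above.
TOPIC_PATTERN = re.compile(r"[-.\w]+")

# The six per-topic tiered storage metrics listed in the table.
PER_TOPIC_METRICS = (
    "RemoteFetchBytesPerSec",
    "RemoteFetchRequestsPerSec",
    "RemoteFetchErrorsPerSec",
    "RemoteCopyBytesPerSec",
    "RemoteCopyRequestsPerSec",
    "RemoteCopyErrorsPerSec",
)


def remote_storage_mbean(metric, topic=None):
    """Build the MBean name for a per-topic tiered storage metric.

    Hypothetical helper: with topic=None it returns the all-topic name,
    otherwise the name scoped to one topic.
    """
    if metric not in PER_TOPIC_METRICS:
        raise ValueError(f"not a per-topic tiered storage metric: {metric}")
    name = f"kafka.server:type=BrokerTopicMetrics,name={metric}"
    if topic is None:
        return name
    if not TOPIC_PATTERN.fullmatch(topic):
        raise ValueError(f"topic does not match [-.\\w]+: {topic!r}")
    return f"{name},topic={topic}"


if __name__ == "__main__":
    print(remote_storage_mbean("RemoteFetchBytesPerSec"))
    print(remote_storage_mbean("RemoteCopyErrorsPerSec", topic="orders.v1"))
```

The topic name `orders.v1` is just an example value.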

3. KRaft Monitoring Metrics

Quoted from the original documentation: The set of metrics that allow monitoring of the KRaft quorum and the metadata log.
Note that some exposed metrics depend on the role of the node as defined by process.roles
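To make the role dependence concrete, here is a small Python sketch that maps a `process.roles` value to the metric groups one would expect a node to expose, following the subsections of this article: quorum metrics on every node, controller metrics on controllers, broker metrics on brokers. The function names and the group labels are assumptions for illustration, not Kafka identifiers.

```python
def parse_process_roles(value):
    """Split a process.roles value such as 'broker,controller' into a set."""
    return {role.strip() for role in value.split(",") if role.strip()}


def expected_metric_groups(process_roles):
    """Return the metric groups expected for a node with the given roles.

    Hypothetical helper; the group labels mirror the subsections of
    this article rather than any Kafka API.
    """
    roles = parse_process_roles(process_roles)
    if not roles:
        raise ValueError("process.roles is empty")
    unknown = roles - {"broker", "controller"}
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    groups = ["quorum"]  # reported on both controllers and brokers
    if "controller" in roles:
        groups.append("controller")
    if "broker" in roles:
        groups.append("broker")
    return groups


if __name__ == "__main__":
    print(expected_metric_groups("broker,controller"))
    print(expected_metric_groups("broker"))
```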

3.1. KRaft Quorum Monitoring Metrics

Quoted from the original documentation: These metrics are reported on both Controllers and Brokers in a KRaft Cluster:

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
| --- | --- | --- |
| Current State | The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer. | kafka.server:type=raft-metrics,name=current-state |
| Current Leader | The current quorum leader's id; -1 indicates unknown. | kafka.server:type=raft-metrics,name=current-leader |
| Current Voted | The current voted leader's id; -1 indicates not voted for anyone. | kafka.server:type=raft-metrics,name=current-vote |
| Current Epoch | The current quorum epoch. | kafka.server:type=raft-metrics,name=current-epoch |
| High Watermark | The high watermark maintained on this member; -1 if it is unknown. | kafka.server:type=raft-metrics,name=high-watermark |
| Log End Offset | The current raft log end offset. | kafka.server:type=raft-metrics,name=log-end-offset |
| Number of Unknown Voter Connections | Number of unknown voters whose connection information is not cached. This value of this metric is always 0. | kafka.server:type=raft-metrics,name=number-unknown-voter-connections |
| Average Commit Latency | The average time in milliseconds to commit an entry in the raft log. | kafka.server:type=raft-metrics,name=commit-latency-avg |
| Maximum Commit Latency | The maximum time in milliseconds to commit an entry in the raft log. | kafka.server:type=raft-metrics,name=commit-latency-max |
| Average Election Latency | The average time in milliseconds spent on electing a new leader. | kafka.server:type=raft-metrics,name=election-latency-avg |
| Maximum Election Latency | The maximum time in milliseconds spent on electing a new leader. | kafka.server:type=raft-metrics,name=election-latency-max |
| Fetch Records Rate | The average number of records fetched from the leader of the raft quorum. | kafka.server:type=raft-metrics,name=fetch-records-rate |
| Append Records Rate | The average number of records appended per sec by the leader of the raft quorum. | kafka.server:type=raft-metrics,name=append-records-rate |
| Average Poll Idle Ratio | The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records. | kafka.server:type=raft-metrics,name=poll-idle-ratio-avg |
| Current Metadata Version | Outputs the feature level of the current effective metadata version. | kafka.server:type=MetadataLoader,name=CurrentMetadataVersion |
| Metadata Snapshot Load Count | The total number of times we have loaded a KRaft snapshot since the process was started. | kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount |
| Latest Metadata Snapshot Size | The total size in bytes of the latest snapshot that the node has generated. If none have been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0. | kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes |
| Latest Metadata Snapshot Age | The interval in milliseconds since the latest snapshot that the node has generated. If none have been generated yet, this is approximately the time delta since the process was started. | kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs |
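Several of the quorum metrics use -1 as a sentinel for "unknown", which makes them easy to turn into simple checks. The sketch below assumes the raft-metrics values have already been collected into a plain dict keyed by the `name=` part of the MBean (how they are collected is left out). The function `quorum_warnings` and the `max_uncommitted` threshold are hypothetical and should be tuned for a real cluster.

```python
# Possible values of current-state, as listed in the table above.
VALID_STATES = {"leader", "candidate", "voted", "follower", "unattached", "observer"}


def quorum_warnings(metrics, max_uncommitted=1000):
    """Return a list of warning strings for one node's raft-metrics.

    Hypothetical helper: `metrics` maps metric names such as
    'current-leader' to their sampled values.
    """
    warnings = []
    state = metrics.get("current-state")
    if state not in VALID_STATES:
        warnings.append(f"unexpected state: {state!r}")
    # -1 means the node does not know who the quorum leader is.
    if metrics.get("current-leader", -1) == -1:
        warnings.append("quorum leader unknown")
    high_watermark = metrics.get("high-watermark", -1)
    if high_watermark == -1:
        warnings.append("high watermark unknown")
    else:
        # Records in the local log that are past the high watermark.
        gap = metrics.get("log-end-offset", high_watermark) - high_watermark
        if gap > max_uncommitted:
            warnings.append(f"{gap} records past the high watermark")
    return warnings


if __name__ == "__main__":
    sample = {
        "current-state": "follower",
        "current-leader": 1,
        "high-watermark": 100,
        "log-end-offset": 105,
    }
    print(quorum_warnings(sample))
```

The sample values are made up for illustration.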

3.2. KRaft Controller Monitoring Metrics

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
| --- | --- | --- |
| Active Controller Count | The number of Active Controllers on this node. Valid values are '0' or '1'. | kafka.controller:type=KafkaController,name=ActiveControllerCount |
| Event Queue Time Ms | A Histogram of the time in milliseconds that requests spent waiting in the Controller Event Queue. | kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs |
| Event Queue Processing Time Ms | A Histogram of the time in milliseconds that requests spent being processed in the Controller Event Queue. | kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs |
| Fenced Broker Count | The number of fenced brokers as observed by this Controller. | kafka.controller:type=KafkaController,name=FencedBrokerCount |
| Active Broker Count | The number of active brokers as observed by this Controller. | kafka.controller:type=KafkaController,name=ActiveBrokerCount |
| Global Topic Count | The number of global topics as observed by this Controller. | kafka.controller:type=KafkaController,name=GlobalTopicCount |
| Global Partition Count | The number of global partitions as observed by this Controller. | kafka.controller:type=KafkaController,name=GlobalPartitionCount |
| Offline Partition Count | The number of offline topic partitions (non-internal) as observed by this Controller. | kafka.controller:type=KafkaController,name=OfflinePartitionCount |
| Preferred Replica Imbalance Count | The count of topic partitions for which the leader is not the preferred leader. | kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount |
| Metadata Error Count | The number of times this controller node has encountered an error during metadata log processing. | kafka.controller:type=KafkaController,name=MetadataErrorCount |
| Last Applied Record Offset | The offset of the last record from the cluster metadata partition that was applied by the Controller. | kafka.controller:type=KafkaController,name=LastAppliedRecordOffset |
| Last Committed Record Offset | The offset of the last record committed to this Controller. | kafka.controller:type=KafkaController,name=LastCommittedRecordOffset |
| Last Applied Record Timestamp | The timestamp of the last record from the cluster metadata partition that was applied by the Controller. | kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp |
| Last Applied Record Lag Ms | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active Controllers the value of this lag is always zero. | kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs |
| ZooKeeper Write Behind Lag | The amount of lag in records that ZooKeeper is behind relative to the highest committed record in the metadata log. This metric will only be reported by the active KRaft controller. | kafka.controller:type=KafkaController,name=ZkWriteBehindLag |
| ZooKeeper Metadata Snapshot Write Time | The number of milliseconds the KRaft controller took reconciling a snapshot into ZooKeeper. | kafka.controller:type=KafkaController,name=ZkWriteSnapshotTimeMs |
| ZooKeeper Metadata Delta Write Time | The number of milliseconds the KRaft controller took writing a delta into ZK. | kafka.controller:type=KafkaController,name=ZkWriteDeltaTimeMs |
| Timed-out Broker Heartbeat Count | The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. | kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount |
| Number Of Operations Started In Event Queue | The total number of controller event queue operations that were started. This includes deferred operations. | kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount |
| Number of Operations Timed Out In Event Queue | The total number of controller event queue operations that timed out before they could be performed. | kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount |
| Number Of New Controller Elections | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. | kafka.controller:type=KafkaController,name=NewActiveControllersCount |
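Because ActiveControllerCount is 0 or 1 on each node, a common check is to sum it across all controller nodes and expect exactly one active controller. The following Python sketch does that, and also flags nonzero OfflinePartitionCount and MetadataErrorCount. It assumes the values were already gathered into a dict per node; the function name, the input shape, and the alert wording are all hypothetical.

```python
def controller_alerts(per_node):
    """Return alert strings for a set of controller nodes.

    Hypothetical helper: `per_node` maps a node id to a dict of
    KafkaController metric values keyed by the MBean `name=` part.
    """
    alerts = []
    # Each node reports 0 or 1, so the sum is the number of active controllers.
    active = sum(m.get("ActiveControllerCount", 0) for m in per_node.values())
    if active != 1:
        alerts.append(f"expected exactly 1 active controller, found {active}")
    for node in sorted(per_node):
        metrics = per_node[node]
        offline = metrics.get("OfflinePartitionCount", 0)
        if offline > 0:
            alerts.append(f"node {node}: {offline} offline partitions")
        errors = metrics.get("MetadataErrorCount", 0)
        if errors > 0:
            alerts.append(f"node {node}: {errors} metadata errors")
    return alerts


if __name__ == "__main__":
    sample = {
        1: {"ActiveControllerCount": 1, "OfflinePartitionCount": 0, "MetadataErrorCount": 0},
        2: {"ActiveControllerCount": 0, "OfflinePartitionCount": 0, "MetadataErrorCount": 0},
        3: {"ActiveControllerCount": 0, "OfflinePartitionCount": 0, "MetadataErrorCount": 0},
    }
    print(controller_alerts(sample))
```

The three-node sample is illustrative; an empty result means none of these particular checks fired, not that the cluster is healthy in every respect.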

3.3. KRaft Broker Monitoring Metrics

| METRIC/ATTRIBUTE NAME | DESCRIPTION | MBEAN NAME |
| --- | --- | --- |
| Last Applied Record Offset | The offset of the last record from the cluster metadata partition that was applied by the broker | kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset |
| Last Applied Record Timestamp | The timestamp of the last record from the cluster metadata partition that was applied by the broker. | kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp |
| Last Applied Record Lag Ms | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker | kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms |
| Metadata Load Error Count | The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it. | kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count |
| Metadata Apply Error Count | The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta. | kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count |
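On the broker side, the lag and the two error counters are the natural candidates for alerting. As a rough sketch, assuming the broker-metadata-metrics values are already available as a dict keyed by metric name, the check below flags a lag above a threshold and any nonzero error counter. The helper `broker_metadata_alerts` and the 5000 ms default are assumptions for illustration, not recommended values.

```python
# The two error counters from the table above.
ERROR_COUNTERS = ("metadata-load-error-count", "metadata-apply-error-count")


def broker_metadata_alerts(metrics, max_lag_ms=5000):
    """Return alert strings for one broker's metadata metrics.

    Hypothetical helper: `metrics` maps names such as
    'last-applied-record-lag-ms' to their sampled values.
    """
    alerts = []
    lag = metrics.get("last-applied-record-lag-ms")
    if lag is not None and lag > max_lag_ms:
        alerts.append(f"metadata lag {lag} ms exceeds {max_lag_ms} ms")
    for name in ERROR_COUNTERS:
        count = metrics.get(name, 0)
        if count > 0:
            alerts.append(f"{name} = {count}")
    return alerts


if __name__ == "__main__":
    sample = {
        "last-applied-record-lag-ms": 120,
        "metadata-load-error-count": 0,
        "metadata-apply-error-count": 0,
    }
    print(broker_metadata_alerts(sample))
```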

http://www.ppmy.cn/ops/17167.html
