[Cassandra] Troubleshooting a Node Running at 100% CPU in a Cassandra Cluster


 

Symptom: one node in the Cassandra cluster is running at 100% CPU, and Cassandra connections time out with errors. The Cassandra process is consuming all of the CPU on that server.

Troubleshooting steps:

1. Run top to find the process with the highest CPU usage and confirm that it is Cassandra.

[root@VM_centos ~]# top
20772 cassand+  20   0 8299168 4.484g 126280 S 376.6 58.7  23:49.92 java   
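As a follow-up to step 1, top can also break the Java process down into its individual threads; the command below is a minimal sketch (assuming the same PID 20772) and is not part of the original capture:

top -H -p 20772    # -H shows one row per thread; the PID column then holds the Linux thread ID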

2. Run jstack -l PID to inspect thread activity.

[cassandra@VM_129_3_centos tmp]$ jstack -l 20772
2019-04-12 20:29:23
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):

"SharedPool-Worker-134" #396 daemon prio=5 os_prio=0 tid=0x00007f907104c1d0 nid=0x58f3 waiting on condition [0x00007f8fbd6b3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:87)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

...

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f90701b2640 nid=0x5193 in Object.wait() [0x00007f9074065000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000006d918e910> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        - locked <0x00000006d918e910> (a java.lang.ref.Reference$Lock)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

   Locked ownable synchronizers:
        - None

"VM Thread" os_prio=0 tid=0x00007f90701a9200 nid=0x5191 runnable

"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f907001c630 nid=0x5127 runnable

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007f907001db40 nid=0x5128 runnable

"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f907001f050 nid=0x5129 runnable

"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f9070020560 nid=0x512a runnable

"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007f9070062c20 nid=0x518f runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f9070345590 nid=0x519d waiting on condition

JNI global references: 1050

[cassandra@VM_129_3_centos tmp]$

Converting the decimal process ID 20772 to hexadecimal gives 0x5124; look for that value in the output above (thread IDs in the dump are printed as hexadecimal nid=0x... fields).
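The decimal-to-hexadecimal conversion can be done directly in the shell; a minimal sketch, assuming the same ID 20772:

printf '0x%x\n' 20772    # prints 0x5124, matching the format of the nid=0x... fields in the jstack dump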

 

3. Inspect the heap with jmap -heap PID.

[cassandra@VM_129_3_centos tmp]$ jmap -heap 20772
Attaching to process ID 20772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.172-b11

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 419430400 (400.0MB)
   MaxNewSize               = 419430400 (400.0MB)
   OldSize                  = 3875536896 (3696.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 377487360 (360.0MB)
   used     = 348288016 (332.15333557128906MB)
   free     = 29199344 (27.846664428710938MB)
   92.26481543646918% used
Eden Space:
   capacity = 335544320 (320.0MB)
   used     = 318190024 (303.4496536254883MB)
   free     = 17354296 (16.55034637451172MB)
   94.82801675796509% used
From Space:
   capacity = 41943040 (40.0MB)
   used     = 30097992 (28.70368194580078MB)
   free     = 11845048 (11.296318054199219MB)
   71.75920486450195% used
To Space:
   capacity = 41943040 (40.0MB)
   used     = 0 (0.0MB)
   free     = 41943040 (40.0MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 3875536896 (3696.0MB)
   used     = 79467816 (75.78641510009766MB)
   free     = 3796069080 (3620.2135848999023MB)
   2.050498244050261% used

28574 interned Strings occupying 5184216 bytes.
[cassandra@VM_129_3_centos tmp]$ 
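If the heap usage above calls for deeper analysis, a full heap dump can be captured and inspected offline (for example with Eclipse MAT); a minimal sketch, assuming PID 20772 and an example output path that does not come from the original post:

jmap -dump:live,format=b,file=/tmp/cassandra-20772.hprof 20772    # "live" triggers a full GC first and dumps only reachable objects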

To be continued...

 

 

Notes:

If jstack -l PID returns "Unable to open socket file: ", resolve it by setting CASSANDRA_HEAPDUMP_DIR as described in the official Cassandra documentation.
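For reference, CASSANDRA_HEAPDUMP_DIR is normally exported before Cassandra starts, and the stock cassandra-env.sh appends it to the JVM's -XX:HeapDumpPath option; a minimal sketch with an example path (the path is an assumption, not from the original post):

export CASSANDRA_HEAPDUMP_DIR=/var/lib/cassandra/heapdumps    # example path; must exist and be writable by the cassandra user
# the bundled cassandra-env.sh then adds roughly:
#   JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-<timestamp>-pid<pid>.hprof"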

Reference: https://alexzeng.wordpress.com/2013/05/25/debug-cassandrar-jvm-thread-100-cpu-usage-issue/

