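The crash commands I use most often are:
log/dmesg: prints the contents of log_buf, the kmsg buffer captured at the time of the failure.
struct: shows a structure's definition, or decodes a structure starting at a given address.
union: like struct, but for unions.
p: print; shows the value of a variable. It actually invokes gdb's p command.
whatis: shows the definition of a structure, union, or other type.
bt <pid>: shows the call stack. With no argument it unwinds the current task's stack using SP and FP.
bt -T: shows every symbol found from just above thread_info to the other end of the stack, which is usually more than plain bt prints.
bt -a: shows the stacks of all active tasks.
ps: shows process status, much like the ps command on a running system.
task <pid>: shows the task_struct contents for the given pid; without a pid it shows the current task.
dis <address>: disassembles; -l also shows source line information.
mount: shows the currently mounted filesystems.
net: shows network-related information.
rd <address>: read memory; displays the contents at an address.
files <pid>: shows all files opened by a process.
search -t <value/symbol>: searches the stack pages of every task for a value or a symbol and prints the matches.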
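The rest of this post records actual usage.
Printing structures and variables:
crash> union thread_union
union thread_union {
    unsigned long stack[2048];
}
Besides showing a definition, struct and union can also decode an instance when an address is appended, for example: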
crash> dis init_thread_union
0xffffff85e2720000 <.data>: .inst 0x57ac6e9d ; undefined
crash> union thread_union 0xffffff85e2720000
union thread_union {stack = {1470918301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...}
}
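Here the goal is to print the contents of init_thread_union with the union command: dis is used first to find the variable's address (the sym command can also translate a symbol to its address), and that address is then passed to union. The p command described next gets the same result more easily.
Same example, printing the global variable directly with p: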
crash> p init_thread_union
init_thread_union = $8 = {stack = {1470918301, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...}
}
This only works when the variable's name is known. If all we have is a type and an address, the union or struct form shown earlier is the way to get the data.
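As a quick sanity check on the dump above, the first word of init_thread_union can be converted to hex and compared with the word that dis printed. This is a minimal sketch using only values visible in the session. Two things are assumptions on my part rather than something crash reported: that unsigned long is 8 bytes on this 64-bit arm64 target, and that 0x57ac6e9d is the kernel's STACK_END_MAGIC marker.

```python
# Values copied from the crash session above.
STACK_WORDS = 2048          # from: unsigned long stack[2048]
FIRST_WORD = 1470918301     # stack[0] as printed by `union` and `p`
DIS_WORD = 0x57ac6e9d       # the .inst word shown by `dis init_thread_union`

# Assumption: unsigned long is 8 bytes on this 64-bit kernel.
SIZEOF_ULONG = 8


def union_size(words: int = STACK_WORDS, word_size: int = SIZEOF_ULONG) -> int:
    """Size in bytes of a union whose only member is an array of `words` longs."""
    return words * word_size


if __name__ == "__main__":
    print(hex(FIRST_WORD))          # same bits, shown in hex
    print(FIRST_WORD == DIS_WORD)   # do `union`/`p` and `dis` agree?
    print(union_size())             # total bytes covered by the union
```

If the assumption about STACK_END_MAGIC holds, a non-matching first word in some other task's stack would be a hint that the stack was overrun.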
Viewing a structure definition:
crash> whatis thread_info
struct thread_info {
    unsigned long flags;
    mm_segment_t addr_limit;
    int preempt_count;
}
SIZE: 24
Or with struct:
crash> struct thread_info
struct thread_info {
    unsigned long flags;
    mm_segment_t addr_limit;
    int preempt_count;
}
SIZE: 24
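The SIZE: 24 reported above can be reproduced by hand, which is a useful habit when matching raw memory against a definition. The sketch below is a generic natural-alignment calculator, not how crash itself computes sizes. The member sizes are assumptions for a 64-bit arm64 kernel (unsigned long and mm_segment_t taken as 8 bytes, int as 4); they are not printed anywhere in the session.

```python
def layout(fields):
    """Return ({name: offset}, total_size) under natural C alignment.

    `fields` is a list of (name, size) pairs. Each member is aligned to its
    own size, and the struct is padded to the alignment of its largest member.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        align = size
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, total


# Assumed member sizes (hypothetical for this target, see the note above).
THREAD_INFO = [("flags", 8), ("addr_limit", 8), ("preempt_count", 4)]

if __name__ == "__main__":
    offsets, size = layout(THREAD_INFO)
    print(offsets)
    print(size)
```

Under these assumptions the 4-byte preempt_count is followed by 4 bytes of tail padding, which is where the total of 24 comes from. In crash itself, struct -o prints the member offsets directly.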
Printing a linked list with list:
crash> list file_system_type.next 0xffffff85e27833a0
ffffff85e27833a0
ffffff85e2740e78
ffffff85e278e9a0
ffffff85e27805f0
ffffff85e2782f30
ffffff85e27642a0
ffffff85e2762020
ffffff85e2762098
ffffff85e2771c20
ffffff85e2783508
ffffff85e278f408
ffffff85e278f450
ffffff85e28e5790
ffffff85e281ced8
ffffff85e276cdc8
ffffff85e277d6e0
ffffff85e2783638
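The list command takes the struct member that holds the next pointer, followed by the address of the list head, and prints the address of every element in the list. To also print a member of each element in turn, name it with the -s option, for example: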
crash> list file_system_type.next -s file_system_type.name 0xffffff85e27833a0
ffffff85e27833a0
  name = 0xffffff85e1f92148 "sysfs"
ffffff85e2740e78
  name = 0xffffff85e1d5c16d "rootfs"
ffffff85e278e9a0
  name = 0xffffff85e1d350d3 "ramfs"
ffffff85e27805f0
  name = 0xffffff85e1e8870c "bdev"
ffffff85e2782f30
  name = 0xffffff85e1f787b4 "proc"
ffffff85e27642a0
  name = 0xffffff85e1d56802 "cpuset"
ffffff85e2762020
  name = 0xffffff85e190bc40 "cgroup"
ffffff85e2762098
  name = 0xffffff85e1d49ee7 "cgroup2"
ffffff85e2771c20
  name = 0xffffff85e1d345f0 "tmpfs"
ffffff85e2783508
  name = 0xffffff85e1d61738 "configfs"
ffffff85e278f408
  name = 0xffffff85e1d56cb7 "debugfs"
ffffff85e278f450
  name = 0xffffff85e1d4c58f "tracefs"
ffffff85e28e5790
  name = 0xffffff85e1f884eb "sockfs"
ffffff85e281ced8
  name = 0xffffff85e1d67aef "dax"
ffffff85e276cdc8
  name = 0xffffff85e1d519ee "bpf"
ffffff85e277d6e0
  name = 0xffffff85e1d5b772 "pipefs"
ffffff85e2783638
  name = 0xffffff85e1d61b85 "devpts"
ffffff85e278ce00
  name = 0xffffff85e1d68bab "ext3"
ffffff85e278cdb8
  name = 0xffffff85e1d68b82 "ext2"
ffffff85e278cd70
  name = 0xffffff85e19315c0 "ext4"
ffffff85e278ea10
  name = 0xffffff85e1d6cba6 "vfat"
ffffff85e278ea58
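To make the traversal concrete, here is a simplified model of what a next-pointer walk does: read the pointer stored at (element address + offset of next), move there, and repeat until the pointer is NULL or wraps back to the start. It is only a sketch of the idea, not crash's implementation. The toy memory image and the NEXT_OFFSET value are hypothetical; only the three element addresses are taken from the output above.

```python
# Hypothetical offset of the `next` member inside the struct (illustration only).
NEXT_OFFSET = 0x28


def walk_list(memory, head, next_offset=NEXT_OFFSET, limit=10000):
    """Follow `next` pointers from `head`; stop at NULL or when the list wraps.

    `memory` maps an address to the pointer-sized word stored at that address.
    `limit` guards against corrupted lists that never terminate.
    """
    visited = []
    addr = head
    while addr != 0 and len(visited) < limit:
        visited.append(addr)
        addr = memory.get(addr + next_offset, 0)
        if addr == head:
            break
    return visited


# Toy memory image: three elements chained together, last one ends in NULL.
A, B, C = 0xffffff85e27833a0, 0xffffff85e2740e78, 0xffffff85e278e9a0
MEMORY = {A + NEXT_OFFSET: B, B + NEXT_OFFSET: C, C + NEXT_OFFSET: 0}

if __name__ == "__main__":
    for element in walk_list(MEMORY, A):
        print(format(element, "x"))
```

The limit parameter is there because a corrupted next pointer in a real dump can turn the walk into an endless loop.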
Viewing members of init_task:
crash> dis init_task
0xffffff85e2740f00 <init_task>: .inst 0x00000022 ; undefined
crash> p 0xffffff85e2740f00
$11 = 18446743549227831040
crash> p *0xffffff85e2740f00
$12 = 34
crash> struct task_struct.comm,pid 0xffffff85e2740f00
  comm = "swapper/0\000\000\000\000\000\000"
  pid = 0
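The numbers in this session are easier to read once converted. The sketch below shows that p on a bare address only echoes it in decimal, that the dereferenced value 34 is the same word dis displayed as 0x00000022, and how the \000 padding in comm maps to an ordinary string. The 16-byte length used for comm is an assumption (the usual TASK_COMM_LEN), not something the output states.

```python
TASK_ADDR = 0xffffff85e2740f00   # address of init_task, from `dis init_task`
DEREF_VALUE = 34                 # what `p *0xffffff85e2740f00` printed


def decode_comm(raw: bytes) -> str:
    """Return the NUL-terminated string held in a fixed-size char array."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


if __name__ == "__main__":
    print(TASK_ADDR)             # decimal form, as `p 0x...` shows it
    print(hex(DEREF_VALUE))      # the word stored at that address, in hex
    # Assumption: comm is 16 bytes long; the unused tail is NUL padding.
    print(decode_comm(b"swapper/0" + b"\x00" * 7))
```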
Stack backtrace analysis:
crash> bt -T
PID: 574 TASK: ffffffff9fba8080 CPU: 4 COMMAND: "modprobe"
bt: WARNING: cannot determine starting stack frame for task ffffffff9fba8080
  [ffffff801b5cad10] save_trace at ffffff85e088aadc
  [ffffff801b5cad28] walk_stackframe at ffffff85e088aa10
  [ffffff801b5cad58] __save_stack_trace at ffffff85e088ac94
  [ffffff801b5cad68] __device_attach at ffffff85e0f41e84
  [ffffff801b5cadc0] __set_page_owner at ffffff85e0a560c0
  [ffffff801b5cadd8] save_stack at ffffff85e0a56198
  [ffffff801b5cae10] _raw_spin_unlock_irqrestore at ffffff85e18c8f4c
  [ffffff801b5cae28] return_address at ffffff85e088eb9c
  [ffffff801b5cae48] depot_save_stack at ffffff85e0c4c550
  [ffffff801b5cae50] _raw_spin_unlock_irqrestore at ffffff85e18c8f4c
  [ffffff801b5cae68] preempt_count_sub at ffffff85e08e5e10
  [ffffff801b5caf28] number at ffffff85e18bc248
  [ffffff801b5caf98] number at ffffff85e18bc248
  [ffffff801b5cafa8] number at ffffff85e18bc248
  [ffffff801b5cb018] vsnprintf at ffffff85e18bb2d4
  [ffffff801b5cb098] sprintf at ffffff85e18bc6a0
  [ffffff801b5cb158] vsnprintf at ffffff85e18bb2d4
  [ffffff801b5cb228] console_unlock at ffffff85e092b5f4
  [ffffff801b5cb2a8] vprintk_emit at ffffff85e092b094
  [ffffff801b5cb338] vprintk_default at ffffff85e092b7dc
  [ffffff801b5cb388] vprintk_func at ffffff85e092da58
  [ffffff801b5cb3d8] regmap_spmi_ext_write at ffffff85e0f6a970
  [ffffff801b5cb408] _regmap_raw_write at ffffff85e0f63f84
  [ffffff801b5cb4a8] printk at ffffff85e092a2fc
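Output like this gets long, so it can help to post-process a saved copy outside crash. The following is a small sketch of one way to do that: it extracts (stack slot, symbol, text address) triples from bt -T text and counts how often each symbol appears. The function names and the SAMPLE string are mine; the four sample entries are copied from the output above. The regular expression does not rely on line breaks, so it copes with entries that are run together.

```python
import re
from collections import Counter

# Matches entries of the form "[<stack slot>] <symbol> at <text address>".
FRAME_RE = re.compile(r"\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(text):
    """Return (stack_slot, symbol, text_address) tuples found in `bt -T` text."""
    return [(int(slot, 16), symbol, int(addr, 16))
            for slot, symbol, addr in FRAME_RE.findall(text)]


def symbol_counts(frames):
    """Count how many stack slots refer to each symbol."""
    return Counter(symbol for _, symbol, _ in frames)


SAMPLE = """
  [ffffff801b5cad10] save_trace at ffffff85e088aadc
  [ffffff801b5caf28] number at ffffff85e18bc248
  [ffffff801b5caf98] number at ffffff85e18bc248
  [ffffff801b5cb018] vsnprintf at ffffff85e18bb2d4
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for slot, symbol, addr in frames:
        print(f"{slot:x} {symbol} {addr:x}")
    print(symbol_counts(frames).most_common(1))
```

A symbol that shows up several times with the identical text address, as number does here, may be a leftover value from earlier calls rather than a live frame, so bt -T output is best read as a list of candidates and not as a verified call chain.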
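bt -T shows everything recorded in a task's stack. Once a suspicious function turns up, read its source and use dis to see exactly which instructions it executes and at which addresses: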
crash> dis report_usercopy
0xffffff85e0a5bf00 <report_usercopy>: stp x22, x21, [sp,#-48]!
0xffffff85e0a5bf04 <report_usercopy+4>: stp x20, x19, [sp,#16]
0xffffff85e0a5bf08 <report_usercopy+8>: stp x29, x30, [sp,#32]
0xffffff85e0a5bf0c <report_usercopy+12>: add x29, sp, #0x20
0xffffff85e0a5bf10 <report_usercopy+16>: mov x19, x3
0xffffff85e0a5bf14 <report_usercopy+20>: mov w20, w2
0xffffff85e0a5bf18 <report_usercopy+24>: mov x21, x1
0xffffff85e0a5bf1c <report_usercopy+28>: mov x22, x0
0xffffff85e0a5bf20 <report_usercopy+32>: nop
0xffffff85e0a5bf24 <report_usercopy+36>: adrp x8, 0xffffff85e1d4c000
0xffffff85e0a5bf28 <report_usercopy+40>: adrp x9, 0xffffff85e1d5b000
0xffffff85e0a5bf2c <report_usercopy+44>: add x8, x8, #0x4a5
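The point of reading the disassembly is to match it against the C source, work out where each runtime argument of the function is held, and then find those values on the stack. With the arguments in hand, they can be substituted back into the function to find the cause of the failure.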
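Two pieces of arithmetic come up constantly when reading a listing like the one above: turning a symbol+offset into an absolute address, and combining an adrp page base with the immediate of the following add to get the address the code is really referring to. The sketch below works through both with values from the report_usercopy listing. The helper names are mine, and reading x0..x3 as the first four arguments follows the standard arm64 calling convention, which I am assuming applies here.

```python
FUNC_START = 0xffffff85e0a5bf00   # <report_usercopy>

# Where the prologue copies the incoming argument registers, read off the
# mov instructions at +16..+28 in the listing.
ARG_COPIES = {"x0": "x22", "x1": "x21", "w2": "w20", "x3": "x19"}


def insn_address(start: int, offset: int) -> int:
    """Absolute address of the instruction at <symbol+offset>."""
    return start + offset


def adrp_add(page: int, imm: int) -> int:
    """Address formed by `adrp reg, page` followed by `add reg, reg, #imm`."""
    if page & 0xfff:
        raise ValueError("adrp target must be 4 KiB aligned")
    return page + imm


if __name__ == "__main__":
    print(hex(insn_address(FUNC_START, 44)))          # the add at +44
    print(hex(adrp_add(0xffffff85e1d4c000, 0x4a5)))   # value left in x8
    for arg, saved in ARG_COPIES.items():
        print(f"{arg} -> {saved}")
```

Because the arguments are moved into callee-saved registers, a function called later that spills x19..x22 in its own prologue will leave copies of them on the stack, which is one place to look for the argument values.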
Searching
For example, to search the stacks for the symbol bug_handler:
crash> search -t bug_handler
PID: 574 TASK: ffffffff9fba8080 CPU: 4 COMMAND: "modprobe"
ffffff801b5cb7f8: ffffff85e088b810 (bug_handler)
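Conceptually, the hit above means that the 8-byte word stored at stack address ffffff801b5cb7f8 equals the address of bug_handler. A word-by-word scan like the following sketch captures the idea. It is an illustration and not crash's implementation; the toy buffer, its base address, and the little-endian word format are assumptions chosen so that the result lines up with the output above.

```python
import struct


def search_words(buf: bytes, base: int, value: int):
    """Return addresses of aligned 8-byte little-endian words equal to `value`.

    `base` is the virtual address that corresponds to buf[0].
    """
    hits = []
    for offset in range(0, len(buf) - 7, 8):
        (word,) = struct.unpack_from("<Q", buf, offset)
        if word == value:
            hits.append(base + offset)
    return hits


BUG_HANDLER = 0xffffff85e088b810      # symbol value from the search output
STACK_BASE = 0xffffff801b5cb7f0       # hypothetical start of the toy buffer
STACK_BYTES = struct.pack("<QQQ", 0, BUG_HANDLER, 0)

if __name__ == "__main__":
    for addr in search_words(STACK_BYTES, STACK_BASE, BUG_HANDLER):
        print(f"{addr:x}: {BUG_HANDLER:x}")
```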
References:
https://www.cnblogs.com/lshs/p/6113077.html
https://www.cnblogs.com/youngerchina/p/5624456.html