Partly based on: https://cloud.tencent.com/developer/article/1177442
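1. Core files
When a running program fails with Segmentation fault (core dumped), it stops and a core file is produced. A core file is a memory image of the program's state at that moment. Debugging the core file with gdb lets us quickly locate where the segmentation fault occurred. For this to work, the executable should be compiled with the -g option so that it carries debug information.
A Segmentation fault (core dumped) occurs when a program accesses memory outside the address space the system has given it. The main causes are therefore: (1) accessing a memory address that does not exist; (2) accessing a memory address protected by the system; (3) out-of-bounds array access, and so on.
"core dumped" means that when a program hits an exception and exits abnormally, the operating system saves the program's current memory state into a core file.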
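To make the mechanism concrete, here is a minimal sketch in C, assuming a POSIX system. The helper name crash_child_and_report is hypothetical. It forks a child that writes through a null pointer, then lets the parent inspect how the child died. Whether a core file is actually written still depends on the limits and kernel settings discussed in the following sections.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes through a null pointer.
 * Returns the number of the signal that killed the child, 0 if the child
 * exited normally, or -1 if fork/waitpid failed.
 * If core_dumped is non-NULL it is set to 1 when the kernel reports that
 * a core was dumped, otherwise to 0. */
int crash_child_and_report(int *core_dumped)
{
    if (core_dumped != NULL)
        *core_dumped = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The pointer variable itself is volatile, so the compiler must
         * load it at run time and cannot optimise the bad store away. */
        int *volatile p = NULL;
        *p = 42;      /* invalid write: the kernel delivers SIGSEGV */
        _exit(0);     /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status))
        return 0;

#ifdef WCOREDUMP
    /* WCOREDUMP is not in POSIX, but Linux and the BSDs provide it. */
    if (core_dumped != NULL && WCOREDUMP(status))
        *core_dumped = 1;
#endif
    return WTERMSIG(status);
}
```

Calling crash_child_and_report(&dumped) from a test program should return SIGSEGV; dumped tells you whether the current settings allowed a core to be written.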
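"core" refers to core memory, that is, magnetic-core memory built from small magnetic rings threaded with wires. The semiconductor industry has long since taken over and nobody uses core memory any more, but in many contexts people still call main memory "core".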
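2. Controlling whether core files are generated
(1) Use ulimit -c to check whether core file generation is enabled. A result of 0 means the feature is off and no core file will be generated.
(2) Use ulimit -c filesize to limit the size of the core file (filesize is in KB). If the dump is larger than this limit it is truncated, leaving an incomplete core file, and gdb will report errors when you try to debug it. Example: ulimit -c 1024.
(3) Use ulimit -c unlimited to remove the size limit on core files.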
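Running ulimit -c unlimited in a terminal is only a temporary change: it applies to the current shell session and is gone after a reboot. There are three ways to make it permanent:
(1) add the line ulimit -c unlimited to /etc/rc.local;
(2) add the line ulimit -c unlimited to /etc/profile;
(3) append the following two lines to /etc/security/limits.conf: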
@root soft core unlimited
@root hard core unlimited
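The limit shown by ulimit -c is the process resource limit RLIMIT_CORE, so a program can also inspect and raise it for itself. The sketch below assumes a POSIX system; the function names raise_core_limit_to_hard and core_dumps_disabled are hypothetical. Like ulimit in a shell, the change affects only the calling process and the children it starts afterwards.

```c
#include <sys/resource.h>

/* Raise the soft core-size limit of the calling process to its hard limit.
 * An unprivileged process may always do this; raising the hard limit
 * itself needs privileges. Returns 0 on success, -1 on failure. */
int raise_core_limit_to_hard(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0 ? 0 : -1;
}

/* Returns 1 if the soft limit is 0 (no core file will be written),
 * 0 if core dumps are allowed by the limit, -1 on error. */
int core_dumps_disabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur == 0 ? 1 : 0;
}
```

If the hard limit is itself 0, raise_core_limit_to_hard succeeds but core dumps stay disabled; in that case the limit has to be lifted by root or through limits.conf.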
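3. Core file name and location
The core file is typically named core or core.pid, where pid is the process ID of the program that hit the segmentation fault. By default it is written to the current working directory of that program.
To change the name and location of the core file, the relevant configuration files are:
**/proc/sys/kernel/core_uses_pid:** controls whether the pid is appended to the core file name as an extension. The file contains 1 if it is appended and 0 otherwise.
**/proc/sys/kernel/core_pattern:** sets a format string for the location and name of the core file. For example, if the file originally contains core-%e, it can be changed as root like this: echo "/corefile/core-%e-%p-%t" > /proc/sys/kernel/core_pattern. Core files are then stored in the /corefile directory with names of the form core-<command name>-<pid>-<timestamp>.
The available specifiers are:
%p - insert the pid into the filename
%u - insert the current uid into the filename
%g - insert the current gid into the filename
%s - insert the signal that caused the core dump into the filename
%t - insert the UNIX time at which the core dump occurred into the filename
%h - insert the hostname of the machine where the core dump happened into the filename
%e - insert the name of the dumping executable into the filename
In most cases there is no need to change anything; the defaults are fine.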
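The expansion itself happens inside the kernel. The C sketch below only imitates it, to show how a pattern such as /corefile/core-%e-%p-%t turns into a file name. The struct crash_info and the function expand_core_pattern are hypothetical names, and only the specifiers listed above (plus %%) are handled.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative values describing one crash; all fields are made-up inputs. */
struct crash_info {
    long pid;
    long uid;
    long gid;
    int signo;
    long unix_time;
    const char *hostname;
    const char *exe_name;
};

/* Expand a core_pattern-style template into out (capacity out_len).
 * Unknown specifiers and a trailing single '%' are dropped.
 * Returns 0 on success, -1 if the result does not fit. */
int expand_core_pattern(const char *pattern, const struct crash_info *ci,
                        char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; ++p) {
        char piece[32];
        const char *text = piece;

        piece[0] = '\0';
        if (*p != '%') {
            piece[0] = *p;            /* ordinary character: copy as-is */
            piece[1] = '\0';
        } else {
            ++p;
            if (*p == '\0')
                break;                /* trailing '%' is dropped */
            switch (*p) {
            case 'p': snprintf(piece, sizeof piece, "%ld", ci->pid); break;
            case 'u': snprintf(piece, sizeof piece, "%ld", ci->uid); break;
            case 'g': snprintf(piece, sizeof piece, "%ld", ci->gid); break;
            case 's': snprintf(piece, sizeof piece, "%d", ci->signo); break;
            case 't': snprintf(piece, sizeof piece, "%ld", ci->unix_time); break;
            case 'h': text = ci->hostname; break;
            case 'e': text = ci->exe_name; break;
            case '%': piece[0] = '%'; piece[1] = '\0'; break;
            default:  break;          /* unknown specifier: dropped */
            }
        }

        size_t n = strlen(text);
        if (used + n >= out_len)
            return -1;                /* no room for text plus terminator */
        memcpy(out + used, text, n + 1);
        used += n;
    }
    return 0;
}
```

With pid 1234, executable name test and UNIX time 1585292765, the pattern /corefile/core-%e-%p-%t expands to /corefile/core-test-1234-1585292765.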
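4. Steps for debugging a core file with gdb
When using gdb on a core file to find where a segmentation fault occurred, remember that the executable must have been compiled with the -g option.
There are several common ways to debug a core file with gdb; the first one is recommended.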
Procedure 1:
(1) Start gdb with the core file. Command format: gdb [exec file] [core file]. Example: gdb ./test test.core.
(2) Inside gdb, find where the segmentation fault occurred with where or bt. Example: (gdb) bt
The backtrace points to the exact file and line in the source where the segmentation fault occurred.
Procedure 2:
(1) Start gdb with the core file. Command format: gdb --core=[core file]. Example: gdb --core=test.core.
(2) Inside gdb, load the symbol table that matches the core file. Command format: file [exec file]. Example: (gdb) file ./test
(3) Find where the segmentation fault occurred with where or bt. Example: (gdb) bt
Procedure 3:
(1) Start gdb with the core file. Command format: gdb -c [core file]. Example: gdb -core test.core.
(2) The remaining steps are the same as in procedure 2.
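5. Other ways to locate a segmentation fault
You can also single-step through the program in gdb to find where the segmentation fault occurs. For gdb usage examples, see the article "Linux下gdb用法简单介绍" (a brief introduction to using gdb on Linux).
6. A worked example of debugging a core file with gdb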
(gdb) l l2xx_init.c:153
152
153 for (UINT32 instIndex = 0; instIndex < instanceCount; ++instIndex) {
154 instBuf->magic = L2xxx_MAGIC;
155 instBuf->allocBufSize = alignedBufSize - (UINT32)sizeof(L2xxBuffer);
156 instBuf = (L2xxBuffer *)(VOID *)((UINT8 *)(VOID *)instBuf + alignedBufSize);
157 }
(gdb) b l2xx_init.c:153
Breakpoint 4 at 0xecb648: file l2xx_init.c, line 153.
(gdb) r
Starting program: /xxx/linux_output/testrunner
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[start Run](Pxx_ST_DxxOUP_by_xxx)
[New Thread 0xfffef0830ae0 (LWP 32027)]
[New Thread 0xfffef002fae0 (LWP 32028)]
[New Thread 0xfffeef82eae0 (LWP 32029)]
[New Thread 0xfffeef02dae0 (LWP 32030)]
[New Thread 0xfffeee82cae0 (LWP 32031)]
[New Thread 0xfffeee02bae0 (LWP 32032)]
[New Thread 0xfffeed82aae0 (LWP 32033)]
[New Thread 0xfffeed029ae0 (LWP 32034)]
[New Thread 0xfffeec828ae0 (LWP 32035)]
[New Thread 0xfffeec027ae0 (LWP 32036)]
[New Thread 0xfffeeb826ae0 (LWP 32037)]
[New Thread 0xfffeeb025ae0 (LWP 32038)]
[New Thread 0xfffeea824ae0 (LWP 32039)]
[New Thread 0xfffeea023ae0 (LWP 32040)]
[New Thread 0xfffee9822ae0 (LWP 32041)]
[New Thread 0xfffee9021ae0 (LWP 32042)]
[Switching to Thread 0xfffeeb826ae0 (LWP 32037)]
Thread 12 "core-10 (0/10)" hit Breakpoint 4, L2xxxChrTagMem (tagIndex=0, bufSize=10644, instanceCount=1, pid=1228)
    at /xxx/l2xx_init.c:153
153 for (UINT32 instIndex = 0; instIndex < instanceCount; ++instIndex) {
(gdb) dis 4
(gdb) c
Continuing.
..[Thread 0xfffeef82eae0 (LWP 32029) exited]
[Thread 0xfffeee82cae0 (LWP 32031) exited]
[Thread 0xfffeef02dae0 (LWP 32030) exited]
[Thread 0xfffeee02bae0 (LWP 32032) exited]
[Thread 0xfffef002fae0 (LWP 32028) exited]
[Thread 0xfffeed82aae0 (LWP 32033) exited]
[Thread 0xfffef0830ae0 (LWP 32027) exited]
[Thread 0xfffeec828ae0 (LWP 32035) exited]
[Thread 0xfffeec027ae0 (LWP 32036) exited]
[Thread 0xfffeed029ae0 (LWP 32034) exited]
[Thread 0xfffeeb826ae0 (LWP 32037) exited]
[Thread 0xfffeeb025ae0 (LWP 32038) exited]
[Thread 0xfffeea824ae0 (LWP 32039) exited]
[Thread 0xfffeea023ae0 (LWP 32040) exited]
[Thread 0xfffee9822ae0 (LWP 32041) exited]
[Thread 0xfffee9021ae0 (LWP 32042) exited]
(PDCCH_ST_INTRF_by_CTRL)
请勿连续执行多个测试套[testCase_ST_Suite_case]
../base/l2_xx_test.cpp:29: Failure
Value of: FALSE
  Actual: false
Expected: true
Thread 1 "testrunner[M]" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffbf6e1010 (LWP 32004)]
0x0000000000b9a1dc in CONTROLLER_KillAllChipr (chipId=0) at ../framework/control_center/controller.c:378
378 UINT32 chipNum = pInfo->chipNum;
(gdb) bt
#0 0x0000000000b9a1dc in CONTROLLER_KillAllChipAsMaster (chipId=0) at ../framework/control_center/controller.c:378
#1 0x000000000041aaa4 in L2Test::TearDownTestCase () at ../testrunner/base/l2_test.cpp:35
#2 0x0000000001714664 in testing::TestCase::RunTearDownTestCase (this=0x4c8c980) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/include/gtest/gtest.h:910
#3 0x000000000171e514 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::TestCase, void> (object=0x4c8c980,method=(void (testing::TestCase::*)(class testing::TestCase * const)) 0x1714640 <testing::TestCase::RunTearDownTestCase()>, location=0x19d06f8 "TearDownTestCase()")at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2300
#4 0x00000000017192c0 in testing::internal::HandleExceptionsInMethodIfSupported<testing::TestCase, void> (object=0x4c8c980,method=(void (testing::TestCase::*)(class testing::TestCase * const)) 0x1714640 <testing::TestCase::RunTearDownTestCase()>, location=0x19d06f8 "TearDownTestCase()")at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2336
#5 0x0000000001709660 in testing::TestCase::Run (this=0x4c8c980) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2807
#6 0x000000000170ead0 in testing::internal::UnitTestImpl::RunAllTests (this=0x4c8c0a0) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:4836
#7 0x000000000171ee54 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x4c8c0a0,method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x170e6bc <testing::internal::UnitTestImpl::RunAllTests()>,location=0x19d1148 "auxiliary test code (environments or event listeners)") at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2300
#8 0x0000000001719a90 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x4c8c0a0,method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x170e6bc <testing::internal::UnitTestImpl::RunAllTests()>,location=0x19d1148 "auxiliary test code (environments or event listeners)") at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2336
#9 0x000000000170d9d0 in testing::UnitTest::Run (this=0x49c2278 <testing::UnitTest::GetInstance()::instance>) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:4433
#10 0x0000000001711380 in testing::RunCases () at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:6016
#11 0x0000000001711590 in testing::Init_UT (argc=1, argv=0xffffffffe848, bFlag=true, strIP=...) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:6066
#12 0x000000000041b860 in main (argc=1, argv=0xffffffffe848) at ../xxx/testrunner/base/main.cpp:43
(gdb)
At this point Thread 1 "testrunner[M]" receives SIGSEGV (Segmentation fault).
The cause is hard to pin down, because running bt the way we did when debugging IT only gives this pile of output:
Thread 1 "testrunner[M]" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffbf6e1010 (LWP 32004)]
0x0000000000b9a1dc in CONTROLLER_KillAllChixxxr (chipId=0) at ../xx_test_base/framework/control_center/controller.c:378
378 UINT32 chipNum = pInfo->chipNum;
(gdb) bt
#0 0x0000000000b9a1dc in CONTROLLER_KillAllChi (chipId=0) at ../xxxtest_base/framework/control_center/controller.c:378
#1 0x000000000041aaa4 in xxTest::TearDownTestCase () at ../xxxst/testrunner/base/xx_test.cpp:35
#2 0x0000000001714664 in testing::TestCase::RunTearDownTestCase (this=0x4c8c980) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/include/gtest/gtest.h:910
#3 0x000000000171e514 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::TestCase, void> (object=0x4c8c980,method=(void (testing::TestCase::*)(class testing::TestCase * const)) 0x1714640 <testing::TestCase::RunTearDownTestCase()>, location=0x19d06f8 "TearDownTestCase()")at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2300
#4 0x00000000017192c0 in testing::internal::HandleExceptionsInMethodIfSupported<testing::TestCase, void> (object=0x4c8c980,method=(void (testing::TestCase::*)(class testing::TestCase * const)) 0x1714640 <testing::TestCase::RunTearDownTestCase()>, location=0x19d06f8 "TearDownTestCase()")at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2336
#5 0x0000000001709660 in testing::TestCase::Run (this=0x4c8c980) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2807
#6 0x000000000170ead0 in testing::internal::UnitTestImpl::RunAllTests (this=0x4c8c0a0) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:4836
#7 0x000000000171ee54 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x4c8c0a0,method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x170e6bc <testing::internal::UnitTestImpl::RunAllTests()>,location=0x19d1148 "auxiliary test code (environments or event listeners)") at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2300
#8 0x0000000001719a90 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x4c8c0a0,method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x170e6bc <testing::internal::UnitTestImpl::RunAllTests()>,location=0x19d1148 "auxiliary test code (environments or event listeners)") at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:2336
#9 0x000000000170d9d0 in testing::UnitTest::Run (this=0x49c2278 <testing::UnitTest::GetInstance()::instance>) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:4433
#10 0x0000000001711380 in testing::RunCases () at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:6016
#11 0x0000000001711590 in testing::Init_UT (argc=1, argv=0xffffffffe848, bFlag=true, strIP=...) at /home/syb/libdtcenter/gtest_1.6.0_dtcenter/src/gtest.cc:6066
#12 0x000000000041b860 in main (argc=1, argv=0xffffffffe848) at ../lxxpcst/testrunner/base/main.cpp:43
(gdb)
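The call stack dies inside the PC framework without showing a single line of app code. How is that even possible?
Given what had already been found, the obvious route was to add trace points starting from APP_Root() and see where startup goes wrong. That approach had indeed shown that part of the OM chr static memory allocation failed. But it is not an intuitive way to debug: you grope your way forward from the very beginning, which is slow. Since I had just spent a few days learning how to debug core files with gdb, why not give it a try?
With the dev user's permissions, ulimit -c can only be changed temporarily. The procedure:
sudo su to gain root temporarily
ulimit -c unlimited, then ulimit -c or ulimit -a to check the core file size again
Once the limit is in effect, go to the target directory and run the testrunner executable. If a coredump file still is not generated, additionally run the following command
to set the path where core files are generated: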
omu1:/opt/y00249743 # echo core > /proc/sys/kernel/core_pattern
Then run ./testrunner again, and the coredump file is generated in the same directory.
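One common reason why no core file shows up even though the ulimit is set is that core_pattern starts with '|', in which case the kernel pipes the dump to a helper program instead of writing a file. Whether that was the case on this machine is an assumption; the notes do not say what the old pattern was. A small C sketch for checking it follows; the helper names read_first_line and core_pattern_is_pipe are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a text file into buf (capacity len), without the
 * trailing newline. Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    if (len == 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline, if any */
    return 0;
}

/* A pattern whose first character is '|' tells the kernel to pipe the core
 * dump to the named program rather than to write a core file. */
int core_pattern_is_pipe(const char *pattern)
{
    return pattern[0] == '|';
}
```

On Linux, passing /proc/sys/kernel/core_pattern to read_first_line and the result to core_pattern_is_pipe shows whether dumps are currently being piped. Writing a plain pattern such as core, as in the command above, switches back to ordinary files.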
[root@szvphicprd90255 linux_output]# gdb ./testrunner core.28121
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./testrunner...
[New LWP 28126]
[New LWP 28125]
[New LWP 28123]
[New LWP 28131]
[New LWP 28136]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./testrunner'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000ecb65c in L2xx_AllocChrTagMem (tagIndex=73, bufSize=15180, instanceCount=1, pid=1228) at xxxinit.c:154
154 instBuf->magic = L2XX_CHR_TAG_BUF_MAGIC;
[Current thread is 1 (Thread 0xfffeae3d7ae0 (LWP 28135))]
(gdb) bt
#0 0x0000000000ecb65c in xxxinitAllocTagMem (tagIndex=73, bufSize=15180, instanceCount=1, pid=1228) at xxxinit.c:154
#1 0x0000000000ecb6f4 in L2OxxxMemByFlag (chrTagInfo=0x4b32a50 <g_lxxInfo+5256>, tagRegInfo=0x1d19dc8 <g_lxxInfoTbl+192>, bufSize=15180, instanceCount=1,pid=1228) at xxxinit.c:180
#2 0x0000000000ecb9c0 in LgTbl (tagRegInfo=0x1d19dc8 <g_l2omChrTagRegxx+192>, pid=1228) at xxxinit.c:233
#3 0x0000000000ecba34 in L2OM_RegChrTagsInfo () at xxxinit.c:250
#4 0x0000000000ecc0e4 in L2OM_InitChrWithoutEg () at xxxinit.c:461
#5 0x0000000000ece594 in L2OM_InitWithoutEg () at init.c:131
#6 0x0000000000e46c70 in L2IlEevantRes () at /xxxinit.cinit.c:704
#7 0x0000000000e45fb4 in L2IalThread () at /xxxinit.cinit.c:255
#8 0x0000000000e46448 in L2tSys () at /xxxinit.cinit.c:424
#9 0x0000000000e45c60 in APP () at /xxxinit.cinit.c:127
#10 0x0000000000b9d8f0 in SysInit_SlaveCoreMainEntry (ulVcpuId=13) at ../l2tit_main.c:35
#11 0x0000000000b9a618 in RATENTRY_CoreStart (coreId=13) at ../l2_testat_entry.c:41
#12 0x0000000000b99a68 in CONTROLLER_CoreThreadEntry (param=0xd) at ../lroller.c:153
#13 0x0000ffff838a18bc in start_thread () from /lib64/libpthread.so.0
#14 0x0000ffff836e773c in thread_start () from /lib64/libc.so.6
(gdb) info locals
instIndex = 0
chrbuf = 0xfffed0e79000
chrTagInfo = 0x4b32a50 <g_l2sssInfo+5256>
alignedBufSize = 15232
alignedTotSize = 15232
__FUNCTION__ = "L2OMxxxcChrMem"
errNo = 0
instBuf = 0x0
(gdb) info reg
x0 0x0 0
x1 0x56ef56ef 1458525935
x2 0xd 13
x3 0xd 13
x4 0x3b80 15232
x5 0x49 73
x6 0x3b4c 15180
x7 0x4cc 1228
x8 0x101010101010101 72340172838076673
x9 0xe33ce0 14892256
x10 0x1 1
x11 0x1 1
x12 0x43636f6c6c415f4d 4855847334698901325
x13 0x6d654d6761547268 7882791829191815784
x14 0x12 18
x15 0xffffffffffffffff -1
x16 0x1c500e8 29688040
x17 0xffff83698d40 281472886476096
x18 0x1 1
x19 0x1 1
x20 0xffffe148fcc0 281474461400256
x21 0xffffe148fcbe 281474461400254
x22 0xffffe148fcbf 281474461400255
x23 0x1000 4096
x24 0xd 13
x25 0xffff838ca000 281472888774656
x26 0xfffeae3d81e0 281469310042592
x27 0xffffe148fcc0 281474461400256
x28 0xfffeae3d7ae0 281469310040800
x29 0xfffeae3d7000 281469310038016
x30 0xecb648 15513160
sp 0xfffeae3d7000 0xfffeae3d7000
pc 0xecb65c 0xecb65c <xxxinitAllocTagMem+512>
cpsr 0x80000000 [ EL=0 N ]
fpsr 0x0 0
fpcr 0x0 0
(gdb) info reg sp
sp 0xfffeae3d7000 0xfffeae3d7000
(gdb)
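As soon as the core file is loaded, gdb shows the scene of the Segmentation fault. bt, info locals, info reg and similar commands give comprehensive context: here the backtrace runs through the app code down to line 154, and info locals shows instBuf = 0x0, so the crash is a write through a null pointer. The fault can be located much faster and more directly this way.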
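Since the core shows instBuf = 0x0 at the first store in the loop, one defensive option is to reject a null pool before the loop runs. The sketch below is not the project's code: DemoBuffer, DEMO_MAGIC and demo_init_buffers are hypothetical stand-ins for the anonymised types in the listing, and whether a guard or a fix to the failed allocation is the right remedy is left open.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC 0x12345678u   /* placeholder value, not the real magic */

/* Hypothetical stand-in for the buffer header seen in the listing. */
typedef struct {
    uint32_t magic;
    uint32_t allocBufSize;
} DemoBuffer;

/* Initialise instanceCount headers laid out alignedBufSize bytes apart.
 * Assumes alignedBufSize is a multiple of the header's alignment and that
 * pool holds at least instanceCount * alignedBufSize bytes.
 * Returns 0 on success, -1 if pool is NULL or the stride is too small,
 * instead of crashing on the first store as in the core dump. */
int demo_init_buffers(void *pool, uint32_t alignedBufSize, uint32_t instanceCount)
{
    if (pool == NULL || alignedBufSize < sizeof(DemoBuffer))
        return -1;

    DemoBuffer *instBuf = (DemoBuffer *)pool;
    for (uint32_t instIndex = 0; instIndex < instanceCount; ++instIndex) {
        instBuf->magic = DEMO_MAGIC;
        instBuf->allocBufSize = alignedBufSize - (uint32_t)sizeof(DemoBuffer);
        /* step to the next header, one stride further into the pool */
        instBuf = (DemoBuffer *)(void *)((uint8_t *)(void *)instBuf + alignedBufSize);
    }
    return 0;
}
```

With a guard like this the caller gets an error code it can log at the point of failure, rather than a SIGSEGV that later surfaces in an unrelated teardown path.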