背景:
线上一k8s环境告警出节点失联,通过排查和k8s的api建立链接失败,检查发现haproxy出现了重启,对应的日志显示出程序运行崩溃,这个情况根据日志追溯,发现曾多次崩溃,后续也在其他k8s环境也有出现。崩溃日志如下:
{“log”:“[NOTICE] 107/003719 (1) : New worker #1 (7) forked\n”,“stream”:“stderr”,“time”:“2024-04-17T00:37:19.824357956Z”}
{“log”:“Thread 4 is about to kill the process.\n”,“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11114335Z”}
{“log”:" Thread 1 : act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111167358Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11117337Z”}
{“log”:" cpu_ns: poll=151330165980 now=152417714721 diff=1087548741\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111177689Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111182066Z”}
{“log”:" \u003eThread 2 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.1111862Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111190734Z”}
{“log”:" cpu_ns: poll=149926470027 now=151612708615 diff=1686238588\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111194835Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111198839Z”}
{“log”:" call trace(8):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111202879Z”}
{“log”:" | 0x55d4fef705f3 [89 c5 85 c0 0f 85 e3 01]: ha_thread_dump+0x253/0x458\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111206706Z”}
{“log”:" | 0x55d4fef70866 [48 8b 05 23 16 1d 00 48]: debug_handler+0x66/0x10e\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111210628Z”}
{“log”:" | 0x7f1e0f216730 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x12730\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111214557Z”}
{“log”:" | 0x55d4fef59040 [0f 85 f4 02 00 00 48 b8]: process_runnable_tasks+0x3b0/0x75b\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111218605Z”}
{“log”:" Thread 3 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111222599Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111226627Z”}
{“log”:" cpu_ns: poll=1009086009481 now=1010561461521 diff=1475452040\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111230556Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111250608Z”}
{“log”:“*\u003eThread 4 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n”,“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111254956Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111259161Z”}
{“log”:" cpu_ns: poll=979476082232 now=981672729209 diff=2196646977\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111263123Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111267109Z”}
{“log”:" call trace(12):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111271203Z”}
{“log”:" | 0x55d4fef91e77 [48 83 c4 10 5b 5d 41 5c]: wdt_handler+0x107/0x114\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111275151Z”}
{“log”:" | 0x7f1e0f216730 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x12730\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111279082Z”}
{“log”:" | 0x55d4fef59033 [48 be 00 00 00 00 ff ff]: process_runnable_tasks+0x3a3/0x75b\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111282975Z”}
{“log”:" Thread 5 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11128691Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111291316Z”}
{“log”:" cpu_ns: poll=148278385582 now=149700349268 diff=1421963686\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111295205Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111299086Z”}
{“log”:" \u003eThread 6 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111302863Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111306964Z”}
{“log”:" cpu_ns: poll=148756467035 now=150175517887 diff=1419050852\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111310773Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111314681Z”}
{“log”:" call trace(8):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111318453Z”}
{“log”:" | 0x55d4fef705f3 [89 c5 85 c0 0f 85 e3 01]: ha_thread_dump+0x253/0x458\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111322535Z”}
{“log”:" | 0x55d4fef70866 [48 8b 05 23 16 1d 00 48]: debug_handler+0x66/0x10e\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111327669Z”}
{“log”:" | 0x7f1e0f216730 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x12730\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111331797Z”}
{“log”:" | 0x55d4fef59030 [48 8b 02 48 be 00 00 00]: process_runnable_tasks+0x3a0/0x75b\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111335699Z”}
{“log”:" Thread 7 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111339745Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111343626Z”}
{“log”:" cpu_ns: poll=1000893082213 now=1002083221782 diff=1190139569\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111347437Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111351316Z”}
{“log”:" Thread 8 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111355091Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111358959Z”}
{“log”:" cpu_ns: poll=1101754150895 now=1103240639735 diff=1486488840\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111362785Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111366666Z”}
{“log”:" \u003eThread 9 : act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111370483Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111374385Z”}
{“log”:" cpu_ns: poll=1058224033381 now=1060422202804 diff=2198169423\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11138071Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111384792Z”}
{“log”:" call trace(8):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111388591Z”}
{“log”:" | 0x55d4fef705f3 [89 c5 85 c0 0f 85 e3 01]: ha_thread_dump+0x253/0x458\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111392324Z”}
{“log”:" | 0x55d4fef70866 [48 8b 05 23 16 1d 00 48]: debug_handler+0x66/0x10e\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11139622Z”}
{“log”:" | 0x7f1e0f216730 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x12730\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111400117Z”}
{“log”:" | 0x55d4fef59030 [48 8b 02 48 be 00 00 00]: process_runnable_tasks+0x3a0/0x75b\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111403998Z”}
{“log”:" Thread 10: act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111407939Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111411786Z”}
{“log”:" cpu_ns: poll=148533347928 now=149605151681 diff=1071803753\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111415572Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111419455Z”}
{“log”:" Thread 11: act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111423233Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111427075Z”}
{“log”:" cpu_ns: poll=154241567225 now=155736607041 diff=1495039816\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111430903Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11143479Z”}
{“log”:" Thread 12: act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111438562Z”}
{“log”:" stuck=0 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111442448Z”}
{“log”:" cpu_ns: poll=157497641277 now=158546184939 diff=1048543662\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111446268Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111450241Z”}
{“log”:" \u003eThread 13: act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111454025Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111457891Z”}
{“log”:" cpu_ns: poll=149848571365 now=152047468944 diff=2198897579\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111461694Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.11146556Z”}
{“log”:" call trace(8):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111469321Z”}
{“log”:" | 0x55d4fef705f3 [89 c5 85 c0 0f 85 e3 01]: ha_thread_dump+0x253/0x458\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111473445Z”}
{“log”:" | 0x55d4fef70866 [48 8b 05 23 16 1d 00 48]: debug_handler+0x66/0x10e\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111477443Z”}
{“log”:" | 0x7f1e0f216730 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x12730\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111481323Z”}
{“log”:" | 0x55d4fef59033 [48 be 00 00 00 00 ff ff]: process_runnable_tasks+0x3a3/0x75b\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111485223Z”}
{“log”:" \u003eThread 14: act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111489139Z”}
{“log”:" stuck=1 prof=0 harmless=0 wantrdv=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111493058Z”}
{“log”:" cpu_ns: poll=149437998131 now=151636962981 diff=2198964850\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111496852Z”}
{“log”:" curr_task=0\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111500735Z”}
{“log”:" call trace(8):\n",“stream”:“stderr”,“time”:“2024-05-21T22:07:34.111506879Z”}
解决办法:
查询相关haproxy官网显示,确实存在这个问题,需要将haproxy升级到2.8.4以上,最终自己做了一个haproxy 2.9.9版本的镜像,推送到k8s环境进行观察和高可用测试,发现并无问题后,更新至生产环境,后面就没出现过次问题。
下载镜像地址:
haproxy:v2.9.9