现象
我们的Spring Boot 项目启动时,偶现卡死的现象,启动到一半卡主不动了
2023-01-16 10:23:10.338 INFO 1 --- [ restartedMain] com.hazelcast.core.LifecycleService : [172.18.0.14]:5701 [dev] [4.2.4] [172.18.0.14]:5701 is STARTED
2023-01-16 10:23:12.391 INFO 1 --- [ restartedMain] c.h.h.HazelcastCacheRegionFactory : Starting up HazelcastCacheRegionFactory
2023-01-16 10:23:12.396 INFO 1 --- [ restartedMain] c.h.h.i.IHazelcastInstanceFactory : Using existing HazelcastInstance [unit].
2023-01-16 10:23:16.685 INFO 1 --- [ntLoopGroup-2-1] com.newatc.socketio.SocketIOServer : SocketIO server started at port: 7654
2023-01-16 10:23:16.702 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : controllerAgentServer bean listeners added
2023-01-16 10:23:16.972 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : channelParamHandler bean listeners added
2023-01-16 10:23:16.979 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : controllerAlarmHandler bean listeners added
2023-01-16 10:23:16.988 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : controllerFaultHandler bean listeners added
2023-01-16 10:23:16.995 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : controllerModelHandler bean listeners added
2023-01-16 10:23:17.000 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : controllerTimeHandler bean listeners added
2023-01-16 10:23:17.006 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : dayPlanParamHandler bean listeners added
2023-01-16 10:23:17.011 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : detectorEventHandler bean listeners added
2023-01-16 10:23:17.016 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : detectorParamHandler bean listeners added
2023-01-16 10:23:17.022 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : lampOffHandler bean listeners added
2023-01-16 10:23:17.027 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : patternParamHandler bean listeners added
2023-01-16 10:23:17.033 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : phaseFlowHandler bean listeners added
2023-01-16 10:23:17.038 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : phaseParamHandler bean listeners added
2023-01-16 10:23:17.044 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : phaseStatusHandler bean listeners added
2023-01-16 10:23:17.050 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : scheduleParamHandler bean listeners added
2023-01-16 10:23:17.057 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : signalIPParamHandler bean listeners added
2023-01-16 10:23:17.062 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : softwareVersionHandler bean listeners added
2023-01-16 10:23:17.068 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : stageParamHandler bean listeners added
2023-01-16 10:23:17.073 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : trafficParamsHandler bean listeners added
2023-01-16 10:23:17.079 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : vehGPSTrajectoryHandler bean listeners added
2023-01-16 10:23:17.084 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : wirelessOpLogHandler bean listeners added
2023-01-16 10:23:17.091 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : wirelessSecurityHandler bean listeners added
2023-01-16 10:23:18.018 INFO 1 --- [ntLoopGroup-3-1] c.n.socketio.handler.AuthorizeHandler : channel active cb27f0b9-236a-404c-8cb1-369366729ce2
2023-01-16 10:23:18.466 INFO 1 --- [ntLoopGroup-3-1] c.h.i.p.impl.PartitionStateManager : [172.18.0.14]:5701 [dev] [4.2.4] Initializing cluster partition table arrangement...
2023-01-16 10:23:18.776 INFO 1 --- [ntLoopGroup-3-1] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.101 Port=38167 unitId:101
2023-01-16 10:23:18.981 INFO 1 --- [ntLoopGroup-3-2] c.n.socketio.handler.AuthorizeHandler : channel active 539fb94d-a5de-4fef-b81d-82803885973f
2023-01-16 10:23:19.008 INFO 1 --- [ntLoopGroup-3-2] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.93 Port=39568 unitId:93
2023-01-16 10:23:22.177 INFO 1 --- [ Timer-0] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.101
2023-01-16 10:23:22.244 INFO 1 --- [ Timer-2] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.93
2023-01-16 10:23:25.953 INFO 1 --- [ntLoopGroup-3-3] c.n.socketio.handler.AuthorizeHandler : channel active 874e3919-489d-4970-8a53-e51e8a880057
2023-01-16 10:23:25.978 INFO 1 --- [ntLoopGroup-3-3] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.107 Port=45166 unitId:107
2023-01-16 10:23:26.821 WARN 1 --- [ restartedMain] onConfiguration$FunctionBindingRegistrar : You have defined function definition that does not exist: streamBridge
2023-01-16 10:23:29.213 INFO 1 --- [ Timer-4] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.107
2023-01-16 10:24:00.174 INFO 1 --- [ Timer-1] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.101
2023-01-16 10:24:00.248 INFO 1 --- [ Timer-3] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.93
2023-01-16 10:24:07.212 INFO 1 --- [ Timer-5] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.107
2023-01-16 10:24:37.351 INFO 1 --- [ntLoopGroup-3-4] c.n.socketio.handler.AuthorizeHandler : channel active bd580b9b-8050-4315-bf0d-07e8acc797b2
2023-01-16 10:24:37.368 INFO 1 --- [ntLoopGroup-3-4] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.107 Port=45167 unitId:107
2023-01-16 10:24:40.595 INFO 1 --- [ Timer-6] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.107
2023-01-16 10:24:43.373 INFO 1 --- [ntLoopGroup-3-5] c.n.socketio.handler.AuthorizeHandler : channel active a7ffe375-462d-4c18-bd67-1811933b8084
2023-01-16 10:24:43.388 INFO 1 --- [ntLoopGroup-3-5] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.93 Port=39570 unitId:93
2023-01-16 10:24:46.611 INFO 1 --- [ Timer-8] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.93
2023-01-16 10:25:18.595 INFO 1 --- [ Timer-7] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.107
2023-01-16 10:25:24.612 INFO 1 --- [ Timer-9] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.93
2023-01-16 10:25:51.726 INFO 1 --- [ntLoopGroup-3-6] c.n.socketio.handler.AuthorizeHandler : channel active fb525bf4-fdd7-4f88-9a0d-88c923ef24df
2023-01-16 10:25:51.743 INFO 1 --- [ntLoopGroup-3-6] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.107 Port=45168 unitId:107
2023-01-16 10:25:54.964 INFO 1 --- [ Timer-10] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.107
2023-01-16 10:26:32.965 INFO 1 --- [ Timer-11] c.n.unit.agent.ControllerAgentServer : Get signal time ipAddress:172.16.1.107
启动卡死不是必现,综合了出现卡死时的情况,发现,当出现kafka发送消息时,偶尔会卡主。这让我想起之前处理的一个启动报错 Spring Cloud Stream kafka项目启动时报错 项目使用的是spring-cloud-starter-stream-kafka
,函数式编程。代码业务逻辑是,程序接收到硬件的socket.io
消息推送,就会使用kafka转发消息 之前报错原因是项目启动时,在还未与kafka建立连接时,程序就接收到socket消息开始按照程序逻辑发送kafka消息,导致空指针异常,之后加了判断与初始化处理,问题解决 本次现象类似,但初始化部署,程序没有连接硬件时,启动都是成功的。但程序已经使用一段时间,连接上硬件,接收硬件上报数据,就出现了偶发的更新启动卡死现象
排查和解决
先是对比了启动失败和启动成功的日志,没有发现什么异常报错,启动成功日志如下
2023-01-16 10:13:32.705 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : vehGPSTrajectoryHandler bean listeners added
2023-01-16 10:13:32.709 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : wirelessOpLogHandler bean listeners added
2023-01-16 10:13:32.715 INFO 1 --- [ restartedMain] c.n.s.a.SpringAnnotationScanner : wirelessSecurityHandler bean listeners added
2023-01-16 10:13:33.786 INFO 1 --- [ntLoopGroup-3-1] c.n.socketio.handler.AuthorizeHandler : channel active 65c277be-ed56-496a-9a3f-7243b1e7adff
2023-01-16 10:13:33.786 INFO 1 --- [ntLoopGroup-3-2] c.n.socketio.handler.AuthorizeHandler : channel active f9f1acf4-ada2-4590-b949-46e0214e879b
2023-01-16 10:13:34.263 INFO 1 --- [ntLoopGroup-3-1] c.h.i.p.impl.PartitionStateManager : [172.18.0.14]:5701 [dev] [4.2.4] Initializing cluster partition table arrangement...
2023-01-16 10:13:34.583 INFO 1 --- [ntLoopGroup-3-1] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.93 Port=39558 unitId:93
2023-01-16 10:13:34.583 INFO 1 --- [ntLoopGroup-3-2] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.107 Port=45152 unitId:107
2023-01-16 10:13:37.960 INFO 1 --- [ Timer-0] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.93
2023-01-16 10:13:37.960 INFO 1 --- [ Timer-1] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.107
2023-01-16 10:13:38.637 INFO 1 --- [ntLoopGroup-3-3] c.n.socketio.handler.AuthorizeHandler : channel active 6dce6405-53af-4914-aa6c-dcbf2860babc
2023-01-16 10:13:38.660 INFO 1 --- [ntLoopGroup-3-3] c.n.unit.agent.ControllerAgentServer : onConnect -- client = 172.16.1.101 Port=38154 unitId:101
2023-01-16 10:13:41.930 INFO 1 --- [ Timer-4] c.n.unit.agent.ControllerAgentServer : Sync signal time ipAddress:172.16.1.101
2023-01-16 10:13:42.544 WARN 1 --- [ restartedMain] onConfiguration$FunctionBindingRegistrar : You have defined function definition that does not exist: streamBridge
2023-01-16 10:13:43.942 INFO 1 --- [ntLoopGroup-3-1] c.n.socketio.protocol.PacketDecoder : Rcv Signal Machine IP:172.16.1.93
2023-01-16 10:13:43.942 INFO 1 --- [ntLoopGroup-3-2] c.n.socketio.protocol.PacketDecoder : Rcv Signal Machine IP:172.16.1.107
2023-01-16 10:13:43.942 INFO 1 --- [ntLoopGroup-3-3] c.n.socketio.protocol.PacketDecoder : Rcv Signal Machine IP:172.16.1.101
2023-01-16 10:13:43.944 INFO 1 --- [ntLoopGroup-3-3] c.n.u.a.handler.SignalIPParamHandler : rcv:172.16.1.101
2023-01-16 10:13:43.945 INFO 1 --- [ntLoopGroup-3-2] c.n.u.a.handler.SignalIPParamHandler : rcv:172.16.1.107
2023-01-16 10:13:43.945 INFO 1 --- [ntLoopGroup-3-1] c.n.u.a.handler.SignalIPParamHandler : rcv:172.16.1.93
2023-01-16 10:13:47.039 INFO 1 --- [ restartedMain] org.jboss.threads : JBoss Threads version 3.1.0.Final
2023-01-16 10:13:47.304 INFO 1 --- [ restartedMain] com.newatc.unit.UnitApp : Started UnitApp in 41.426 seconds (JVM running for 44.067)
2023-01-16 10:13:47.319 INFO 1 --- [ restartedMain] com.newatc.unit.UnitApp :
----------------------------------------------------------Application 'unit' is running! Access URLs:Local: http://localhost:8282/External: http://172.18.0.14:8282/当前版本为: 1.4.2代码打包时间为: 202301060900Profile(s): [prod, api-docs, no-liquibase]
----------------------------------------------------------
2023-01-16 10:13:47.320 INFO 1 --- [ restartedMain] com.newatc.unit.UnitApp :
----------------------------------------------------------Config Server: Connected to Consul Server running in Docker
----------------------------------------------------------
这个启动卡死现象是偶发的,确定与kafka发送消息有关,但不是必现,无法通过错误日志和日志对比找问题,只好另寻他法 由于启动成功的日志里,也有接收硬件消息的日志打印,想着是不是有线程等待,暂时卡主了,需要使用一些自带的java命令来检查下 在docker里使用 jstack 1
,结果提示bash: jstack: command not found
,去java文件夹看了下,确实没这些命令 为了减少镜像包大小,我们使用的是jre版本的java 11 环境docker镜像eclipse-temurin:11-jre-focal
(程序镜像包170M左右),没有自带一些java命令,只好换成jdk版本的eclipse-temurin:11-jdk-focal
重新打镜像包(300M左右)部署 部署后重现了启动卡死的问题,使用 jstack 1
命令查看此时的线程情况,一下子发现了问题Found 1 deadlock.
,发现了一个死锁线程,具体日志如下(省略了部分日志):
I have no name!@315764d2cd4b:/$ jstack 1
2023-01-16 11:39:57
Full thread dump OpenJDK 64-Bit Server VM (11.0.17+8 mixed mode, sharing):JNI global refs: 29, weak refs: 0Found one Java-level deadlock:
=============================
"restartedMain":waiting to lock monitor 0x00007f730406bb00 (object 0x00000000e34c2160, a org.springframework.cloud.stream.binder.DefaultBinderFactory),which is held by "nioEventLoopGroup-3-2"
"nioEventLoopGroup-3-2":waiting to lock monitor 0x00007f725c019500 (object 0x00000000e1516e80, a java.util.concurrent.ConcurrentHashMap),which is held by "restartedMain"Java stack information for the threads listed above:
===================================================
"restartedMain":at org.springframework.cloud.stream.binder.DefaultBinderFactory.getBinder(DefaultBinderFactory.java:129)- waiting to lock <0x00000000e34c2160> (a org.springframework.cloud.stream.binder.DefaultBinderFactory)at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder.determineTargetProtocol(FunctionConfiguration.java:640)at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder.createFunctionHandler(FunctionConfiguration.java:579)at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder.bindFunctionToDestinations(FunctionConfiguration.java:562)at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder.afterPropertiesSet(FunctionConfiguration.java:421)at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1863)at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1800)at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$287/0x00000008401f6040.getObject(Unknown Source)at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)- locked <0x00000000e1516e80> (a java.util.concurrent.ConcurrentHashMap)at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:953)at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)- locked <0x00000000e10d9e48> (a java.lang.Object)at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:740)at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:415)at org.springframework.boot.SpringApplication.run(SpringApplication.java:303)at com.newatc.unit.UnitApp.main(UnitApp.java:72)at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.17/Native Method)at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.17/NativeMethodAccessorImpl.java:62)at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.17/DelegatingMethodAccessorImpl.java:43)at java.lang.reflect.Method.invoke(java.base@11.0.17/Method.java:566)at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
"nioEventLoopGroup-3-2":at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:216)- waiting to lock <0x00000000e1516e80> (a java.util.concurrent.ConcurrentHashMap)at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:213)at org.springframework.context.event.AbstractApplicationEventMulticaster.retrieveApplicationListeners(AbstractApplicationEventMulticaster.java:264)at org.springframework.context.event.AbstractApplicationEventMulticaster.getApplicationListeners(AbstractApplicationEventMulticaster.java:221)at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:138)at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:421)at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:427)at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:378)at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:938)at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)- locked <0x00000000e3d686a0> (a java.lang.Object)at org.springframework.cloud.stream.binder.DefaultBinderFactory.initializeBinderContextSimple(DefaultBinderFactory.java:471)at org.springframework.cloud.stream.binder.DefaultBinderFactory.getBinderInstance(DefaultBinderFactory.java:265)at org.springframework.cloud.stream.binder.DefaultBinderFactory.doGetBinder(DefaultBinderFactory.java:223)at org.springframework.cloud.stream.binder.DefaultBinderFactory.getBinder(DefaultBinderFactory.java:151)- locked <0x00000000e34c2160> (a org.springframework.cloud.stream.binder.DefaultBinderFactory)at org.springframework.cloud.stream.binding.BindingService.getBinder(BindingService.java:389)at org.springframework.cloud.stream.binding.BindingService.bindProducer(BindingService.java:273)at org.springframework.cloud.stream.function.StreamBridge.resolveDestination(StreamBridge.java:282)- locked <0x00000000e3453128> (a org.springframework.cloud.stream.function.StreamBridge)at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:210)at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:164)at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:144)at com.newatc.unit.agent.ControllerAgentServer.onConnect(ControllerAgentServer.java:184)at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.17/Native Method)at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.17/NativeMethodAccessorImpl.java:62)at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.17/DelegatingMethodAccessorImpl.java:43)at java.lang.reflect.Method.invoke(java.base@11.0.17/Method.java:566)at com.newatc.socketio.annotation.OnConnectScanner$1.onConnect(OnConnectScanner.java:41)at com.newatc.socketio.namespace.Namespace.onConnect(Namespace.java:186)at com.newatc.socketio.handler.AuthorizeHandler.connect(AuthorizeHandler.java:151)at com.newatc.socketio.handler.AuthorizeHandler.channelActive(AuthorizeHandler.java:90)at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:522)at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)at java.lang.Thread.run(java.base@11.0.17/Thread.java:829)Found 1 deadlock.
看了stack日志,看到了死锁互相等待的线程,就是这个问题了。具体触发死锁的代码为at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:210)
,还是kafka问题 发现问题,想着等启动完成后,再去执行发送kafka消息的逻辑。再启动类里增加一个公共变量,启动前为false,启动完成后为true,在kafka发送时,使用此变量判断
public static Boolean START_COMPLETED = false ; public static void main ( String [ ] args) { SpringApplication app = new SpringApplication ( UnitApp . class ) ; DefaultProfileUtil . addDefaultProfile ( app) ; Environment env = app. run ( args) . getEnvironment ( ) ; logApplicationStartup ( env) ; START_COMPLETED = true ; } try { if ( streamBridge != null && UnitApp . START_COMPLETED ) { streamBridge. afterSingletonsInstantiated ( ) ; streamBridge. send ( OUTPUT_BINDING_NAME , mapper. writeValueAsString ( offlineEventDTO) ) ; } else { log. error ( "streamBridge is null or UnitApp.START_COMPLETED is {}--------!" , UnitApp . START_COMPLETED ) ; } } catch ( JsonProcessingException e) { log. error ( e. getMessage ( ) ) ; }