Fixing a calico-kube-controllers startup failure


Symptoms

The calico-kube-controllers pod is unhealthy and keeps restarting.

The log output is as follows:

2023-02-21 01:26:47.085 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0221 01:26:47.086980       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-02-21 01:26:47.087 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2023-02-21 01:26:47.106 [INFO][1] main.go 153: Getting initial config snapshot from datastore
2023-02-21 01:26:47.120 [INFO][1] main.go 156: Got initial config snapshot
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 89: Start called
2023-02-21 01:26:47.120 [INFO][1] main.go 173: Starting status report routine
2023-02-21 01:26:47.120 [INFO][1] main.go 182: Starting Prometheus metrics server on port 9094
2023-02-21 01:26:47.120 [INFO][1] main.go 418: Starting controller ControllerType="Node"
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] node_syncer.go 65: Node controller syncer status updated: wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2023-02-21 01:26:47.120 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.120 [INFO][1] node_controller.go 143: Starting Node controller
2023-02-21 01:26:47.121 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.121 [INFO][1] resources.go 349: Main client watcher loop
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.124 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2023-02-21 01:26:47.125 [INFO][1] node_syncer.go 65: Node controller syncer status updated: resync
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.125 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.125 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.129 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.129 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2023-02-21 01:26:47.129 [INFO][1] node_syncer.go 65: Node controller syncer status updated: in-sync
2023-02-21 01:26:47.137 [INFO][1] hostendpoints.go 90: successfully synced all hostendpoints
2023-02-21 01:26:47.221 [INFO][1] node_controller.go 159: Node controller is now running
2023-02-21 01:26:47.226 [INFO][1] ipam.go 69: Synchronizing IPAM data
2023-02-21 01:26:47.236 [INFO][1] ipam.go 78: Node and IPAM data is in sync

The problem is on this line:

Failed to write status error=open /status/status.json: permission denied
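The relevant lines are easy to miss among the INFO output. A minimal sketch of one way to surface them, assuming the `[LEVEL]` log format shown above; the helper name `summarize_problems` is made up for illustration:

```shell
# Summarize the distinct ERROR/WARNING messages in Calico-style logs read from stdin.
# The leading date and time fields are dropped so that repeated messages collapse
# into one line with a count, most frequent first.
summarize_problems() {
  grep -E '\[(ERROR|WARNING)\]' | cut -d' ' -f3- | sort | uniq -c | sort -rn
}

# Usage (pod name taken from the describe output in this post):
#   kubectl logs -n kube-system calico-kube-controllers-9f49b98f6-njs2f | summarize_problems
```

On the log above this should reduce the output to the two status.go messages, both ending in "permission denied".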

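Entering the container to check the directory

I tried to exec into the container, but it does not even have common commands such as cat or ls, so the problem could not be inspected from inside the container.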
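One possible workaround, which was not part of the original troubleshooting, is to inspect the container's filesystem from the node instead: `docker export` writes a container's filesystem as a tar stream, and this also works for a stopped container. The sketch below assumes it is run on the node hosting the pod; the helper name `status_entries` is hypothetical.

```shell
# List the /status directory (mode, owner, size) from a container filesystem
# tar stream read on stdin. Matches entries named "status/..." or "./status/...".
status_entries() {
  tar -tvf - | grep -E ' (\./)?status(/|$)'
}

# Usage, on the node that runs the pod (the container ID is the one shown
# by `kubectl describe pod`, without the docker:// prefix):
#   docker export <container-id> | status_entries
```

The listing shows which user owns /status in the container filesystem and whether it is writable, which is the information that ls would otherwise have provided.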

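Checking the configuration

I looked at the pod's configuration and compared it with the other clusters. It is identical, so there is nothing wrong there.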

[grg@i-A8259010 ~]$ kubectl describe pod calico-kube-controllers-9f49b98f6-njs2f -n kube-system
Name:                 calico-kube-controllers-9f49b98f6-njs2f
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 10.254.39.2/10.254.39.2
Start Time:           Thu, 16 Feb 2023 11:14:35 +0800
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=9f49b98f6
Annotations:          cni.projectcalico.org/podIP: 10.244.29.73/32
                      cni.projectcalico.org/podIPs: 10.244.29.73/32
Status:               Running
IP:                   10.244.29.73
IPs:
  IP:           10.244.29.73
Controlled By:  ReplicaSet/calico-kube-controllers-9f49b98f6
Containers:
  calico-kube-controllers:
    Container ID:   docker://21594e3517a3fc8ffc5224496cec373117138acf5417d9a335a1c5e80e0c3802
    Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1
    Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 21 Feb 2023 09:34:06 +0800
      Finished:     Tue, 21 Feb 2023 09:35:15 +0800
    Ready:          False
    Restart Count:  1940
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-55jbn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-55jbn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                        From     Message
  ----     ------     ----                       ----     -------
  Warning  Unhealthy  31m (x15164 over 4d22h)    kubelet  Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
  Warning  BackOff    6m23s (x23547 over 4d22h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  79s (x11571 over 4d22h)    kubelet  Liveness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input

Comparing images

The image version matches the other clusters, so that is not the problem.

Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1    
Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a

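Checking for other differences between clusters

Comparing this machine with the other clusters turned up one difference: Docker on this machine had been installed earlier and is version 19, while the other machines run a freshly installed version 20.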
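A quick way to spot this kind of difference is to read the container runtime version that each node reports to the API server. The sketch below assumes the two-column `custom-columns` output shown in the usage comment; the helper name `count_runtimes` is made up for illustration.

```shell
# Count nodes per container runtime version. Expects a header line followed by
# "NODE RUNTIME" rows on stdin, and prints one line per distinct runtime version.
count_runtimes() {
  awk 'NR > 1 { print $2 }' | sort | uniq -c
}

# Usage:
#   kubectl get nodes \
#     -o custom-columns=NODE:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion \
#     | count_runtimes
```

A node whose Docker version differs from the rest shows up as its own line with a small count.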

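Solution

With reinstalling Docker not an option:

Restarting the pod: no effect.

Searching Baidu: nothing relevant.

Adjusting the calico-kube-controllers configuration

The manifest is at /etc/kubernetes/plugins/network-plugin/calico-typha.yaml

Since the container cannot write to the /status directory, we add a volume mapping for it.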
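The post does not reproduce the manifest change itself. A minimal sketch of what such a mapping could look like in the calico-kube-controllers Deployment, assuming a hostPath volume backed by /var/run/calico/status (the directory created in the commands that follow); the volume name `status` is an arbitrary choice for illustration, and unrelated fields are omitted:

```yaml
# Fragment of the calico-kube-controllers Deployment (kube-system namespace).
# Assumption: /var/run/calico/status already exists on the node and is writable
# by the container's user.
spec:
  template:
    spec:
      containers:
        - name: calico-kube-controllers
          volumeMounts:
            - name: status            # hypothetical volume name
              mountPath: /status      # directory the controller failed to write to
      volumes:
        - name: status
          hostPath:
            path: /var/run/calico/status
            type: Directory           # requires the directory to exist beforehand
```

Because this is a hostPath volume, the directory has to exist on whichever node the pod is scheduled to.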

Applying the configuration

mkdir -p /var/run/calico/status
chmod 777 /var/run/calico/status
kubectl apply -f  /etc/kubernetes/plugins/network-plugin/calico-typha.yaml

With that, the system recovered.


