k8s (3): kube-router, a BGP direct-routing CNI solution for K8S


Preface

The previous article covered using traefik to proxy services inside the k8s cluster, flexibly injecting proxy configuration to expose layer-7 services externally (see: k8s (2): external services). This article uses the kube-router CNI to achieve direct layer-3 connectivity between the cluster network and the outside world.

Environment

The test environment is built with the GNS3 IOU Cisco emulator plus VMware virtual machines. The host (the client) reaches the VMs/routers via VMware bridging. The routers are then configured to establish BGP neighbor relationships with the kube-router CNI running on the k8s nodes, and the cluster's pod CIDR is advertised over BGP. Once the advertisements are in place, an external client can reach pods in the cluster directly at layer 3, achieving BGP interconnection between the inside and outside of the cluster.

# client
client: 192.168.0.158/24   # physical machine

# routers
AS number: 64513
router1: 192.168.0.101/24  # GNS3 + VMware VM environment
router2: 192.168.0.102/24  # GNS3 + VMware VM environment
router vip: 192.168.0.100  # VRRP virtual IP shared by the two routers

# k8s cluster
AS number: 64512
node1: 192.168.0.170/24    # VMware VM environment
node2: 192.168.0.171/24    # VMware VM environment
pod CIDR: 10.244.0.0/16
svc CIDR: 10.96.0.0/12

Target result: a client outside the cluster can reach pods and svcs inside it, with traffic flowing client ---> router ---> k8s node ---> pod IP / svc IP, i.e. direct layer-3 forwarding.
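Once the setup below is complete, one quick way to confirm this path is a traceroute from the client toward a pod IP (a sketch; 10.244.1.35 is the test pod created later in this article):

# run on the client (192.168.0.158) after the BGP setup below is done;
# expect the router as the first hop and the k8s node hosting the pod as the second
traceroute 10.244.1.35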

Topology:

(topology diagram omitted)

Router configuration

router1:

IOU1#show running-config
interface Ethernet0/0
 ip address 192.168.0.101 255.255.255.0
 vrrp 1 ip 192.168.0.100
router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.102 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary

router2:

IOU2#show running-config
interface Ethernet0/0
 ip address 192.168.0.102 255.255.255.0
 vrrp 1 ip 192.168.0.100
 vrrp 1 priority 100
router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.101 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary
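Before moving on, the router side can be sanity-checked with standard IOS show commands (a sketch; exact output varies by IOS version):

IOU1#show vrrp brief       ! confirm which router currently owns the 192.168.0.100 VIP
IOU1#show ip bgp summary   ! BGP sessions to the k8s nodes should reach Established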

kube-router configuration:

For the k8s cluster setup steps, see k8s (1): 1.9.0 HA cluster local offline deployment notes.
In the previously configured kube-router yaml file, the following arguments need to be added to specify the local BGP AS number and the peer AS number and IPs:

- --advertise-cluster-ip=true     # advertise cluster IPs
- --advertise-external-ip=true    # advertise svc external IPs; takes effect when a svc specifies an external-ip
- --cluster-asn=64512             # BGP AS number of the local cluster
- --peer-router-ips=192.168.0.100 # peer IPs, multiple entries separated by ','; the routers here run VRRP, so the VIP alone is enough
- --peer-router-asns=64513        # peer AS number

The complete kube-router.yaml file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-router-cfg
  namespace: kube-system
  labels:
    tier: node
    k8s-app: kube-router
data:
  cni-conf.json: |
    {
      "name":"kubernetes",
      "type":"bridge",
      "bridge":"kube-bridge",
      "isDefaultGateway":true,
      "ipam": {
        "type":"host-local"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: kube-router
    tier: node
  name: kube-router
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        k8s-app: kube-router
        tier: node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: kube-router
      serviceAccount: kube-router
      containers:
      - name: kube-router
        image: cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=true
        - --advertise-cluster-ip=true
        - --advertise-external-ip=true
        - --cluster-asn=64512
        - --peer-router-ips=192.168.0.100
        - --peer-router-asns=64513
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        livenessProbe:
          httpGet:
            path: /healthz
            port: 20244
          initialDelaySeconds: 10
          periodSeconds: 3
        resources:
          requests:
            cpu: 250m
            memory: 250Mi
        securityContext:
          privileged: true
        volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: cni-conf-dir
          mountPath: /etc/cni/net.d
        - name: kubeconfig
          mountPath: /var/lib/kube-router/kubeconfig
          readOnly: true
      initContainers:
      - name: install-cni
        image: busybox
        imagePullPolicy: Always
        command:
        - /bin/sh
        - -c
        - set -e -x;
          if [ ! -f /etc/cni/net.d/10-kuberouter.conf ]; then
            TMP=/etc/cni/net.d/.tmp-kuberouter-cfg;
            cp /etc/kube-router/cni-conf.json ${TMP};
            mv ${TMP} /etc/cni/net.d/10-kuberouter.conf;
          fi
        volumeMounts:
        - mountPath: /etc/cni/net.d
          name: cni-conf-dir
        - mountPath: /etc/kube-router
          name: kube-router-cfg
      hostNetwork: true
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: cni-conf-dir
        hostPath:
          path: /etc/cni/net.d
      - name: kube-router-cfg
        configMap:
          name: kube-router-cfg
      - name: kubeconfig
        hostPath:
          path: /var/lib/kube-router/kubeconfig
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-router
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
  namespace: kube-system
rules:
  - apiGroups:
    - ""
    resources:
    - namespaces
    - pods
    - services
    - nodes
    - endpoints
    verbs:
    - list
    - get
    - watch
  - apiGroups:
    - "networking.k8s.io"
    resources:
    - networkpolicies
    verbs:
    - list
    - get
    - watch
  - apiGroups:
    - extensions
    resources:
    - networkpolicies
    verbs:
    - get
    - list
    - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-router
subjects:
- kind: ServiceAccount
  name: kube-router
  namespace: kube-system

Delete the previous kube-router resources outright, then recreate everything from this yaml file:

kubectl delete -f kubeadm-kuberouter.yaml   # run before updating the parameters
kubectl create -f kubeadm-kuberouter.yaml   # run after updating the parameters
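After recreating the resources, it is worth confirming that the DaemonSet pods are running on every node and that the BGP sessions actually come up. The kube-router container ships with the gobgp CLI (used again in the pitfall section below); the pod name here is illustrative:

# confirm the kube-router pods are up on all nodes
kubectl -n kube-system get pods -l k8s-app=kube-router -o wide

# inspect BGP neighbor state from inside one of the pods
kubectl -n kube-system exec -it kube-router-xxxxx -- gobgp neighbor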

Next, deploy an nginx pod for testing; the yaml file is as follows:

[root@171 nginx]# cat nginx-deploy.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2018-04-09T04:02:02Z
  generation: 4
  labels:
    app: nginx
  name: nginx-deploy
  namespace: default
  resourceVersion: "111504"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/nginx-deploy
  uid: c28090c0-3baa-11e8-b75a-000c29858eab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.9.1
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
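This manifest is a kubectl export of a live object, so to reuse it the server-populated metadata (creationTimestamp, generation, resourceVersion, selfLink, uid) should be stripped first; with those fields removed it is created the usual way:

kubectl create -f nginx-deploy.yaml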

Check the pod IP and curl it locally:

[root@171 nginx]# kubectl get pods -o wide 
NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE
nginx-deploy-5964dfd755-lv2kb   1/1       Running   0          29m       10.244.1.35   171
[root@171 nginx]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@171 nginx]# 

Verifying the results

Check the BGP neighbors on the router:

IOU1#show ip bgp summary 
BGP router identifier 192.168.0.101, local AS number 64513
BGP table version is 19, main routing table version 19
4 network entries using 560 bytes of memory
6 path entries using 480 bytes of memory
2 multipath network entries and 4 multipath paths
1/1 BGP path/bestpath attribute entries using 144 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1208 total bytes of memory
BGP activity 5/1 prefixes, 12/6 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.0.102   4        64513     100     106       19    0    0 01:27:20        0
192.168.0.170   4        64512     107      86       19    0    0 00:37:19        3
192.168.0.171   4        64512      98      85       19    0    0 00:37:21        3
IOU1#

Check the BGP route entries on the router:

IOU1#show ip route bgp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override

Gateway of last resort is 192.168.0.1 to network 0.0.0.0

      10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
B        10.96.0.1/32 [20/0] via 192.168.0.171, 00:38:36
                      [20/0] via 192.168.0.170, 00:38:36
B        10.96.0.10/32 [20/0] via 192.168.0.171, 00:38:36
                       [20/0] via 192.168.0.170, 00:38:36
B        10.244.0.0/24 [20/0] via 192.168.0.170, 00:38:36
B        10.244.1.0/24 [20/0] via 192.168.0.171, 00:38:39

As shown, the BGP neighbors are established and the routes inside the k8s cluster have been learned successfully. Each node advertises three prefixes (its own pod /24 plus the two advertised cluster IPs, 10.96.0.1 and 10.96.0.10), which matches the PfxRcd column in the summary above.
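As a cross-check from the cluster side, the routes kube-router installs into each node's kernel routing table can be listed directly (a sketch; on node1 a route to node2's pod subnet should appear, and vice versa):

# run on a k8s node: list kernel routes covering the pod CIDR
ip route | grep 10.244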

Now verify from a test machine.
The test machine's default gateway is pointed at the router, simulating an ordinary PC on the external network:

[root@python ~]# ip route
default via 192.168.0.100 dev eth3  proto static
[root@python ~]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@python ~]# 

The test succeeds. The BGP network inside the cluster and the external BGP network are now fully interconnected!
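The curl above targets a pod IP directly. Since --advertise-cluster-ip is enabled, a service ClusterIP (from the 10.96.0.0/12 range) should be reachable from the client the same way. A minimal sketch, with an illustrative service name:

# on a k8s node: expose the test deployment and note the assigned ClusterIP
kubectl expose deployment nginx-deploy --port=80 --name=nginx-svc
kubectl get svc nginx-svc

# on the external client: curl the ClusterIP printed above
curl <cluster-ip>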

Friendly reminder: don't forget to save the router configuration to NVRAM (copy running-config startup-config), or it will be lost on reboot. I hadn't touched network gear in a while, forgot this step, and got bitten by it once.


Pitfall encountered on 9-22

Today I tried adding a new node (192.168.20.x/24) to the cluster on a different subnet from the original nodes (192.168.9.x/24). After it was created, something very strange happened: the BGP peer sessions between kube-router on the new node and the core network devices outside the cluster kept flapping between established and idle. While established, packet loss between the new node and the original nodes exceeded 70%; while idle, pings between the nodes were normal.
Problem screenshot: (screenshot omitted)

Exec'ing into the kube-router container and running gobgp neighbor showed that the new node's eBGP session with the external network device, as well as its iBGP sessions with the original nodes, kept flapping and could never stabilize. Thoroughly puzzled, I finally found the only similar issue in the kube-router GitHub repo, where a project member explained it as follows:
GitHub issue link
(screenshot of the reply omitted)

My understanding:
kube-router only establishes iBGP peering with nodes in the same subnet, so nodes in different subnets cannot form peer relationships. Separately, eBGP has a max-hop (multihop) attribute that defaults to 1. Router hardware generally allows changing this value, but earlier kube-router versions only supported the default of 1 with no way to configure it, so an eBGP neighbor also had to sit in the same subnet as kube-router. Later versions fixed this; after upgrading, establishing eBGP neighbors across subnets from a node works without issue.
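For reference, newer kube-router releases expose this as a flag; to the best of my knowledge it is --peer-router-multihop-ttl (confirm against the docs of the version actually deployed). A sketch of the relevant DaemonSet args:

- --cluster-asn=64512
- --peer-router-ips=192.168.0.100
- --peer-router-asns=64513
- --peer-router-multihop-ttl=2   # allow the eBGP peer to be more than one hop away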


