1 Preface
I have spent the past couple of days working on DPVS and ran into all kinds of pitfalls, from installation through to starting it up. Here is a summary for reference.
First of all, a DPVS environment needs a NIC that supports DPDK. My server uses an Intel NIC; according to the official list (http://core.dpdk.org/supported/nics/intel/), the Intel NICs with DPDK support are:
- e1000 (82540, 82545, 82546)
- e1000e (82571, 82572, 82573, 82574, 82583, ICH8, ICH9, ICH10, PCH, PCH2, I217, I218, I219)
- igb (82573, 82576, 82580, I210, I211, I350, I354, DH89xx)
- igc (I225)
- ixgbe (82598, 82599, X520, X540, X550)
- i40e (X710, XL710, X722, XXV710)
- ice (E810)
- fm10k (FM10420)
- ipn3ke (PAC N3000)
- ifc (IFC)
The NIC in my server is an X520, which supports DPDK.
Official documentation: https://github.com/iqiyi/dpvs
The official documentation lists a reference environment. The environment I used is:
- Linux distribution: CentOS Linux release 7.6.1810 (Core)
- kernel: 3.10.0-1160.15.2.el7.x86_64
- memory: 128G
Check the NIC model with lspci:
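For example, on an X520 machine the relevant line looks roughly like this (the PCI address and description below are illustrative; X520 cards are reported as 82599-based devices):
$ lspci | grep -i ethernet
06:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)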
2 Installation Steps
Machine configuration:
1. Configure the machine BIOS
- Disable logical cores (turn off hyper-threading)
- Set the CPU mode to performance
- Set the memory frequency to maximum performance rather than auto
2. Configure CPU isolation parameters on the kernel command line, so that the Linux scheduler does not place other workloads on the cores DPVS will use
[root@3.sys.dpvs.** bin]$ more /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_CMDLINE_LINUX="scsi_mod.scan=sync crashkernel=auto rhgb quiet default_hugepagesz=1G hugepagesz=1G hugepages=80 nowatchdog nmi_watchdog=0 isolcpus=2,4,6,8,10,12,14,16,18 nohz_full=2,4,6,8,10,12,14,16,18 rcu_nocbs=2,4,6,8,10,12,14,16,18 rcu_nocb_poll"
GRUB_TERMINAL_OUTPUT="console"
GRUB_DISABLE_RECOVERY="true"
To make these parameters persist across reboots, regenerate the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
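After the reboot in step 4 below, it is worth confirming that the parameters actually made it onto the kernel command line, for example:
$ cat /proc/cmdline | tr ' ' '\n' | grep -E 'isolcpus|hugepage|nohz_full|rcu_nocbs'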
For background on CPU isolation, see:
nowatchdog nmi_watchdog=0 — disables the watchdog
https://www.cnblogs.com/alog9/p/11551441.html — CPU isolation
https://blog.csdn.net/weixin_42361608/article/details/116811268?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-17.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-17.control — reducing CPU power consumption
https://blog.csdn.net/haitaoliang/article/details/22427045 — introduction to isolcpus and its usage
Use lscpu to check whether hyper-threading is disabled (logical cores off).
Normally, with hyper-threading disabled, Thread(s) per core should be 1.
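For example (the exact lscpu layout varies slightly between versions):
$ lscpu | grep -i 'thread'
Thread(s) per core:    1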
3. Install dependency packages
# Note: the kernel-devel/headers/tools packages must match the running kernel version
$ rpm -qa | grep kernel | grep "3.10.0" | sort
yum group install "Development Tools"
yum install patch libnuma* numactl numactl-devel kernel-devel openssl* popt* gcc* libpcap* -y
//yum install numactl numactl-devel gcc openssl-devel popt-devel patch automake pciutils net-tools libpcap-devel automake libnl libnl-devel libnfnetlink-devel -y
yum install kernel-devel-$(uname -r)
yum install kernel-tools-$(uname -r)
yum install kernel-headers-$(uname -r)
4. Reboot the server so that the kernel parameter changes take effect
reboot
2.1 Installing DPVS
$ git clone https://github.com/iqiyi/dpvs.git
$ cd dpvs
There is a pitfall here: git clone https://github.com/iqiyi/dpvs.git did not work for me and had to be changed to git clone git://github.com/iqiyi/dpvs.git.
Inside the dpvs directory, download DPDK. (The latest supported version is dpdk-stable-18.11.2; installing with dpdk-17.11.2.tar.xz runs into a lot of silly problems, see https://github.com/iqiyi/dpvs/issues/701 for details.)
wget https://fast.dpdk.org/rel/dpdk-18.11.2.tar.xz
tar vxf dpdk-18.11.2.tar.xz
Next, apply the patches. (The official advice is essentially: if you don't know what a patch is for, just apply them all.)
$ cd /<>/dpvs
$ cp patch/dpdk-stable-18.11.2/*.patch dpdk-stable-18.11.2/
$ cd dpdk-stable-18.11.2/
# Patch 0001 enables hardware multicast on the kni device, e.g. for running ospfd on a kni interface
$ patch -p1 < 0001-kni-use-netlink-event-for-multicast-driver-part.patch
# patching file lib/librte_eal/linuxapp/kni/kni_net.c
# Patch 0002 is needed when using DPVS's UOA module
$ patch -p1 < 0002-net-support-variable-IP-header-len-for-checksum-API.patch
# patching file lib/librte_net/rte_ip.h
Build DPDK:
$ cd /<>/dpvs/dpdk-stable-18.11.2
$ make config T=x86_64-native-linuxapp-gcc
$ make
$ export RTE_SDK=$PWD
$ export RTE_TARGET=build
RTE_TARGET must be set: the subsequent DPVS build looks up the DPDK config files based on its value, and fails with an error otherwise.
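A quick way to check that the two variables point at the right place (the config produced by "make config" lives in the in-tree build directory):
$ echo $RTE_SDK $RTE_TARGET
$ ls $RTE_SDK/$RTE_TARGET/.config    # should exist, otherwise the DPVS build cannot find the DPDK config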
Configure hugepages
Unlike ordinary programs, the DPDK used by DPVS does not request memory from the operating system as it goes; it works directly on hugepages, which greatly improves the efficiency of memory allocation.
$ # for NUMA machine
$ echo 8192 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 8192 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
# Since hugepages were already reserved via the GRUB kernel parameters above, the two echo steps can be skipped
$ mkdir /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge
The mount can be written into /etc/fstab so that it is mounted automatically at boot.
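For example, a line like the following (a minimal sketch; adjust the options to your needs):
nodev /mnt/huge hugetlbfs defaults 0 0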
Check the hugepage allocation:
cat /proc/meminfo |grep Hug
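With the GRUB parameters above (80 pages of 1GB each), the output should contain lines roughly like these (the exact set of lines depends on the kernel):
HugePages_Total:      80
HugePages_Free:       80
Hugepagesize:    1048576 kB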
Load the driver modules
$ modprobe uio
$ cd /<>/dpvs/dpdk-stable-18.11.2
$ insmod /<>/dpvs/dpdk-stable-18.11.2/build/kmod/igb_uio.ko
$ insmod /<>/dpvs/dpdk-stable-18.11.2/build/kmod/rte_kni.ko carrier=on
Bind the NIC to DPDK
$ ./usertools/dpdk-devbind.py --status
$ ifconfig eth0 down # assuming eth0 (0000:06:00.0) is the NIC to be handed to DPDK
$ ./usertools/dpdk-devbind.py -b igb_uio 0000:06:00.0
Check that the NIC is now bound to DPDK:
$ ./usertools/dpdk-devbind.py --status
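If the bind succeeded, the port should now be listed under the DPDK-compatible driver section, roughly like this (the device string will differ per NIC):
Network devices using DPDK-compatible driver
============================================
0000:06:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio unused=ixgbe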
Before building the DPVS modules you may need to edit the source configuration file:
/app/dpvs/src/config.mk
[root@1.ucloud-lan1.dpvs.tt.bjs.p1staff.com src]$ more config.mk
#
# DPVS is a software load balancer (Virtual Server) based on DPDK.
#
# Copyright (C) 2017 iQIYI (www.iqiyi.com).
# All Rights Reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
##
# enable as needed.
#
# TODO: use standard way to define compile flags.
#CONFIG_MLX5=n
CFLAGS += -D DPVS_MAX_SOCKET=2
CFLAGS += -D DPVS_MAX_LCORE=64
#CFLAGS += -D CONFIG_DPVS_NEIGH_DEBUG
CFLAGS += -D CONFIG_RECORD_BIG_LOOP
#CFLAGS += -D CONFIG_DPVS_SAPOOL_DEBUG
#CFLAGS += -D CONFIG_DPVS_IPVS_DEBUG
#CFLAGS += -D CONFIG_SYNPROXY_DEBUG
#CFLAGS += -D CONFIG_TIMER_MEASURE
#CFLAGS += -D CONFIG_TIMER_DEBUG
#CFLAGS += -D DPVS_CFG_PARSER_DEBUG
#CFLAGS += -D NETIF_BONDING_DEBUG
#CFLAGS += -D CONFIG_TC_DEBUG
#CFLAGS += -D CONFIG_DPVS_IPVS_STATS_DEBUG
#CFLAGS += -D CONFIG_DPVS_IP_HEADER_DEBUG
#CFLAGS += -D CONFIG_DPVS_MBUF_DEBUG
#CFLAGS += -D CONFIG_DPVS_PDUMP
#CFLAGS += -D CONFIG_DPVS_IPSET_DEBUG
#CFLAGS += -D CONFIG_NDISC_DEBUG
#CFLAGS += -D CONFIG_MSG_DEBUG
#CFLAGS += -D CONFIG_DPVS_MP_DEBUG
GCC_MAJOR = $(shell echo __GNUC__ | $(CC) -E -x c - | tail -n 1)
GCC_MINOR = $(shell echo __GNUC_MINOR__ | $(CC) -E -x c - | tail -n 1)
GCC_VERSION = $(GCC_MAJOR)$(GCC_MINOR)
Build and install DPVS
$ cd dpdk-stable-18.11.2/
$ export RTE_SDK=$PWD
$ cd <path-of-dpvs>
$ make # or "make -j40" to speed up.
$ make install
After installation, check the generated binaries:
$ ls bin/
dpip dpvs ipvsadm keepalived
Start DPVS (you can start it right away as below, or wrap it in a service later; see the unit sketch after the commands):
$ cp conf/dpvs.conf.single-nic.sample /etc/dpvs.conf
$ ./dpvs &
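If you prefer to run it as a service, a minimal systemd unit sketch could look like the following (the binary path matches the /app/dpvs/bin layout used in the startup script at the end of this article and is otherwise an assumption; append the --lcores arguments described below for production):
# /etc/systemd/system/dpvs.service -- illustrative sketch, not shipped with dpvs
[Unit]
Description=DPVS load balancer
After=network.target

[Service]
Type=simple
ExecStart=/app/dpvs/bin/dpvs
Restart=on-failure

[Install]
WantedBy=multi-user.target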
In a real production environment, dpvs is usually pinned to dedicated cores. (Define the CPUs to isolate via isolcpus in /etc/default/grub in advance; after boot, other applications are scheduled onto the remaining cores by default. isolcpus is a long-standing kernel feature; see https://blog.csdn.net/haitaoliang/article/details/22427045 for background.)
./dpvs -- --lcores 0@2,1@4,2@6,3@8,4@10,5@12,6@14,7@16,8@18 &
For the exact syntax and more details, refer to the DPDK documentation on EAL pthread and lcore affinity: https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#eal-pthread-and-lcore-affinity. According to that documentation:
An lcore is equivalent to a thread. In the simple single-thread-per-lcore case, each thread is bound 1:1 to a CPU and the lcore_id equals the CPU id. With multiple threads that binding is no longer 1:1, so the --lcores option lets you map each of the program's lcores onto a specific core yourself, following the rules described in that documentation.
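For the command above this means lcore 0 runs on CPU 2, lcore 1 on CPU 4, and so on up to lcore 8 on CPU 18, i.e. the nine workers defined in dpvs.conf land exactly on the cores isolated via isolcpus. A rough way to check where the threads actually ended up after startup (thread names vary with the DPDK version):
$ ps -eLo tid,psr,comm | grep -E 'dpvs|lcore'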
Starting it directly may fail, because the number of NICs bound to DPDK differs from the number configured in dpvs.conf; just fix dpvs.conf:
EAL: Error - exiting with code: 1
Cause: ports in DPDK RTE (2) != ports in dpvs.conf(1)
Edit dpvs.conf so that the number of ports it defines matches the number of NICs bound to DPDK:
[root@3.sys.dpvs.tt.bjs.p1staff.com ~]$ more /etc/dpvs.conf
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This is dpvs default configuration file.
!
! The attribute "<init>" denotes the configuration item at initialization stage. Item of
! this type is configured oneshoot and not reloadable. If invalid value configured in the
! file, dpvs would use its default value.
!
! Note that dpvs configuration file supports the following comment type:
! * line comment: using '#' or '!'
! * inline range comment: using '<' and '>', put comment in between
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

! global config
global_defs {
    log_level   WARNING
    ! log_file    /var/log/dpvs.log
    ! log_async_mode    on
}

! netif config
netif_defs {
    <init> pktpool_size     1048575
    <init> pktpool_cache    256

    <init> device dpdk0 {
        rx {
            queue_number        8
            descriptor_number   2048
            rss                 all
        }
        tx {
            queue_number        8
            descriptor_number   2048
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        ! promisc_mode
        ! kni_name                dpdk0.kni
    }

    <init> device dpdk1 {
        rx {
            queue_number        8
            descriptor_number   2048
            rss                 all
        }
        tx {
            queue_number        8
            descriptor_number   2048
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        ! promisc_mode
        ! kni_name                dpdk0.kni
    }

    <init> bonding bond0 {
        mode        4
        slave       dpdk0
        slave       dpdk1
        primary     dpdk0
        kni_name    bond0.kni
    }
}

! worker config (lcores)
worker_defs {
    <init> worker cpu0 {
        type    master
        cpu_id  0
    }

    <init> worker cpu1 {
        type    slave
        cpu_id  1
        port    bond0 {
            rx_queue_ids     0
            tx_queue_ids     0
            ! isol_rx_cpu_ids  9
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu2 {
        type    slave
        cpu_id  2
        port    bond0 {
            rx_queue_ids     1
            tx_queue_ids     1
            ! isol_rx_cpu_ids  10
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu3 {
        type    slave
        cpu_id  3
        port    bond0 {
            rx_queue_ids     2
            tx_queue_ids     2
            ! isol_rx_cpu_ids  11
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu4 {
        type    slave
        cpu_id  4
        port    bond0 {
            rx_queue_ids     3
            tx_queue_ids     3
            ! isol_rx_cpu_ids  12
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu5 {
        type    slave
        cpu_id  5
        port    bond0 {
            rx_queue_ids     4
            tx_queue_ids     4
            ! isol_rx_cpu_ids  13
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu6 {
        type    slave
        cpu_id  6
        port    bond0 {
            rx_queue_ids     5
            tx_queue_ids     5
            ! isol_rx_cpu_ids  14
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu7 {
        type    slave
        cpu_id  7
        port    bond0 {
            rx_queue_ids     6
            tx_queue_ids     6
            ! isol_rx_cpu_ids  15
            ! isol_rxq_ring_sz 1048576
        }
    }

    <init> worker cpu8 {
        type    slave
        cpu_id  8
        port    bond0 {
            rx_queue_ids     7
            tx_queue_ids     7
            ! isol_rx_cpu_ids  16
            ! isol_rxq_ring_sz 1048576
        }
    }

    !<init> worker cpu9 {
    !    type    kni
    !    cpu_id  9
    !    port    dpdk0 {
    !        tx_queue_ids     8
    !    }
    !}
}

! timer config
timer_defs {
    # cpu job loops to schedule dpdk timer management
    schedule_interval    500
}

! dpvs neighbor config
neigh_defs {
    <init> unres_queue_length  128
    timeout                    60
}

! dpvs ipv4 config
ipv4_defs {
    forwarding                 off
    <init> default_ttl         64
    fragment {
        <init> bucket_number   4096
        <init> bucket_entries  16
        <init> max_entries     4096
        <init> ttl             1
    }
}

! dpvs ipv6 config
ipv6_defs {
    disable                    off
    forwarding                 off
    route6 {
        <init> method          hlist
        recycle_time           10
    }
}

! control plane config
ctrl_defs {
    lcore_msg {
        <init> ring_size       4096
        sync_msg_timeout_us    20000
        priority_level         low
    }
    ipc_msg {
        <init> unix_domain /var/run/dpvs_ctrl
    }
}

! ipvs config
ipvs_defs {
    conn {
        <init> conn_pool_size       16777216
        <init> conn_pool_cache      256
        conn_init_timeout           3
        ! expire_quiescent_template
        ! fast_xmit_close
        ! <init> redirect           off
    }

    udp {
        defence_udp_drop
        uoa_mode         opp
        uoa_max_trail    3
        timeout {
            normal       300
            last         3
        }
    }

    tcp {
        defence_tcp_drop
        timeout {
            none         2
            established  3600
            syn_sent     3
            syn_recv     30
            fin_wait     7
            time_wait    7
            close        3
            close_wait   7
            last_ack     7
            listen       120
            synack       30
            last         2
        }
        synproxy {
            synack_options {
                mss      1452
                ttl      63
                sack
                ! wscale
                ! timestamp
            }
            ! defer_rs_syn
            rs_syn_max_retry    3
            ack_storm_thresh    10
            max_ack_saved       3
            conn_reuse_state {
                close
                time_wait
                ! fin_wait
                ! close_wait
                ! last_ack
            }
        }
    }
}

! sa_pool config
sa_pool {
    pool_hash_size    16
}
After installation, dpvs kept failing to start because the pool-size parameters in dpvs.conf required more memory than was available; this was later solved by tuning the configuration file. DPVS can then be started with a script like the following:
#!/bin/sh
#LIP:10.189.17.5-62
DPVS_HOME=/app/dpvs/bin
LANIP=10.189.17.4
LAN_GW=10.189.17.1
LAN_NET=10.0.0.0/8

# startup dpvs
$DPVS_HOME/dpvs -- --lcores 0@2,1@4,2@6,3@8,4@10,5@12,6@14,7@16,8@18 &
sleep 30

# if_LAN = bond0
# if_WAN = bond0
############ DPDK ############
# add LAN route for dpvs
$DPVS_HOME/dpip route add ${LANIP}/32 dev bond0 scope kni_host
$DPVS_HOME/dpip route add default via ${LAN_GW} dev bond0
sleep 1

############ KNI ############
ip link set bond0.kni up
ip addr add ${LANIP}/26 dev bond0.kni
sleep 1
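Once the script has run, the DPDK-side link, address and route state can be checked with dpip, and the kni side with ip, for example:
$DPVS_HOME/dpip link show
$DPVS_HOME/dpip addr show
$DPVS_HOME/dpip route show
ip addr show dev bond0.kni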