NIC Ring Buffers -- Fixing Packet Loss Caused by rx_resource_errors

news/2024/11/9 9:36:24/

1. Hardware and software environment

        Hardware: Phytium E2000Q platform

        Software: Linux 4.19.246

2. Symptoms

While receiving packets at a high rate, the NIC reports RX errors. A closer look shows that they are rx_resource_errors:

root@E2000-Ubuntu:~# ifconfig eth1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.100.1.2  netmask 255.255.255.0  broadcast 10.100.1.255
        inet6 fe80::5ed2:bff:fe13:817d  prefixlen 64  scopeid 0x20<link>
        ether 5c:d2:0b:13:81:7d  txqueuelen 1000  (Ethernet)
        RX packets 28043321  bytes 41384388153 (41.3 GB)
        RX errors 17434  dropped 0  overruns 1305  frame 16129
        TX packets 26633002  bytes 39782515051 (39.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 93

root@E2000-Ubuntu:~# ethtool -S eth1 | grep error
tx_carrier_sense_errors: 0
rx_frame_check_sequence_errors: 0
rx_length_field_frame_errors: 0
rx_symbol_errors: 0
rx_alignment_errors: 0
rx_resource_errors: 16129
rx_ip_header_checksum_errors: 0
rx_tcp_checksum_errors: 0
rx_udp_checksum_errors: 0
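
When a driver exposes many counters, it can help to filter the statistics programmatically instead of reading them by eye. The following is a minimal sketch in Python that parses text in the `name: value` format printed by ethtool -S and keeps only the non-zero error counters. The function names and the sample constant are hypothetical helpers introduced for illustration; the sample text is the output shown above.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip headers such as "NIC statistics:" and anything non-numeric.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_errors(stats):
    """Return only the counters whose name contains "error" and whose value is > 0."""
    return {k: v for k, v in stats.items() if "error" in k and v > 0}


# Sample copied from the ethtool -S output above.
ETHTOOL_S_SAMPLE = """\
tx_carrier_sense_errors: 0
rx_frame_check_sequence_errors: 0
rx_length_field_frame_errors: 0
rx_symbol_errors: 0
rx_alignment_errors: 0
rx_resource_errors: 16129
rx_ip_header_checksum_errors: 0
rx_tcp_checksum_errors: 0
rx_udp_checksum_errors: 0
"""

if __name__ == "__main__":
    print(nonzero_errors(parse_ethtool_stats(ETHTOOL_S_SAMPLE)))
    # prints {'rx_resource_errors': 16129}
```

On a live system the text could come from running ethtool -S through `subprocess.run`, which is left out here so the sketch runs anywhere.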

The problem can be reproduced as follows:

Server (the device showing the problem):
ifconfig eth3 192.168.1.11 netmask 255.255.255.0
iperf3 -s  -B 192.168.1.11 -p 10002

Client (a known-good device):
ifconfig eth3 192.168.1.10 netmask 255.255.255.0
iperf3 -c 192.168.1.11 -p 10002 -t500 -u -b 0

3. Analysis

RX errors come in many kinds. ethtool lists the categories below; some have hardware causes, while others can be tuned in software.

rx_frame_check_sequence_errors
rx_length_field_frame_errors
rx_symbol_errors
rx_alignment_errors
rx_resource_errors
rx_ip_header_checksum_errors
rx_tcp_checksum_errors
rx_udp_checksum_errors

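While sending and receiving, the MAC counts bad frames in dedicated registers: frames with CRC errors, frames dropped because they could not be buffered in time, and so on. These counters can generally be read back from the MAC registers. Taking the E2000Q as an example: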

drivers/net/ethernet/phytium/macb.h

 /* GEM register offsets. */
.......
#define GEM_RXUNDRCNT           0x0184 /* Undersize Frames Received Counter */
#define GEM_RXOVRCNT            0x0188 /* Oversize Frames Received Counter */
#define GEM_RXJABCNT            0x018c /* Jabbers Received Counter */
#define GEM_RXFCSCNT            0x0190 /* Frame Check Sequence Error Counter */
#define GEM_RXLENGTHCNT         0x0194 /* Length Field Error Counter */
#define GEM_RXSYMBCNT           0x0198 /* Symbol Error Counter */
#define GEM_RXALIGNCNT          0x019c /* Alignment Error Counter */
#define GEM_RXRESERRCNT         0x01a0 /* Receive Resource Error Counter */
#define GEM_RXORCNT             0x01a4 /* Receive Overrun Counter */
#define GEM_RXIPCCNT            0x01a8 /* IP header Checksum Error Counter */
#define GEM_RXTCPCCNT           0x01ac /* TCP Checksum Error Counter */
#define GEM_RXUDPCCNT           0x01b0 /* UDP Checksum Error Counter */
.......

ethtool -S eth1 shows that the specific error type is Receive Resource Error. The documentation for this register reads:

GEM: Receive Resource Error Counter

the register counting the number of frames that were successfully received by the MAC (correct address matched frame and adequate slot time) but could not be copied to memory because no receive buffer was available. This occurs when the GEM reads a buffer descriptor with its ownership (or used) bit set.

refer: rx_resource_errors https://docs.xilinx.com/r/en-US/ug1087-zynq-ultrascale-registers/rx_resource_errors-GEM-Register

So the loss appears to be caused by a shortage of receive buffers while packets are arriving. How can the receive buffer be adjusted?

iperf3 has an option that adjusts a buffer:

-l, --length    #[KMG]    length of buffer to read or write (default 128 KB for TCP, dynamic or 1460 for UDP)

With this option added (iperf3 -c 192.168.1.11 -p 10002 -t500 -u -b 0 -l 65500) the problem is still there, so the application-level buffer does not appear to be the cause.

Receiving a packet involves several buffers, at the driver layer and at the application layer, so let's look at those first.

Receive ring buffers are shared between the device driver and network interface controller (NIC). The card assigns a transmit (TX) and receive (RX) ring buffer. As the name implies, the ring buffer is a circular buffer where an overflow overwrites existing data. There are two ways to move data from the NIC to the kernel, hardware interrupts and software interrupts, also called SoftIRQs.

The kernel uses the RX ring buffer to store incoming packets until they can be processed by the device driver. The device driver drains the RX ring, typically using SoftIRQs, which puts the incoming packets into a kernel data structure called an sk_buff or skb to begin its journey through the kernel and up to the application which owns the relevant socket.

The kernel uses the TX ring buffer to hold outgoing packets which are destined for the wire. These ring buffers reside at the bottom of the stack and are a crucial point at which packet drop can occur, which in turn will adversely affect network performance.

Increase the size of an Ethernet device’s ring buffers if the packet drop rate causes applications to report a loss of data, timeouts, or other issues.

refer: Chapter 32. Increasing the ring buffers to reduce a high packet drop rate Red Hat Enterprise Linux 9 | Red Hat Customer Portal

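In short, the article says there are two ring buffers, an RX ring and a TX ring. On the receive side, the NIC places each incoming frame into the RX ring buffer and raises a hardware interrupt; the interrupt handler then schedules a softirq, in which the NIC driver consumes the frames from the ring. Because the ring has a fixed size, if it is too small and frames arrive faster than the driver drains them, it runs out of free entries and packets are dropped.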

 That is probably where the problem lies.
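
To make the effect of the ring size concrete, here is a toy simulation in Python. It is only a sketch under made-up assumptions: time is divided into ticks, frames arrive in bursts, the driver drains a fixed number of frames per tick, and a frame that finds no free entry is dropped (which is what rx_resource_errors counts according to the register description above). The function name and all numbers are hypothetical; they are not measurements from the E2000Q.

```python
def simulate_rx_ring(ring_size, arrivals, drain_per_tick):
    """Return (delivered, dropped) frame counts for a fixed-size RX ring.

    arrivals is a sequence with the number of frames arriving in each tick.
    In every tick the NIC first stores as many frames as there are free
    entries (the rest are dropped), then the driver drains up to
    drain_per_tick frames.
    """
    used = 0
    delivered = 0
    dropped = 0
    for arriving in arrivals:
        accepted = min(arriving, ring_size - used)
        dropped += arriving - accepted   # no free entry: frame is lost
        used += accepted
        drained = min(drain_per_tick, used)
        used -= drained
        delivered += drained
    return delivered, dropped


if __name__ == "__main__":
    # A burst of 600 frames followed by three quiet ticks, repeated 10 times.
    # The average arrival rate (150 per tick) is below the drain rate (200).
    bursts = [600, 0, 0, 0] * 10
    for size in (512, 4096):
        print(size, simulate_rx_ring(size, bursts, drain_per_tick=200))
    # prints:
    # 512 (5120, 880)
    # 4096 (6000, 0)
```

In this model a larger ring only helps to absorb bursts. If the sustained arrival rate is higher than the rate at which the driver drains the ring, frames are eventually dropped at any ring size.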

4. Solution

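So how is the size of the RX/TX ring buffers set? The NIC's rings are configured with ethtool. setsockopt(PACKET_RX_RING/PACKET_TX_RING) also sets up ring buffers, but those are the memory-mapped rings of an AF_PACKET socket rather than the NIC's own rings, so ethtool is the tool to use here. First, check the maximum ring size the NIC supports: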

root@E2000-Ubuntu:~# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:		8192
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		512
RX Mini:	0
RX Jumbo:	0
TX:		512

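The RX/TX values under Pre-set maximums are the largest ring sizes the NIC supports.
The RX/TX values under Current hardware settings are the ring sizes currently in use.
These values are numbers of ring entries, not bytes, and any new RX/TX setting must stay within the Pre-set maximums: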

ethtool -G eth1 rx 4096 tx 512

After applying this setting and re-running the test, the problem is gone.
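
The rule that the new values must stay within the pre-set maximums can be checked before ethtool -G is called. Below is a minimal Python sketch that parses ethtool -g style output and builds the ethtool -G argument list, rejecting out-of-range values. The function names and the sample constant are hypothetical helpers; the sample text mirrors the ethtool -g output shown above, and the command is only built, not executed.

```python
def parse_ring_params(text):
    """Parse `ethtool -g` style output into {"max": {...}, "current": {...}}."""
    params = {"max": {}, "current": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                params[section][key.strip()] = int(value)
    return params


def ring_resize_command(dev, params, rx=None, tx=None):
    """Build an `ethtool -G` argv list; raise ValueError above the maximum."""
    cmd = ["ethtool", "-G", dev]
    for name, key, wanted in (("rx", "RX", rx), ("tx", "TX", tx)):
        if wanted is None:
            continue
        limit = params["max"][key]
        if not 0 < wanted <= limit:
            raise ValueError(f"{name}={wanted} is outside 1..{limit}")
        cmd += [name, str(wanted)]
    return cmd


# Sample mirroring the ethtool -g output above.
ETHTOOL_G_SAMPLE = (
    "Ring parameters for eth1:\n"
    "Pre-set maximums:\n"
    "RX:\t\t8192\nRX Mini:\t0\nRX Jumbo:\t0\nTX:\t\t4096\n"
    "Current hardware settings:\n"
    "RX:\t\t512\nRX Mini:\t0\nRX Jumbo:\t0\nTX:\t\t512\n"
)

if __name__ == "__main__":
    ring = parse_ring_params(ETHTOOL_G_SAMPLE)
    print(" ".join(ring_resize_command("eth1", ring, rx=4096, tx=512)))
    # prints: ethtool -G eth1 rx 4096 tx 512
```

The returned list could be handed to `subprocess.run` on the target machine; that step needs root privileges and is left out of the sketch.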

 5. Further notes

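Note: rmem_max and wmem_max, which we tuned earlier, also control buffer sizes, but those are socket buffers and have nothing to do with the ring buffer. net.core.rmem_max and net.core.wmem_max cap the receive and send buffer sizes that a socket of any protocol can request through setsockopt; the TCP-specific settings are tcp_rmem and tcp_wmem. They are usually inspected or adjusted as follows: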


root@E2000-Ubuntu:~# sysctl -a | grep rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096	131072	6291456
net.ipv4.udp_rmem_min = 4096
root@E2000-Ubuntu:~# sysctl -a | grep wmem
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096	16384	4194304
net.ipv4.udp_wmem_min = 4096
root@E2000-Ubuntu:~# cat /proc/sys/net/core/wmem_max
212992
root@E2000-Ubuntu:~# cat /proc/sys/net/core/rmem_max
212992
root@E2000-Ubuntu:~# cat /proc/sys/net/core/rmem_default
212992

setsockopt( sock, SOL_SOCKET, SO_RCVBUF,.....)
setsockopt( sock, SOL_SOCKET, SO_SNDBUF,.....)

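The settings above concern the socket receive and send buffers. On the receive path they come into play after the NIC driver has taken the data out of the RX ring buffer; on the transmit path they sit before the TX ring buffer.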

So how exactly are SO_RCVBUF and tcp_rmem related?


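When a TCP connection is established, its receive buffer (SO_RCVBUF) is initialized from tcp_rmem. During the handshake and the traffic that follows, the kernel adjusts the buffer size dynamically; this auto-tuning is bounded only by tcp_rmem, not by rmem_max. If the receive buffer size is set manually with setsockopt, however, auto-tuning is disabled for that socket, and the size that setsockopt can obtain is capped by rmem_max.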
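
The effect of rmem_max on a manual setting can be observed from user space. The following Python sketch requests a receive buffer with setsockopt and reads back what the kernel actually applied. The helper name `request_rcvbuf` is hypothetical. On Linux the value read back is normally twice the requested size (the kernel reserves room for bookkeeping) and the request is capped by net.core.rmem_max, but the exact numbers depend on the system, so the sketch only prints them.

```python
import socket


def request_rcvbuf(sock, size):
    """Request a receive buffer of `size` bytes; return the size the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # 8 MiB is far above the rmem_max of 212992 shown above, so on such
        # a system the kernel is expected to apply less than was requested.
        after = request_rcvbuf(s, 8 * 1024 * 1024)
        print("default:", before, "after requesting 8 MiB:", after)
```

Comparing the printed value with /proc/sys/net/core/rmem_max shows whether the request was capped.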


Reference: https://www.jianshu.com/p/c93727fa8c2e
 


http://www.ppmy.cn/news/296114.html
