Introduction
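This article covers the e1000e NIC driver, which is used for Intel PCI Express gigabit Ethernet controllers. Following the driver's design flow, it analyzes the driver's complete packet receive and transmit paths.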
First, four pieces of background knowledge needed for e1000e:
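1) PCIe configuration space initialization: PCI/PCIe cards all follow a common standard. On x86, software reaches a device's configuration space by writing a bus/device/function/register address to I/O port 0xCF8 (CONFIG_ADDRESS) and then reading or writing the data through I/O port 0xCFC (CONFIG_DATA); PCIe additionally exposes configuration space through a memory-mapped window (ECAM). The OS uses this mechanism during enumeration to detect whether a slot holds a card (a Vendor ID of 0xFFFF means no device), to check that the card is valid, and to write the corresponding configuration registers, for example assigning BAR addresses (this is the initialization).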
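A minimal user-space sketch of configuration mechanism #1, assuming the standard CONFIG_ADDRESS layout (enable bit 31, bus in bits 23:16, device in 15:11, function in 10:8, dword-aligned register in 7:2). It only computes the value that would be written to port 0xCF8; the actual outl()/inl() port accesses are left out because they need kernel or ioperm() privileges. pci_conf1_address() and pci_slot_present() are hypothetical helper names, and this mechanism only reaches the first 256 bytes of configuration space (the PCIe extended space needs ECAM).

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit value written to I/O port 0xCF8 (CONFIG_ADDRESS) in
 * PCI configuration mechanism #1. The selected dword is then read or
 * written through I/O port 0xCFC (CONFIG_DATA). */
static uint32_t pci_conf1_address(uint8_t bus, uint8_t dev, uint8_t func,
                                  uint8_t offset)
{
    assert(dev < 32 && func < 8);      /* 5-bit device, 3-bit function */
    return (uint32_t)1u << 31          /* enable bit */
         | (uint32_t)bus << 16
         | (uint32_t)dev << 11
         | (uint32_t)func << 8
         | (offset & 0xFCu);           /* dword-aligned register offset */
}

/* A Vendor ID of 0xFFFF read back from offset 0 means "no device here". */
static int pci_slot_present(uint16_t vendor_id)
{
    return vendor_id != 0xFFFF;
}
```

For example, bus 1, device 2, function 3, register 0x10 (BAR0) encodes as 0x80011310.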
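2) MSI-X mechanism and initialization: MSI-X is a message-signaled interrupt. It needs no dedicated interrupt line; instead, the device signals an interrupt by performing a memory write of a given data value to a given address. This requires the PCIe device to implement the MSI-X capability and the host to use APICs. The MSI-X capability (ID 0x11) sits in the capability list of the device's configuration space and tells software how many entries the MSI-X table has and where it lives (a BAR index plus an offset into that BAR). The driver does not fill the table itself: it asks the OS for MSI-X vectors (e1000e does this in e1000e_set_interrupt_capability() via pci_enable_msix_range()), and the kernel allocates a CPU interrupt vector for each entry and writes the matching message address and data into the table. On x86 the message address lies in the 0xFEExxxxx range, which is not backed by RAM: the chipset turns writes to it into interrupt messages for the local APICs. The address selects the destination CPU's local APIC and the data carries the vector number, so when the device performs the write, that local APIC accepts it and the CPU runs the handler registered for the vector. Unlike legacy INTx interrupts, MSI/MSI-X delivery does not go through the IOAPIC.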
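A sketch of the two pieces of arithmetic involved: decoding the MSI-X capability fields, and composing the x86 message the kernel writes into a table entry. It assumes the PCI specification's MSI-X layout (Message Control at capability offset 2, Table Offset/BIR at offset 4, 16-byte table entries) and x86's compatibility-format MSI (physical destination mode, fixed delivery, edge triggered). struct msix_info and all function names are hypothetical; on Linux these steps happen inside pci_enable_msix_range() or pci_alloc_irq_vectors(), so a driver never writes the table itself.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CAP_ID_MSIX 0x11   /* MSI-X capability ID */

/* Decoded view of an MSI-X capability (hypothetical struct). */
struct msix_info {
    unsigned table_size;       /* number of table entries */
    unsigned table_bir;        /* which BAR holds the table */
    uint32_t table_offset;     /* byte offset of the table inside that BAR */
};

/* msg_ctrl: Message Control register; table_reg: Table Offset/BIR register. */
static struct msix_info msix_decode(uint16_t msg_ctrl, uint32_t table_reg)
{
    struct msix_info info;
    info.table_size   = (msg_ctrl & 0x07FFu) + 1;  /* field holds N - 1 */
    info.table_bir    = table_reg & 0x7u;          /* bits 2:0 */
    info.table_offset = table_reg & ~0x7u;         /* bits 31:3 */
    return info;
}

/* Entry n: message address low, address high, message data, vector control. */
static uint32_t msix_entry_offset(const struct msix_info *info, unsigned n)
{
    assert(n < info->table_size);
    return info->table_offset + 16u * n;
}

/* x86 message address: 0xFEE in bits 31:20, destination APIC ID in 19:12. */
static uint32_t x86_msi_address(uint8_t dest_apic_id)
{
    return 0xFEE00000u | ((uint32_t)dest_apic_id << 12);
}

/* x86 message data: vector in bits 7:0; delivery mode 000 (fixed) and
 * trigger mode 0 (edge) leave the other bits zero. */
static uint32_t x86_msi_data(uint8_t vector)
{
    return vector;
}
```

For instance, a Message Control value of 0x0004 means a 5-entry table, and a Table Offset/BIR value of 0x00002003 places that table at offset 0x2000 of BAR3.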
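3) NAPI: NAPI is the kernel's polling mechanism for network devices. When an interrupt arrives, the driver keeps further device interrupts disabled, adds its napi_struct to the current CPU's poll list and raises the NET_RX softirq; the softirq then calls the driver's poll callback, which processes at most a "budget" of packets. If the poll does less work than its budget, the driver calls napi_complete_done() and re-enables interrupts; otherwise it stays on the poll list and is polled again. This keeps the interrupt rate low under heavy traffic.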
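The budget rule can be modelled in user space. This is a toy model, not kernel code: struct toy_dev, toy_irq() and toy_poll() are hypothetical stand-ins for the device, the interrupt handler (the napi_schedule() step) and the poll callback (the napi_complete_done() step).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a NAPI-driven device (hypothetical, user space only). */
struct toy_dev {
    int  pending;       /* packets waiting in the RX ring */
    bool irq_enabled;   /* device interrupt mask state */
    bool scheduled;     /* "on the poll list" */
};

/* Hard IRQ: mask the device and schedule polling (napi_schedule analogue). */
static void toy_irq(struct toy_dev *d)
{
    if (!d->scheduled) {
        d->irq_enabled = false;
        d->scheduled = true;
    }
}

/* Poll callback: handle at most `budget` packets and return the work done.
 * Only when work_done < budget does polling end and the IRQ get re-enabled
 * (napi_complete_done analogue). */
static int toy_poll(struct toy_dev *d, int budget)
{
    int work_done = 0;

    assert(d->scheduled && budget > 0);
    while (d->pending > 0 && work_done < budget) {
        d->pending--;
        work_done++;
    }
    if (work_done < budget) {
        d->scheduled = false;
        d->irq_enabled = true;
    }
    return work_done;
}
```

With 150 packets pending and the budget of 64 that e1000e passes to netif_napi_add(), the poll runs three times (64, 64, 22) and only the last call re-enables the interrupt.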
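4) DMA: e1000e moves packets with descriptor-based DMA. Copying packet data from the NIC's FIFO into an skb buffer, or from an skb buffer into the FIFO, is done by the NIC's DMA engine without the CPU. The driver and the NIC share descriptor rings in memory: each descriptor holds a buffer's DMA address plus status bits, the NIC advances a head pointer as it consumes descriptors, and the driver advances a tail register to hand it new ones. When a transfer completes, the NIC raises an (MSI-X) interrupt.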
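To make the descriptor idea concrete, here is a user-space mirror of the 16-byte legacy transmit descriptor plus the head/tail ownership rule. The flattened field layout follows the kernel's struct e1000_tx_desc (which wraps the middle and last 4 bytes in unions); struct tx_desc and hw_owned() are hypothetical names, and a little-endian host is assumed because the hardware fields are little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* User-space mirror of the 16-byte legacy e1000 transmit descriptor. */
struct tx_desc {
    uint64_t buffer_addr;   /* DMA address of the packet data */
    uint16_t length;        /* bytes in this buffer */
    uint8_t  cso;           /* checksum offset */
    uint8_t  cmd;           /* command bits (EOP, RS, IFCS, ...) */
    uint8_t  status;        /* DD is written back by the NIC when done */
    uint8_t  css;           /* checksum start */
    uint16_t special;       /* VLAN tag */
};
_Static_assert(sizeof(struct tx_desc) == 16, "descriptor must be 16 bytes");

/* The NIC owns descriptors [head, tail); software owns the rest.
 * head is advanced by the hardware, tail by the driver. */
static unsigned hw_owned(unsigned head, unsigned tail, unsigned count)
{
    assert(head < count && tail < count);
    return (tail + count - head) % count;
}
```

With head = 250 and tail = 5 on a 256-entry ring, the NIC currently owns 11 descriptors (250 through 255 and 0 through 4).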
The source code analysis follows.
1. Registering the NIC driver:
static int __init e1000_init_module(void)
{
	return pci_register_driver(&e1000_driver);
}
module_init(e1000_init_module);
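e1000_init_module() does just one thing: it registers a PCI driver structure (struct pci_driver) with the PCI core. After that, the PCI driver framework matches the driver's id_table against the devices present, and for each successful match it calls the probe function.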
static struct pci_driver e1000_driver = {
	.name     = e1000e_driver_name,
	.id_table = e1000_pci_tbl,
	.probe    = e1000_probe,
	.remove   = e1000_remove,
	.driver   = {
		.pm = &e1000_pm_ops,
	},
};
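How the PCI core decides to call probe can be sketched with a simplified ID-table walk. struct toy_pci_id, toy_match_id() and the table contents are hypothetical: the real struct pci_device_id also carries subvendor, subdevice and class fields, and the real e1000_pci_tbl lists many more Intel (vendor 0x8086) device IDs with its own driver_data values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ANY_ID 0xFFFFFFFFu   /* wildcard, like the kernel's PCI_ANY_ID */

/* Simplified stand-in for struct pci_device_id (hypothetical). */
struct toy_pci_id {
    uint32_t vendor, device;
    unsigned long driver_data;   /* e1000e indexes e1000_info_tbl with it */
};

/* Illustrative table: Intel vendor ID 0x8086 with two example device IDs;
 * the driver_data values are made up. A zero entry terminates the table. */
static const struct toy_pci_id toy_tbl[] = {
    { 0x8086, 0x105E, 0 },
    { 0x8086, 0x10D3, 1 },
    { 0, 0, 0 }
};

/* Return the first entry matching vendor/device, or NULL if none does. */
static const struct toy_pci_id *
toy_match_id(const struct toy_pci_id *tbl, uint32_t vendor, uint32_t device)
{
    assert(tbl != NULL);
    for (; tbl->vendor != 0; tbl++) {
        if ((tbl->vendor == TOY_ANY_ID || tbl->vendor == vendor) &&
            (tbl->device == TOY_ANY_ID || tbl->device == device))
            return tbl;
    }
    return NULL;
}
```

A device 0x8086:0x10D3 matches the second entry here, and its driver_data is what ends up in ent->driver_data when probe is called.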
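The probe function mainly does the following:
1. Negotiates the DMA addressing width (tries a 64-bit DMA mask and falls back to 32-bit);
2. Claims the device's memory BARs exclusively, so no other driver can map them;
3. Allocates and initializes the network device, which describes the hardware, the device memory, and control and scheduling information;
4. Initializes the NIC's transmit queues, the MAC address list, the network namespace attachment, the maximum packet length, NAPI, the device name and the hardware frame length; maps BAR0; chooses the interrupt scheme; allocates the adapter's ring structures; reads the EEPROM; and so on.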
static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
	struct net_device *netdev;
	struct e1000_adapter *adapter;
	struct e1000_hw *hw;
	const struct e1000_info *ei = e1000_info_tbl[ent->driver_data];
	resource_size_t mmio_start, mmio_len;
	resource_size_t flash_start, flash_len;
	static int cards_found;
	int bars, err, pci_using_dac = 0;

	/* ... pci_enable_device_mem() and ASPM handling elided ... */

	/* negotiate the DMA addressing width: 64-bit first, then 32-bit */
	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (!err) {
		pci_using_dac = 1;
	} else {
		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			dev_err(&pdev->dev,
				"No usable DMA configuration, aborting\n");
			goto err_dma;
		}
	}

	/* claim all memory BARs exclusively */
	bars = pci_select_bars(pdev, IORESOURCE_MEM);
	err = pci_request_selected_regions_exclusive(pdev, bars,
						     e1000e_driver_name);
	if (err)
		goto err_pci_reg;

	pci_set_master(pdev);
	/* PCI config space info */
	err = pci_save_state(pdev);
	if (err)
		goto err_alloc_etherdev;

	/* allocate the net_device with the adapter as its private data */
	err = -ENOMEM;
	netdev = alloc_etherdev(sizeof(struct e1000_adapter));
	if (!netdev)
		goto err_alloc_etherdev;

	SET_NETDEV_DEV(netdev, &pdev->dev);
	netdev->irq = pdev->irq;
	pci_set_drvdata(pdev, netdev);
	adapter = netdev_priv(netdev);
	hw = &adapter->hw;
	adapter->netdev = netdev;
	adapter->pdev = pdev;
	adapter->ei = ei;
	adapter->flags = ei->flags;
	adapter->flags2 = ei->flags2;
	adapter->hw.adapter = adapter;
	adapter->hw.mac.type = ei->mac;
	/* ... */

	/* map BAR0: the device register space */
	mmio_start = pci_resource_start(pdev, 0);
	mmio_len = pci_resource_len(pdev, 0);
	err = -EIO;
	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
	if (!adapter->hw.hw_addr)
		goto err_ioremap;

	/* map BAR1 (flash) on parts that have it */
	if ((adapter->flags & FLAG_HAS_FLASH) &&
	    (pci_resource_flags(pdev, 1) & IORESOURCE_MEM) &&
	    (hw->mac.type < e1000_pch_spt)) {
		flash_start = pci_resource_start(pdev, 1);
		flash_len = pci_resource_len(pdev, 1);
		adapter->hw.flash_address = ioremap(flash_start, flash_len);
		if (!adapter->hw.flash_address)
			goto err_flashmap;
	}

	/* Set default EEE advertisement */
	if (adapter->flags2 & FLAG2_HAS_EEE)
		adapter->eee_advert = MDIO_EEE_100TX | MDIO_EEE_1000T;

	/* construct the net_device struct */
	netdev->netdev_ops = &e1000e_netdev_ops;
	e1000e_set_ethtool_ops(netdev);
	netdev->watchdog_timeo = 5 * HZ;
	netif_napi_add(netdev, &adapter->napi, e1000e_poll, 64);
	strlcpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

	netdev->mem_start = mmio_start;
	netdev->mem_end = mmio_start + mmio_len;

	adapter->bd_number = cards_found++;

	e1000e_check_options(adapter);

	/* setup adapter struct */
	err = e1000_sw_init(adapter);
	if (err)
		goto err_sw_init;

	if (hw->phy.ops.check_reset_block && hw->phy.ops.check_reset_block(hw))
		dev_info(&pdev->dev,
			 "PHY reset is blocked due to SOL/IDER session.\n");

	e1000_eeprom_checks(adapter);	/* EEPROM sanity checks */

	/* copy the MAC address */
	e1000e_read_mac_addr(&adapter->hw);
	memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);

	/* watchdog timer (pre-4.15 API; newer kernels use timer_setup()) */
	init_timer(&adapter->watchdog_timer);
	adapter->watchdog_timer.function = e1000_watchdog;
	adapter->watchdog_timer.data = (unsigned long)adapter;

	/* reset the hardware with the new settings */
	e1000e_reset(adapter);

	/* If the controller has AMT, do not set DRV_LOAD until the interface
	 * is up. For all other cases, let the f/w know that the h/w is now
	 * under the control of the driver.
	 */
	if (!(adapter->flags & FLAG_HAS_AMT))
		e1000e_get_hw_control(adapter);

	strlcpy(netdev->name, "eth%d", sizeof(netdev->name));
	err = register_netdev(netdev);	/* register the network interface */
	if (err)
		goto err_register;

	/* carrier off reporting is important to ethtool even BEFORE open */
	netif_carrier_off(netdev);

	e1000_print_device_info(adapter);

	return 0;

	/* ... error-unwind labels (err_register ... err_dma) elided ... */
}
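Why probe tries a 64-bit mask first can be illustrated with the mask arithmetic. DMA_BIT_MASK() below uses the same definition as the kernel macro; dma_reachable() is a hypothetical helper that captures the idea behind the kernel's dma_capable() check. With only a 32-bit mask, a buffer whose last byte lies above 4 GiB cannot be handed to the device directly and must be bounced or remapped through an IOMMU.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: the device can reach a buffer directly only if the
 * buffer's last byte falls inside its DMA mask. */
static int dma_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
    assert(len > 0);
    return addr + len - 1 <= mask;
}
```

A 4 KiB buffer ending exactly at 0xFFFFFFFF is reachable with a 32-bit mask, while one byte more crosses the 4 GiB line and is not.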
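Two things about this stage deserve attention:
- The basic network-device information is allocated, but the structures and memory used for the actual data transfer (descriptor rings and packet buffers) are not, and no interrupt is requested yet. In other words, probe only instantiates the shell of a NIC device; the data-path hardware resources are not yet taken.
- The struct net_device operations (netdev_ops) are initialized, so when the interface is later brought up or down, the kernel can call the driver's ops directly (ndo_open / ndo_stop). This is a good design: resources are allocated when the interface is used and are not held when it is not.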
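2. Bringing the NIC up
Bringing the interface up calls e1000_open() (the driver's ndo_open), which first allocates the transmit and receive resources.
e1000e_setup_tx_resources() allocates the buffer_info array for adapter->tx_ring and coherent DMA memory for its descriptors (256 by default), and initializes tx_ring. The descriptors are what the DMA controller actually reads: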
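int e1000e_setup_tx_resources(struct e1000_ring *tx_ring)
{
	struct e1000_adapter *adapter = tx_ring->adapter;
	int err = -ENOMEM, size;

	/* software bookkeeping: one e1000_buffer per descriptor */
	size = sizeof(struct e1000_buffer) * tx_ring->count;
	tx_ring->buffer_info = vzalloc(size);
	if (!tx_ring->buffer_info)
		goto err;

	/* round up to nearest 4K */
	tx_ring->size = tx_ring->count * sizeof(struct e1000_tx_desc);
	tx_ring->size = ALIGN(tx_ring->size, 4096);

	/* coherent DMA memory for the descriptors the NIC reads */
	err = e1000_alloc_ring_dma(adapter, tx_ring);
	if (err)
		goto err;

	tx_ring->next_to_use = 0;
	tx_ring->next_to_clean = 0;

	return 0;
err:
	vfree(tx_ring->buffer_info);
	return err;
}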
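The ring-size rounding is plain power-of-two arithmetic. align_up() computes the same thing as the kernel's ALIGN() for power-of-two alignments, and ring_bytes() is a hypothetical helper; 16 bytes is the size of the legacy transmit descriptor and 32 bytes the size of the packet-split receive descriptor used by e1000e_setup_rx_resources().

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two), as ALIGN() does. */
static size_t align_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Bytes of coherent DMA memory requested for a descriptor ring. */
static size_t ring_bytes(size_t count, size_t desc_size)
{
    return align_up(count * desc_size, 4096);
}
```

ring_bytes(256, 16) gives 4096 bytes for the default transmit ring and ring_bytes(256, 32) gives 8192 bytes for the receive ring, so both are already multiples of 4K; the rounding only matters for other ring sizes, e.g. 300 transmit descriptors (4800 bytes, rounded up to 8192).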
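e1000e_setup_rx_resources() does the same for the receive ring: it allocates the buffer_info array (256 entries by default), where buffer_info is only a software bookkeeping structure, gives each entry an array of packet-split page descriptors, and then allocates the descriptor ring itself as coherent DMA memory: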
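int e1000e_setup_rx_resources(struct e1000_ring *rx_ring)
{
	struct e1000_adapter *adapter = rx_ring->adapter;
	struct e1000_buffer *buffer_info;
	int i, size, desc_len, err = -ENOMEM;

	size = sizeof(struct e1000_buffer) * rx_ring->count;
	rx_ring->buffer_info = vzalloc(size);
	if (!rx_ring->buffer_info)
		goto err;

	for (i = 0; i < rx_ring->count; i++) {
		buffer_info = &rx_ring->buffer_info[i];
		buffer_info->ps_pages = kcalloc(PS_PAGE_BUFFERS,
						sizeof(struct e1000_ps_page),
						GFP_KERNEL);
		if (!buffer_info->ps_pages)
			goto err_pages;
	}

	/* size the ring for the larger (32-byte) packet-split descriptor */
	desc_len = sizeof(union e1000_rx_desc_packet_split);

	/* Round up to nearest 4K */
	rx_ring->size = rx_ring->count * desc_len;
	rx_ring->size = ALIGN(rx_ring->size, 4096);

	err = e1000_alloc_ring_dma(adapter, rx_ring);
	if (err)
		goto err_pages;

	rx_ring->next_to_clean = 0;
	rx_ring->next_to_use = 0;

	return 0;

err_pages:
	for (i = 0; i < rx_ring->count; i++)
		kfree(rx_ring->buffer_info[i].ps_pages);
err:
	vfree(rx_ring->buffer_info);
	return err;
}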
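e1000_configure() mainly does the following:
1. Sets the receive mode (address filtering) and restores the VLAN and manageability settings;
2. Programs the transmit ring's DMA base address into the hardware;
3. Selects the receive buffer allocation and cleanup callbacks, programs the receive ring's DMA base address, and then fills the receive ring with buffers.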
static void e1000_configure(struct e1000_adapter *adapter)
{
	struct e1000_ring *rx_ring = adapter->rx_ring;

	e1000e_set_rx_mode(adapter->netdev);
	e1000_restore_vlan(adapter);
	e1000_init_manageability_pt(adapter);

	/* program the TX ring's DMA base address, length and head/tail */
	e1000_configure_tx(adapter);

	if (adapter->netdev->features & NETIF_F_RXHASH)
		e1000e_setup_rss_hash(adapter);
	e1000_setup_rctl(adapter);
	/* choose the RX alloc/clean callbacks and program the RX ring's
	 * DMA base address, length and head/tail
	 */
	e1000_configure_rx(adapter);
	/* post a receive buffer to every free descriptor */
	adapter->alloc_rx_buf(rx_ring, e1000_desc_unused(rx_ring), GFP_KERNEL);
}
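e1000_configure() ends by asking for e1000_desc_unused(rx_ring) buffers. The function below reproduces the arithmetic of e1000e's e1000_desc_unused() on a hypothetical struct toy_ring that keeps only the three fields involved; it shows why a freshly initialized 256-entry ring is filled with 255 buffers rather than 256. One slot is always left empty so that a full ring never looks the same as an empty one (on the hardware side, head == tail means "no descriptors available").

```c
#include <assert.h>

/* Hypothetical subset of struct e1000_ring. */
struct toy_ring {
    unsigned count;           /* number of descriptors */
    unsigned next_to_use;     /* next slot software will fill */
    unsigned next_to_clean;   /* next slot software will reclaim */
};

/* Same arithmetic as e1000_desc_unused(). */
static unsigned desc_unused(const struct toy_ring *r)
{
    assert(r->next_to_use < r->count && r->next_to_clean < r->count);
    if (r->next_to_clean > r->next_to_use)
        return r->next_to_clean - r->next_to_use - 1;
    return r->count + r->next_to_clean - r->next_to_use - 1;
}
```

For a fresh ring (count 256, both indexes 0) this yields 255, and once next_to_use reaches 255 it yields 0.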
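Within it, the transmit setup function e1000_configure_tx() programs the transmit ring's base address, length and head/tail pointers: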
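static void e1000_configure_tx(struct e1000_adapter *adapter)
{
	struct e1000_hw *hw = &adapter->hw;
	struct e1000_ring *tx_ring = adapter->tx_ring;
	u64 tdba;
	u32 tdlen, tctl, tarc;

	/* Setup the HW Tx Head and Tail descriptor pointers */
	tdba = tx_ring->dma;
	tdlen = tx_ring->count * sizeof(struct e1000_tx_desc);
	ew32(TDBAL(0), (tdba & DMA_BIT_MASK(32)));
	ew32(TDBAH(0), (tdba >> 32));
	ew32(TDLEN(0), tdlen);
	ew32(TDH(0), 0);
	ew32(TDT(0), 0);
	tx_ring->head = adapter->hw.hw_addr + E1000_TDH(0);
	tx_ring->tail = adapter->hw.hw_addr + E1000_TDT(0);

	/* ... interrupt delay, TCTL and TARC setup elided ... */
}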
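The descriptor ring's base is a 64-bit DMA address, but the base-address registers are 32 bits wide, so the driver writes it as a low and a high half (TDBAL/TDBAH for transmit, RDBAL/RDBAH for receive). split_base() is a hypothetical helper showing that split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a 64-bit ring base DMA address into the values
 * for the low and high 32-bit base-address registers. */
static void split_base(uint64_t dma, uint32_t *lo, uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint32_t)(dma & 0xFFFFFFFFu);   /* TDBAL / RDBAL */
    *hi = (uint32_t)(dma >> 32);           /* TDBAH / RDBAH */
}
```

On a system where the ring sits below 4 GiB, the high half is simply 0.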
Finally, the receive setup function e1000_configure_rx() selects the receive callbacks and programs the receive ring:
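static void e1000_configure_rx(struct e1000_adapter *adapter)
{
	struct e1000_hw *hw = &adapter->hw;
	struct e1000_ring *rx_ring = adapter->rx_ring;
	u64 rdba;
	u32 rdlen, rctl, rxcsum, ctrl_ext;

	/* standard-MTU path; the packet-split and jumbo variants are elided */
	rdlen = rx_ring->count * sizeof(union e1000_rx_desc_extended);
	adapter->clean_rx = e1000_clean_rx_irq;
	adapter->alloc_rx_buf = e1000_alloc_rx_buffers;

	/* ... */

	/* Setup the HW Rx Head and Tail Descriptor Pointers and
	 * the Base and Length of the Rx Descriptor Ring
	 */
	rdba = rx_ring->dma;
	ew32(RDBAL(0), (rdba & DMA_BIT_MASK(32)));
	ew32(RDBAH(0), (rdba >> 32));
	ew32(RDLEN(0), rdlen);
	ew32(RDH(0), 0);
	ew32(RDT(0), 0);
	rx_ring->head = adapter->hw.hw_addr + E1000_RDH(0);
	rx_ring->tail = adapter->hw.hw_addr + E1000_RDT(0);

	/* ... checksum offload and receiver enable elided ... */
}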
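e1000_clean_rx_irq():
This is the receive-cleanup function. It unmaps the skbs of received packets, passes each packet to e1000_receive_skb() (which hands it to the kernel's upper-layer protocol processing), and has new skbs allocated and DMA-mapped into the freed descriptors: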
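static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
			       int work_to_do)
{
	/* ... local declarations elided ... */
	i = rx_ring->next_to_clean;
	rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
	buffer_info = &rx_ring->buffer_info[i];

	/* process descriptors the NIC has marked Descriptor Done (DD) */
	while (staterr & E1000_RXD_STAT_DD) {
		if (*work_done >= work_to_do)	/* NAPI budget used up */
			break;
		(*work_done)++;
		dma_rmb();	/* read descriptor and rx_buffer_info after status DD */

		skb = buffer_info->skb;
		buffer_info->skb = NULL;
		/* ... advance i and prefetch next_rxd / next_buffer ... */
		cleaned = true;
		cleaned_count++;
		dma_unmap_single(&pdev->dev, buffer_info->dma,
				 adapter->rx_buffer_len, DMA_FROM_DEVICE);
		buffer_info->dma = 0;

		length = le16_to_cpu(rx_desc->wb.upper.length);
		/* ... error and multi-descriptor checks elided ... */

		/* copybreak: copy small packets into a small new skb and keep
		 * the large buffer in buffer_info for reuse
		 */
		if (length < copybreak) {
			struct sk_buff *new_skb =
				napi_alloc_skb(&adapter->napi, length);
			if (new_skb) {
				/* ... copy the packet data into new_skb ... */
				buffer_info->skb = skb;
				skb = new_skb;
			}
		}
		skb_put(skb, length);
		/* ... checksum and RSS hash ... */
		e1000_receive_skb(adapter, netdev, skb, staterr,
				  rx_desc->wb.upper.vlan);

		/* ... clear the status, refill in batches, move to next_rxd
		 * and re-read staterr ...
		 */
	}
	rx_ring->next_to_clean = i;

	/* give the remaining free descriptors new buffers */
	cleaned_count = e1000_desc_unused(rx_ring);
	if (cleaned_count)
		adapter->alloc_rx_buf(rx_ring, cleaned_count, GFP_ATOMIC);

	return cleaned;
}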
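The control flow of the receive cleanup can be modelled in user space: walk from next_to_clean while the NIC has set the Descriptor Done (DD) bit, stop at the NAPI budget, and apply copybreak to short packets. struct toy_rx_desc and toy_clean_rx() are hypothetical, the descriptor is reduced to a status byte and a length, and the copybreak threshold of 256 used in the usage note is only an example value.

```c
#include <assert.h>
#include <stdint.h>

#define STAT_DD 0x01   /* Descriptor Done: set by the NIC on write-back */

/* Hypothetical, heavily reduced view of a written-back RX descriptor. */
struct toy_rx_desc {
    uint8_t  status;
    uint16_t length;
};

/* Walk the ring from *next_to_clean while DD is set and the budget allows,
 * as e1000_clean_rx_irq() does. Packets shorter than `copybreak` are counted
 * in *copied (the driver would copy them into a small new skb and keep the
 * large buffer for reuse). Returns the work done. */
static int toy_clean_rx(struct toy_rx_desc *ring, unsigned count,
                        unsigned *next_to_clean, int budget,
                        unsigned copybreak, int *copied)
{
    int work_done = 0;
    unsigned i = *next_to_clean;

    assert(count > 0 && i < count);
    while ((ring[i].status & STAT_DD) && work_done < budget) {
        if (ring[i].length < copybreak)
            (*copied)++;
        ring[i].status = 0;   /* clear status so the slot can be refilled */
        work_done++;
        i = (i + 1) % count;
    }
    *next_to_clean = i;
    return work_done;
}
```

Called with a budget of 2 on a ring whose first three descriptors are done, it returns 2 and leaves next_to_clean at 2; the next call picks up the third descriptor and stops at the first one without DD.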
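e1000_alloc_rx_buffers() gives each free descriptor a receive buffer: it allocates an skb (or reuses one left behind by copybreak), DMA-maps its data area, writes that DMA address into the descriptor, and periodically advances the tail register so the NIC may start receiving into the new descriptors. On the initial fill starting at index 0 it is asked for e1000_desc_unused() = count - 1 = 255 buffers, not 256, because one descriptor is always left empty: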
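static void e1000_alloc_rx_buffers(struct e1000_ring *rx_ring,
				   int cleaned_count, gfp_t gfp)
{
	struct e1000_adapter *adapter = rx_ring->adapter;
	struct net_device *netdev = adapter->netdev;
	struct pci_dev *pdev = adapter->pdev;
	union e1000_rx_desc_extended *rx_desc;
	struct e1000_buffer *buffer_info;
	struct sk_buff *skb;
	unsigned int i;
	unsigned int bufsz = adapter->rx_buffer_len;

	i = rx_ring->next_to_use;
	buffer_info = &rx_ring->buffer_info[i];

	while (cleaned_count--) {
		skb = buffer_info->skb;
		if (skb) {	/* buffer kept by copybreak: reuse it */
			skb_trim(skb, 0);
			goto map_skb;
		}

		skb = __netdev_alloc_skb_ip_align(netdev, bufsz, gfp);
		if (!skb) {
			/* Better luck next round */
			adapter->alloc_rx_buff_failed++;
			break;
		}

		buffer_info->skb = skb;
map_skb:
		buffer_info->dma = dma_map_single(&pdev->dev, skb->data,
						  adapter->rx_buffer_len,
						  DMA_FROM_DEVICE);
		if (dma_mapping_error(&pdev->dev, buffer_info->dma)) {
			adapter->rx_dma_failed++;
			break;
		}

		/* hand the buffer's DMA address to the NIC via the descriptor */
		rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
		rx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);

		/* every E1000_RX_BUFFER_WRITE descriptors, publish them to the
		 * NIC by moving the tail (errata workaround path elided)
		 */
		if (unlikely(!(i & (E1000_RX_BUFFER_WRITE - 1)))) {
			wmb();
			writel(i, rx_ring->tail);
		}
		i++;
		if (i == rx_ring->count)
			i = 0;
		buffer_info = &rx_ring->buffer_info[i];
	}

	rx_ring->next_to_use = i;
}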
3. Requesting MSI-X interrupts, with MSI and legacy interrupts as fallbacks
static int e1000_request_irq(struct e1000_adapter *adapter)
{
	struct net_device *netdev = adapter->netdev;
	int err;

	if (adapter->msix_entries) {
		err = e1000_request_msix(adapter);
		if (!err)
			return err;
		/* fall back to MSI */
		e1000e_reset_interrupt_capability(adapter);
		adapter->int_mode = E1000E_INT_MODE_MSI;
		e1000e_set_interrupt_capability(adapter);
	}
	if (adapter->flags & FLAG_MSI_ENABLED) {
		err = request_irq(adapter->pdev->irq, e1000_intr_msi, 0,
				  netdev->name, netdev);
		if (!err)
			return err;

		/* fall back to legacy interrupt */
		e1000e_reset_interrupt_capability(adapter);
		adapter->int_mode = E1000E_INT_MODE_LEGACY;
	}

	err = request_irq(adapter->pdev->irq, e1000_intr, IRQF_SHARED,
			  netdev->name, netdev);

	return err;
}
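The fallback order in e1000_request_irq() boils down to a three-way choice. This is a hypothetical model (enum int_mode, pick_int_mode() and vectors_used() are not driver names): each flag stands for "setting up and requesting this kind of interrupt succeeds", and the vector counts reflect the three handlers e1000_request_msix() registers versus the single handler used otherwise.

```c
#include <assert.h>
#include <stdbool.h>

enum int_mode { MODE_LEGACY, MODE_MSI, MODE_MSIX };

/* Hypothetical model of e1000_request_irq(): MSI-X first, then MSI, and
 * legacy INTx (requested with IRQF_SHARED) as the last resort. */
static enum int_mode pick_int_mode(bool msix_ok, bool msi_ok)
{
    if (msix_ok)
        return MODE_MSIX;
    if (msi_ok)
        return MODE_MSI;
    return MODE_LEGACY;
}

/* MSI-X uses three vectors (rx, tx, other); MSI and INTx use one. */
static int vectors_used(enum int_mode m)
{
    assert(m == MODE_LEGACY || m == MODE_MSI || m == MODE_MSIX);
    return m == MODE_MSIX ? 3 : 1;
}
```

The design choice is simply "best mechanism that works": MSI-X lets receive and transmit interrupts be handled and throttled separately, while MSI and legacy INTx funnel everything through one handler.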
e1000_request_msix() registers three interrupt handlers, one per MSI-X vector: e1000_intr_msix_rx, e1000_intr_msix_tx and e1000_msix_other:
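static int e1000_request_msix(struct e1000_adapter *adapter)
{
	struct net_device *netdev = adapter->netdev;
	int err = 0, vector = 0;

	/* vector 0: receive (ring name setup elided) */
	err = request_irq(adapter->msix_entries[vector].vector,
			  e1000_intr_msix_rx, 0, adapter->rx_ring->name,
			  netdev);
	if (err)
		return err;
	adapter->rx_ring->itr_register = adapter->hw.hw_addr +
	    E1000_EITR_82574(vector);
	adapter->rx_ring->itr_val = adapter->itr;
	vector++;

	/* vector 1: transmit completion */
	err = request_irq(adapter->msix_entries[vector].vector,
			  e1000_intr_msix_tx, 0, adapter->tx_ring->name,
			  netdev);
	if (err)
		return err;
	adapter->tx_ring->itr_register = adapter->hw.hw_addr +
	    E1000_EITR_82574(vector);
	adapter->tx_ring->itr_val = adapter->itr;
	vector++;

	/* vector 2: all other causes (e.g. link status changes) */
	err = request_irq(adapter->msix_entries[vector].vector,
			  e1000_msix_other, 0, netdev->name, netdev);
	if (err)
		return err;

	/* route interrupt causes to these vectors and set throttling */
	e1000_configure_msix(adapter);

	return 0;
}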
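e1000_intr_msix_rx() is the receive interrupt handler. It does not receive packets itself: after writing any pending interrupt-throttling (ITR) value, it schedules NAPI, which raises the NET_RX softirq so the kernel later runs the NAPI poll function registered in probe (e1000e_poll in the probe code; older kernels call it e1000_clean). The poll function calls adapter->clean_rx (e1000_clean_rx_irq), which unmaps the received skbs, hands them to the stack and refills the ring with the same number of new buffers. If the poll finishes within its budget, it leaves polling mode and re-enables the device interrupt, adjusting the interrupt rate to the current traffic.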
static irqreturn_t e1000_intr_msix_rx(int __always_unused irq, void *data)
{
	struct net_device *netdev = data;
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_ring *rx_ring = adapter->rx_ring;

	/* Write the ITR value calculated at the end of the
	 * previous interrupt.
	 */
	if (rx_ring->set_itr) {
		u32 itr = rx_ring->itr_val ?
			  1000000000 / (rx_ring->itr_val * 256) : 0;

		writel(itr, rx_ring->itr_register);
		rx_ring->set_itr = 0;
	}

	/* no packet processing here: just schedule the NAPI poll */
	if (napi_schedule_prep(&adapter->napi)) {
		adapter->total_rx_bytes = 0;
		adapter->total_rx_packets = 0;
		__napi_schedule(&adapter->napi);
	}
	return IRQ_HANDLED;
}
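The throttle expression in the handler converts an interrupt rate into a register value: the ITR interval is counted in 256 ns units, so a target of N interrupts per second becomes 10^9 / (N * 256). itr_reg_from_rate() is a hypothetical wrapper around that same expression; the upper bound in the assert is only there to keep N * 256 within 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a target rate (interrupts/second) into the ITR register value
 * written by e1000_intr_msix_rx(); 0 disables throttling. */
static uint32_t itr_reg_from_rate(uint32_t ints_per_sec)
{
    assert(ints_per_sec <= 1000000u);   /* keeps rate * 256 within 32 bits */
    return ints_per_sec ? 1000000000u / (ints_per_sec * 256u) : 0;
}
```

A target of 20000 interrupts per second gives 195, i.e. 195 * 256 ns, or roughly 50 µs between interrupts.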
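e1000_intr_msix_tx() handles transmit completion. It resets the per-interrupt statistics and calls e1000_clean_tx_irq() to unmap the DMA buffers of skbs that have been sent and free them. If the ring could not be cleaned completely in one pass, it triggers another interrupt through ICS; it then re-enables the TX interrupt through IMS: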
static irqreturn_t e1000_intr_msix_tx(int __always_unused irq, void *data)
{
	struct net_device *netdev = data;
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_hw *hw = &adapter->hw;
	struct e1000_ring *tx_ring = adapter->tx_ring;

	adapter->total_tx_bytes = 0;
	adapter->total_tx_packets = 0;

	if (!e1000_clean_tx_irq(tx_ring))
		/* Ring was not completely cleaned, so fire another interrupt */
		ew32(ICS, tx_ring->ims_val);

	if (!test_bit(__E1000_DOWN, &adapter->state))
		ew32(IMS, adapter->tx_ring->ims_val);

	return IRQ_HANDLED;
}
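The "ring was not completely cleaned" case can be sketched as a bounded loop. toy_clean_tx() is a hypothetical model: `limit` is assumed to play the role of the per-call bound in e1000_clean_tx_irq(), the descriptor is reduced to its status byte, and the comment marks where the real code would unmap and free the skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TXD_STAT_DD 0x01   /* Descriptor Done, written back by the NIC */

/* Hypothetical model of e1000_clean_tx_irq(): reclaim completed descriptors
 * from *next_to_clean, at most `limit` per call. Returns true if it ran out
 * of completed work, false if it stopped at the limit; in the false case the
 * interrupt handler re-raises the interrupt by writing ICS. */
static bool toy_clean_tx(uint8_t *status, unsigned count,
                         unsigned *next_to_clean, unsigned limit,
                         unsigned *freed)
{
    unsigned i = *next_to_clean, done = 0;

    assert(count > 0 && i < count);
    while ((status[i] & TXD_STAT_DD) && done < limit) {
        status[i] = 0;   /* the driver unmaps and frees the skb here */
        done++;
        i = (i + 1) % count;
    }
    *next_to_clean = i;
    *freed += done;
    return done < limit;
}
```

Bounding each pass keeps a single interrupt from spinning indefinitely while the transmit path keeps queueing new packets; leftover work is picked up by the re-raised interrupt.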
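Finally, the transmit function. The stack transmits through the driver's e1000_xmit_frame() (its ndo_start_xmit). Its main job is to DMA-map the packet's buffers into transmit descriptors and notify the NIC, by advancing the transmit tail register, that there is work to send. On success the skb is not freed here; it is freed by e1000_clean_tx_irq() once the NIC reports completion: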
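static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
				    struct net_device *netdev)
{
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_ring *tx_ring = adapter->tx_ring;
	unsigned int first;
	unsigned int tx_flags = 0;
	unsigned int len = skb_headlen(skb);
	unsigned int nr_frags;
	unsigned int mss;
	int count = 0;
	int tso;
	unsigned int f;

	/* ... descriptor counting and TSO/checksum context setup elided ... */
	nr_frags = skb_shinfo(skb)->nr_frags;
	first = tx_ring->next_to_use;

	/* DMA-map the head and fragments into descriptors;
	 * a count of 0 means a mapping error occurred
	 */
	count = e1000_tx_map(tx_ring, skb, first, adapter->tx_fifo_limit,
			     nr_frags);
	if (count) {
		netdev_sent_queue(netdev, skb->len);
		e1000_tx_queue(tx_ring, tx_flags, count);
		/* Make sure there is space in the ring for the next send. */
		e1000_maybe_stop_tx(tx_ring,
				    (MAX_SKB_FRAGS *
				     DIV_ROUND_UP(PAGE_SIZE,
						  adapter->tx_fifo_limit) + 2));
	} else {
		/* mapping failed: drop the packet and rewind the ring */
		dev_kfree_skb_any(skb);
		tx_ring->buffer_info[first].time_stamp = 0;
		tx_ring->next_to_use = first;
	}

	return NETDEV_TX_OK;
}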
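The space check at the end of e1000_xmit_frame() reserves room for a worst-case next packet: each of up to MAX_SKB_FRAGS page fragments may need DIV_ROUND_UP(PAGE_SIZE, tx_fifo_limit) descriptors, plus 2. The helpers below are hypothetical and take MAX_SKB_FRAGS, PAGE_SIZE and tx_fifo_limit as parameters, since their real values depend on the kernel configuration and the adapter. The stop decision is simplified: e1000_maybe_stop_tx() stops the queue when fewer free descriptors than this remain, and the real code re-checks after a memory barrier before staying stopped.

```c
#include <assert.h>

/* Same definition as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Descriptors reserved for the next packet, following the expression passed
 * to e1000_maybe_stop_tx(). All inputs are caller-supplied. */
static unsigned tx_stop_threshold(unsigned max_frags, unsigned page_size,
                                  unsigned fifo_limit)
{
    assert(fifo_limit > 0);
    return max_frags * DIV_ROUND_UP(page_size, fifo_limit) + 2;
}

/* Simplified decision: stop the queue when free descriptors < threshold. */
static int should_stop_queue(unsigned free_descs, unsigned threshold)
{
    return free_descs < threshold;
}
```

With 17 fragments, 4096-byte pages and a 4096-byte FIFO limit, the threshold is 17 * 1 + 2 = 19 descriptors.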
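Summary
1. Receive flow, roughly:
- Allocate skbs, DMA-map them into the receive descriptors, and let the NIC start receiving;
- After the DMA engine has written received packets into memory, the NIC raises an interrupt; the handler schedules NAPI, and the poll function unmaps the filled buffers, allocates the same number of new skbs and maps them into the freed descriptors, so the free descriptors are replenished;
- Hand the received packets to the kernel protocol stack;
- Depending on how much work there was (the traffic), either keep polling or leave polling mode and re-enable interrupts, as described above;
- Repeat.
e1000_intr_msix_rx -> (NET_RX softirq) e1000_clean (e1000e_poll in newer kernels) -> e1000_clean_rx_irq -> e1000_receive_skb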
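2. Transmit flow, roughly:
- Data passed to send() is placed into an skb by the protocol stack;
- The stack calls the driver's e1000_xmit_frame() to transmit the skb;
- The driver DMA-maps the skb into the transmit descriptors and hands them to the NIC;
- When transmission completes, the NIC raises an interrupt; e1000_intr_msix_tx() runs, unmaps the skb's DMA buffers and frees the skb;
- Repeat.
e1000_xmit_frame -> e1000_tx_map -> e1000_tx_queue; later, on completion: e1000_intr_msix_tx -> e1000_clean_tx_irq -> (DMA unmap, free the skb)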