CNN (8): Inception V1 in Practice and Explained

🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
🍖 Original author: K同学啊 | tutoring and custom projects available

1 Inception V1

Inception v1 paper: Going Deeper with Convolutions (Szegedy et al., 2014)

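1.1 Theory

GoogLeNet made its debut at the 2014 ILSVRC competition, where it took first place. That version is usually called Inception V1. Inception V1 is 22 layers deep and has about 5M parameters. VGGNet, from the same period, performs about as well as Inception V1 but has far more parameters.

The Inception Module is the core building block of Inception V1. It introduces a parallel arrangement of convolutional layers, so that different kinds of features can be extracted within a single layer, as shown in figure (a) below.

Stacking this structure to deepen the network improves performance, but it also brings a heavy computational cost (many parameters). To ease this, the Inception Module borrows the idea of Network-in-Network and uses 1x1 convolutions for dimensionality reduction (which also indirectly deepens the network), reducing both the parameter count and the computation, as shown in figure (b) above.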

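Example: suppose the previous layer outputs 100x100x128. After a convolutional layer with 256 5x5 kernels (stride=1, pad=2), the output is 100x100x256, and the layer has 5x5x128x256+256 = 819,456 parameters. If instead the previous output first passes through a layer with 32 1x1 kernels (the 1x1 convolution lowers the channel count while keeping the feature-map size) and then through the layer with 256 5x5 kernels, the final output is still 100x100x256, but the parameter count drops to (128x1x1x32+32)+(32x5x5x256+256) = 209,184, about a quarter of the original. Counted in multiply-accumulate operations, the cost falls from 8.192x10^9 to about 2.089x10^9 (2.048x10^9 for the 5x5 convolution plus about 0.041x10^9 for the 1x1 convolution).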
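The arithmetic in this example is easy to check by hand. Below is a minimal sketch in plain Python; the helper names conv_params and conv_macs are made up for this illustration, and the cost is counted as multiply-accumulate operations with the bias ignored.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch


def conv_macs(in_ch, out_ch, k, out_h, out_w):
    """Multiply-accumulate operations of one forward pass (bias ignored)."""
    return k * k * in_ch * out_ch * out_h * out_w


H = W = 100  # feature-map size; stride 1 with "same" padding keeps it unchanged

# Direct path: 128 -> 256 channels with a 5x5 convolution
direct_params = conv_params(128, 256, 5)
direct_macs = conv_macs(128, 256, 5, H, W)

# Reduced path: 128 -> 32 with a 1x1 convolution, then 32 -> 256 with a 5x5 convolution
reduced_params = conv_params(128, 32, 1) + conv_params(32, 256, 5)
reduced_macs = conv_macs(128, 32, 1, H, W) + conv_macs(32, 256, 5, H, W)

print(direct_params, reduced_params, round(reduced_params / direct_params, 3))
print(direct_macs, reduced_macs)
```

The ratio comes out at roughly 0.255, which is where the "about a quarter" figure comes from.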

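What the 1x1 kernel does: its main role is to reduce the number of channels of the input feature map, cutting the network's parameters and computation.

In the end, an Inception Module consists of four basic units: 1x1 convolution, 3x3 convolution, 5x5 convolution, and 3x3 max pooling. The outputs of the four units are concatenated along the channel dimension. Kernels of different sizes give different receptive fields, so the module extracts information at several scales of the image and fuses it into a better representation. That is the core idea of the Inception Module.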

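1.2 Architecture

The Inception v1 network implemented here is shown in the diagram below:

Note: the network also adds two auxiliary branches, which serve two purposes:

(1) Mitigate vanishing gradients by feeding gradient signal to the earlier layers. During backpropagation, if the derivative at any one layer is 0, the whole chain-rule product becomes 0.

(2) Use the output of an intermediate layer for classification, which acts as a form of model ensembling. At test time the two auxiliary softmax branches are removed. Later models rarely adopt this technique.

The detailed network structure is shown below: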
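The two points can be written compactly. This is a sketch in my own notation: a_k stands for the activation of layer k, and the discount weight 0.3 is the value reported in the GoogLeNet paper, not something this post's implementation uses (the code below omits the auxiliary branches).

```latex
% Chain rule through n layers: a single zero factor zeroes the whole product
\frac{\partial \mathcal{L}}{\partial a_1}
  = \frac{\partial \mathcal{L}}{\partial a_n}
    \prod_{k=2}^{n} \frac{\partial a_k}{\partial a_{k-1}}

% Training loss with the two auxiliary classifiers (dropped at test time)
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{main}}
  + 0.3\,\bigl(\mathcal{L}_{\text{aux1}} + \mathcal{L}_{\text{aux2}}\bigr)
```

Because each auxiliary loss is attached to an intermediate layer, its gradient reaches the early layers through a shorter product of factors than the gradient of the main loss.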

2 Code Implementation

2.1 Development Environment

Operating system: Ubuntu 16.04

Editor: Jupyter Lab

Language: Python 3.7

Deep learning framework: PyTorch

2.2 Preparation

2.2.1 Setting Up the GPU

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms, datasets
import os, PIL, pathlib, warnings

warnings.filterwarnings("ignore")  # suppress warning messages

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

2.2.2 Loading the Data

import os, PIL, random, pathlib

data_dir = '../data/4-data/'
data_dir = pathlib.Path(data_dir)

# Each sub-directory is one class; path.name works on both Linux and Windows
data_paths = list(data_dir.glob('*'))
classNames = [path.name for path in data_paths]
print('classNames:', classNames, '\n')

total_dir = '../data/4-data/'
train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # resize the input image
    transforms.ToTensor(),          # convert a PIL Image or numpy.ndarray to a tensor
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])  # per-channel statistics estimated from a random sample of a dataset (the standard ImageNet values)
])

total_data = datasets.ImageFolder(total_dir, transform=train_transforms)
print(total_data, '\n')
print(total_data.class_to_idx)

The result is shown below:

2.2.3 Splitting the Dataset

# 80% of the images for training, the rest for testing
train_size = int(0.8 * len(total_data))
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
print(train_dataset, test_dataset)

batch_size = 4
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=False)
test_dl = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True, num_workers=1, pin_memory=False)

# Inspect the shape of one batch
for X, y in test_dl:
    print("Shape of X [N, C, H, W]:", X.shape)
    print("Shape of y:", y.shape, y.dtype)
    break

The result is shown below:

2.3 Implementing Inception

The two auxiliary branches are removed here; only the main path is reproduced.

2.3.1 inception_block

Define a class named inception_block that inherits from nn.Module. The inception_block class contains all the layers and parameters of one Inception module.

import torch
import torch.nn as nn
import torch.nn.functional as F


class inception_block(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(inception_block, self).__init__()

        # 1x1 conv branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1),
            nn.BatchNorm2d(ch1x1),
            nn.ReLU(inplace=True)
        )

        # 1x1 conv -> 3x3 conv branch
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.BatchNorm2d(ch3x3red),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch3x3),
            nn.ReLU(inplace=True)
        )

        # 1x1 conv -> 5x5 conv branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.BatchNorm2d(ch5x5red),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
            nn.BatchNorm2d(ch5x5),
            nn.ReLU(inplace=True)
        )

        # 3x3 max pooling -> 1x1 conv branch
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1),
            nn.BatchNorm2d(pool_proj),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        # Run the input through all four branches
        # and concatenate the output feature maps along the channel dimension
        branch1_output = self.branch1(x)
        branch2_output = self.branch2(x)
        branch3_output = self.branch3(x)
        branch4_output = self.branch4(x)

        outputs = [branch1_output, branch2_output, branch3_output, branch4_output]
        return torch.cat(outputs, 1)

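In the __init__ method, we define four branches:

(1) branch1: a 1x1 convolutional layer;

(2) branch2: a 1x1 convolutional layer followed by a 3x3 convolutional layer;

(3) branch3: a 1x1 convolutional layer followed by a 5x5 convolutional layer;

(4) branch4: a 3x3 max-pooling layer followed by a 1x1 convolutional layer.

Each branch contains convolutional layers, batch-normalization layers, and activation functions. These are all standard PyTorch layers: nn.Conv2d, nn.BatchNorm2d, and nn.ReLU define the convolution, batch normalization, and ReLU activation respectively.

In the forward method, we run the input through all branches and concatenate the resulting feature maps along the channel dimension. Finally, we return the concatenated feature map.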
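Concatenation along the channel dimension only works if all four branches keep the same height and width. Here is a minimal sketch of that check, using the standard output-size formula for convolution and pooling; the helper names conv_out_size and inception_out_channels are made up for this illustration and are not part of the code above.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def inception_out_channels(ch1x1, ch3x3, ch5x5, pool_proj):
    """Channels after torch.cat over the four branch outputs."""
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


size = 28  # feature-map side length entering inception3a

# Kernel/padding pairs used in inception_block, all with stride 1
branch_sizes = {
    "branch1 (1x1)": conv_out_size(size, kernel=1),
    "branch2 (1x1 -> 3x3, pad 1)": conv_out_size(conv_out_size(size, 1), kernel=3, padding=1),
    "branch3 (1x1 -> 5x5, pad 2)": conv_out_size(conv_out_size(size, 1), kernel=5, padding=2),
    "branch4 (3x3 pool, pad 1 -> 1x1)": conv_out_size(conv_out_size(size, 3, padding=1), kernel=1),
}
print(branch_sizes)

# inception3a: ch1x1=64, ch3x3=128, ch5x5=32, pool_proj=32
print(inception_out_channels(64, 128, 32, 32))
```

Every branch keeps the 28x28 size, so only the channel counts add up. The reduction widths ch3x3red and ch5x5red do not appear in the output channel count, since they are internal to their branches.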

2.3.2 Inception v1

Next we define the Inception v1 model, which chains several Inception modules with other layers. nn.Sequential groups the last Inception module with average pooling and dropout, and also builds the classifier.

class InceptionV1(nn.Module):
    def __init__(self, num_classes=4):
        super(InceptionV1, self).__init__()

        # Stem
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=1, stride=1, padding=0)
        self.conv3 = nn.Conv2d(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Stage 3
        self.inception3a = inception_block(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = inception_block(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Stage 4
        self.inception4a = inception_block(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = inception_block(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = inception_block(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = inception_block(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = inception_block(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Stage 5, followed by global average pooling and dropout
        self.inception5a = inception_block(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = nn.Sequential(
            inception_block(832, 384, 192, 384, 48, 128, 128),
            nn.AvgPool2d(kernel_size=7, stride=1, padding=0),
            nn.Dropout(0.4)
        )

        # Fully connected layers used for classification.
        # nn.CrossEntropyLoss already applies log-softmax to its input, so the
        # final Softmax is redundant for training; it is kept so that the model
        # matches the recorded output below.
        self.classifier = nn.Sequential(
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=num_classes),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.maxpool2(x)

        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)

        x = self.inception4a(x)
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        x = self.inception4e(x)
        x = self.maxpool4(x)

        x = self.inception5a(x)
        x = self.inception5b(x)

        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)

        return x
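The block configurations have to agree with each other: each block's in_channels must equal the number of channels the previous block produces, and the flattened feature vector must match the 1024 inputs of the first Linear layer. The sketch below checks this in plain Python without building the model. The tuples are copied from InceptionV1.__init__; the helper out_size and the names BLOCKS, trace, and flattened are only for this illustration.

```python
def out_size(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
BLOCKS = [
    ("3a", (192, 64, 96, 128, 16, 32, 32)),
    ("3b", (256, 128, 128, 192, 32, 96, 64)),
    ("4a", (480, 192, 96, 208, 16, 48, 64)),
    ("4b", (512, 160, 112, 224, 24, 64, 64)),
    ("4c", (512, 128, 128, 256, 24, 64, 64)),
    ("4d", (512, 112, 144, 288, 32, 64, 64)),
    ("4e", (528, 256, 160, 320, 32, 128, 128)),
    ("5a", (832, 256, 160, 320, 32, 128, 128)),
    ("5b", (832, 384, 192, 384, 48, 128, 128)),
]

# Channel chain: every block must accept what the previous one emits
channels = 192  # output channels of conv3
for name, (in_ch, ch1x1, _, ch3x3, _, ch5x5, pool_proj) in BLOCKS:
    assert in_ch == channels, f"inception{name} expects {in_ch}, got {channels}"
    channels = ch1x1 + ch3x3 + ch5x5 + pool_proj

# Spatial size: only the layers that change height/width are listed
size = 224
trace = [size]
for kernel, stride, padding in [
    (7, 2, 3),  # conv1
    (3, 2, 1),  # maxpool1
    (3, 2, 1),  # maxpool2
    (3, 2, 1),  # maxpool3
    (3, 2, 1),  # maxpool4
    (7, 1, 0),  # AvgPool2d inside inception5b
]:
    size = out_size(size, kernel, stride, padding)
    trace.append(size)

flattened = channels * size * size  # length of the vector fed to the classifier
print(channels, trace, flattened)
```

Because the final AvgPool2d uses a fixed 7x7 kernel, the 1x1 result (and therefore in_features=1024) depends on the 224x224 input size chosen in the transforms.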

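2.3.3 Printing the Model Structure

# Count the model's parameters and other statistics
import torchsummary

# Instantiate the model and move it to the selected device
model = InceptionV1().to(device)

# Show the network structure
torchsummary.summary(model, (3, 224, 224))
print(model)

The output is shown below: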

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,472
         MaxPool2d-2           [-1, 64, 56, 56]               0
            Conv2d-3           [-1, 64, 56, 56]           4,160
            Conv2d-4          [-1, 192, 56, 56]         110,784
         MaxPool2d-5          [-1, 192, 28, 28]               0
            Conv2d-6           [-1, 64, 28, 28]          12,352
       BatchNorm2d-7           [-1, 64, 28, 28]             128
              ReLU-8           [-1, 64, 28, 28]               0
            Conv2d-9           [-1, 96, 28, 28]          18,528
      BatchNorm2d-10           [-1, 96, 28, 28]             192
             ReLU-11           [-1, 96, 28, 28]               0
           Conv2d-12          [-1, 128, 28, 28]         110,720
      BatchNorm2d-13          [-1, 128, 28, 28]             256
             ReLU-14          [-1, 128, 28, 28]               0
           Conv2d-15           [-1, 16, 28, 28]           3,088
      BatchNorm2d-16           [-1, 16, 28, 28]              32
             ReLU-17           [-1, 16, 28, 28]               0
           Conv2d-18           [-1, 32, 28, 28]          12,832
      BatchNorm2d-19           [-1, 32, 28, 28]              64
             ReLU-20           [-1, 32, 28, 28]               0
        MaxPool2d-21          [-1, 192, 28, 28]               0
           Conv2d-22           [-1, 32, 28, 28]           6,176
      BatchNorm2d-23           [-1, 32, 28, 28]              64
             ReLU-24           [-1, 32, 28, 28]               0
  inception_block-25          [-1, 256, 28, 28]               0
           Conv2d-26          [-1, 128, 28, 28]          32,896
      BatchNorm2d-27          [-1, 128, 28, 28]             256
             ReLU-28          [-1, 128, 28, 28]               0
           Conv2d-29          [-1, 128, 28, 28]          32,896
      BatchNorm2d-30          [-1, 128, 28, 28]             256
             ReLU-31          [-1, 128, 28, 28]               0
           Conv2d-32          [-1, 192, 28, 28]         221,376
      BatchNorm2d-33          [-1, 192, 28, 28]             384
             ReLU-34          [-1, 192, 28, 28]               0
           Conv2d-35           [-1, 32, 28, 28]           8,224
      BatchNorm2d-36           [-1, 32, 28, 28]              64
             ReLU-37           [-1, 32, 28, 28]               0
           Conv2d-38           [-1, 96, 28, 28]          76,896
      BatchNorm2d-39           [-1, 96, 28, 28]             192
             ReLU-40           [-1, 96, 28, 28]               0
        MaxPool2d-41          [-1, 256, 28, 28]               0
           Conv2d-42           [-1, 64, 28, 28]          16,448
      BatchNorm2d-43           [-1, 64, 28, 28]             128
             ReLU-44           [-1, 64, 28, 28]               0
  inception_block-45          [-1, 480, 28, 28]               0
        MaxPool2d-46          [-1, 480, 14, 14]               0
           Conv2d-47          [-1, 192, 14, 14]          92,352
      BatchNorm2d-48          [-1, 192, 14, 14]             384
             ReLU-49          [-1, 192, 14, 14]               0
           Conv2d-50           [-1, 96, 14, 14]          46,176
      BatchNorm2d-51           [-1, 96, 14, 14]             192
             ReLU-52           [-1, 96, 14, 14]               0
           Conv2d-53          [-1, 208, 14, 14]         179,920
      BatchNorm2d-54          [-1, 208, 14, 14]             416
             ReLU-55          [-1, 208, 14, 14]               0
           Conv2d-56           [-1, 16, 14, 14]           7,696
      BatchNorm2d-57           [-1, 16, 14, 14]              32
             ReLU-58           [-1, 16, 14, 14]               0
           Conv2d-59           [-1, 48, 14, 14]          19,248
      BatchNorm2d-60           [-1, 48, 14, 14]              96
             ReLU-61           [-1, 48, 14, 14]               0
        MaxPool2d-62          [-1, 480, 14, 14]               0
           Conv2d-63           [-1, 64, 14, 14]          30,784
      BatchNorm2d-64           [-1, 64, 14, 14]             128
             ReLU-65           [-1, 64, 14, 14]               0
  inception_block-66          [-1, 512, 14, 14]               0
           Conv2d-67          [-1, 160, 14, 14]          82,080
      BatchNorm2d-68          [-1, 160, 14, 14]             320
             ReLU-69          [-1, 160, 14, 14]               0
           Conv2d-70          [-1, 112, 14, 14]          57,456
      BatchNorm2d-71          [-1, 112, 14, 14]             224
             ReLU-72          [-1, 112, 14, 14]               0
           Conv2d-73          [-1, 224, 14, 14]         226,016
      BatchNorm2d-74          [-1, 224, 14, 14]             448
             ReLU-75          [-1, 224, 14, 14]               0
           Conv2d-76           [-1, 24, 14, 14]          12,312
      BatchNorm2d-77           [-1, 24, 14, 14]              48
             ReLU-78           [-1, 24, 14, 14]               0
           Conv2d-79           [-1, 64, 14, 14]          38,464
      BatchNorm2d-80           [-1, 64, 14, 14]             128
             ReLU-81           [-1, 64, 14, 14]               0
        MaxPool2d-82          [-1, 512, 14, 14]               0
           Conv2d-83           [-1, 64, 14, 14]          32,832
      BatchNorm2d-84           [-1, 64, 14, 14]             128
             ReLU-85           [-1, 64, 14, 14]               0
  inception_block-86          [-1, 512, 14, 14]               0
           Conv2d-87          [-1, 128, 14, 14]          65,664
      BatchNorm2d-88          [-1, 128, 14, 14]             256
             ReLU-89          [-1, 128, 14, 14]               0
           Conv2d-90          [-1, 128, 14, 14]          65,664
      BatchNorm2d-91          [-1, 128, 14, 14]             256
             ReLU-92          [-1, 128, 14, 14]               0
           Conv2d-93          [-1, 256, 14, 14]         295,168
      BatchNorm2d-94          [-1, 256, 14, 14]             512
             ReLU-95          [-1, 256, 14, 14]               0
           Conv2d-96           [-1, 24, 14, 14]          12,312
      BatchNorm2d-97           [-1, 24, 14, 14]              48
             ReLU-98           [-1, 24, 14, 14]               0
           Conv2d-99           [-1, 64, 14, 14]          38,464
     BatchNorm2d-100           [-1, 64, 14, 14]             128
            ReLU-101           [-1, 64, 14, 14]               0
       MaxPool2d-102          [-1, 512, 14, 14]               0
          Conv2d-103           [-1, 64, 14, 14]          32,832
     BatchNorm2d-104           [-1, 64, 14, 14]             128
            ReLU-105           [-1, 64, 14, 14]               0
 inception_block-106          [-1, 512, 14, 14]               0
          Conv2d-107          [-1, 112, 14, 14]          57,456
     BatchNorm2d-108          [-1, 112, 14, 14]             224
            ReLU-109          [-1, 112, 14, 14]               0
          Conv2d-110          [-1, 144, 14, 14]          73,872
     BatchNorm2d-111          [-1, 144, 14, 14]             288
            ReLU-112          [-1, 144, 14, 14]               0
          Conv2d-113          [-1, 288, 14, 14]         373,536
     BatchNorm2d-114          [-1, 288, 14, 14]             576
            ReLU-115          [-1, 288, 14, 14]               0
          Conv2d-116           [-1, 32, 14, 14]          16,416
     BatchNorm2d-117           [-1, 32, 14, 14]              64
            ReLU-118           [-1, 32, 14, 14]               0
          Conv2d-119           [-1, 64, 14, 14]          51,264
     BatchNorm2d-120           [-1, 64, 14, 14]             128
            ReLU-121           [-1, 64, 14, 14]               0
       MaxPool2d-122          [-1, 512, 14, 14]               0
          Conv2d-123           [-1, 64, 14, 14]          32,832
     BatchNorm2d-124           [-1, 64, 14, 14]             128
            ReLU-125           [-1, 64, 14, 14]               0
 inception_block-126          [-1, 528, 14, 14]               0
          Conv2d-127          [-1, 256, 14, 14]         135,424
     BatchNorm2d-128          [-1, 256, 14, 14]             512
            ReLU-129          [-1, 256, 14, 14]               0
          Conv2d-130          [-1, 160, 14, 14]          84,640
     BatchNorm2d-131          [-1, 160, 14, 14]             320
            ReLU-132          [-1, 160, 14, 14]               0
          Conv2d-133          [-1, 320, 14, 14]         461,120
     BatchNorm2d-134          [-1, 320, 14, 14]             640
            ReLU-135          [-1, 320, 14, 14]               0
          Conv2d-136           [-1, 32, 14, 14]          16,928
     BatchNorm2d-137           [-1, 32, 14, 14]              64
            ReLU-138           [-1, 32, 14, 14]               0
          Conv2d-139          [-1, 128, 14, 14]         102,528
     BatchNorm2d-140          [-1, 128, 14, 14]             256
            ReLU-141          [-1, 128, 14, 14]               0
       MaxPool2d-142          [-1, 528, 14, 14]               0
          Conv2d-143          [-1, 128, 14, 14]          67,712
     BatchNorm2d-144          [-1, 128, 14, 14]             256
            ReLU-145          [-1, 128, 14, 14]               0
 inception_block-146          [-1, 832, 14, 14]               0
       MaxPool2d-147            [-1, 832, 7, 7]               0
          Conv2d-148            [-1, 256, 7, 7]         213,248
     BatchNorm2d-149            [-1, 256, 7, 7]             512
            ReLU-150            [-1, 256, 7, 7]               0
          Conv2d-151            [-1, 160, 7, 7]         133,280
     BatchNorm2d-152            [-1, 160, 7, 7]             320
            ReLU-153            [-1, 160, 7, 7]               0
          Conv2d-154            [-1, 320, 7, 7]         461,120
     BatchNorm2d-155            [-1, 320, 7, 7]             640
            ReLU-156            [-1, 320, 7, 7]               0
          Conv2d-157             [-1, 32, 7, 7]          26,656
     BatchNorm2d-158             [-1, 32, 7, 7]              64
            ReLU-159             [-1, 32, 7, 7]               0
          Conv2d-160            [-1, 128, 7, 7]         102,528
     BatchNorm2d-161            [-1, 128, 7, 7]             256
            ReLU-162            [-1, 128, 7, 7]               0
       MaxPool2d-163            [-1, 832, 7, 7]               0
          Conv2d-164            [-1, 128, 7, 7]         106,624
     BatchNorm2d-165            [-1, 128, 7, 7]             256
            ReLU-166            [-1, 128, 7, 7]               0
 inception_block-167            [-1, 832, 7, 7]               0
          Conv2d-168            [-1, 384, 7, 7]         319,872
     BatchNorm2d-169            [-1, 384, 7, 7]             768
            ReLU-170            [-1, 384, 7, 7]               0
          Conv2d-171            [-1, 192, 7, 7]         159,936
     BatchNorm2d-172            [-1, 192, 7, 7]             384
            ReLU-173            [-1, 192, 7, 7]               0
          Conv2d-174            [-1, 384, 7, 7]         663,936
     BatchNorm2d-175            [-1, 384, 7, 7]             768
            ReLU-176            [-1, 384, 7, 7]               0
          Conv2d-177             [-1, 48, 7, 7]          39,984
     BatchNorm2d-178             [-1, 48, 7, 7]              96
            ReLU-179             [-1, 48, 7, 7]               0
          Conv2d-180            [-1, 128, 7, 7]         153,728
     BatchNorm2d-181            [-1, 128, 7, 7]             256
            ReLU-182            [-1, 128, 7, 7]               0
       MaxPool2d-183            [-1, 832, 7, 7]               0
          Conv2d-184            [-1, 128, 7, 7]         106,624
     BatchNorm2d-185            [-1, 128, 7, 7]             256
            ReLU-186            [-1, 128, 7, 7]               0
 inception_block-187           [-1, 1024, 7, 7]               0
       AvgPool2d-188           [-1, 1024, 1, 1]               0
         Dropout-189           [-1, 1024, 1, 1]               0
          Linear-190                 [-1, 1024]       1,049,600
            ReLU-191                 [-1, 1024]               0
          Linear-192                    [-1, 4]           4,100
         Softmax-193                    [-1, 4]               0
================================================================
Total params: 7,041,172
Trainable params: 7,041,172
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 69.61
Params size (MB): 26.86
Estimated Total Size (MB): 97.05
----------------------------------------------------------------
InceptionV1(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
  (maxpool1): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (conv2): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
  (conv3): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (maxpool2): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (inception3a): inception_block(
    (branch1): Sequential(
      (0): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(192, 16, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception3b): inception_block(
    (branch1): Sequential(
      (0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 96, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (maxpool3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (inception4a): inception_block(
    (branch1): Sequential(
      (0): Conv2d(480, 192, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(480, 96, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(96, 208, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(208, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(480, 16, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(16, 48, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(480, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception4b): inception_block(
    (branch1): Sequential(
      (0): Conv2d(512, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(112, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(24, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception4c): inception_block(
    (branch1): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(24, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception4d): inception_block(
    (branch1): Sequential(
      (0): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(512, 144, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(144, 288, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception4e): inception_block(
    (branch1): Sequential(
      (0): Conv2d(528, 256, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(528, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(528, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(528, 128, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (maxpool4): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (inception5a): inception_block(
    (branch1): Sequential(
      (0): Conv2d(832, 256, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (branch2): Sequential(
      (0): Conv2d(832, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch3): Sequential(
      (0): Conv2d(832, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (branch4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (inception5b): Sequential(
    (0): inception_block(
      (branch1): Sequential(
        (0): Conv2d(832, 384, kernel_size=(1, 1), stride=(1, 1))
        (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (branch2): Sequential(
        (0): Conv2d(832, 192, kernel_size=(1, 1), stride=(1, 1))
        (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
      (branch3): Sequential(
        (0): Conv2d(832, 48, kernel_size=(1, 1), stride=(1, 1))
        (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(48, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
      (branch4): Sequential(
        (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
        (1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
        (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (3): ReLU(inplace=True)
      )
    )
    (1): AvgPool2d(kernel_size=7, stride=1, padding=0)
    (2): Dropout(p=0.4, inplace=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=1024, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=4, bias=True)
    (3): Softmax(dim=1)
  )
)
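Several numbers in the torchsummary table above can be reproduced by hand, which is a quick way to confirm that the layers are wired as intended. The sketch below uses plain Python; the helper names are made up for this illustration, and the size estimates assume 4-byte float32 values with 1 MB taken as 1024^2 bytes, which is consistent with the figures printed above.

```python
def conv2d_params(in_ch, out_ch, k):
    """k*k*in_ch weights per output channel, plus one bias each."""
    return k * k * in_ch * out_ch + out_ch


def batchnorm2d_params(ch):
    """One learnable scale and one learnable shift per channel."""
    return 2 * ch


def linear_params(in_features, out_features):
    """Weight matrix plus bias vector."""
    return in_features * out_features + out_features


rows = {
    "Conv2d-1": conv2d_params(3, 64, 7),
    "Conv2d-6": conv2d_params(192, 64, 1),
    "BatchNorm2d-7": batchnorm2d_params(64),
    "Linear-190": linear_params(1024, 1024),
    "Linear-192": linear_params(1024, 4),
}
print(rows)

total_params = 7_041_172  # "Total params" from the summary
params_mb = total_params * 4 / 1024 ** 2
input_mb = 3 * 224 * 224 * 4 / 1024 ** 2
print(round(params_mb, 2), round(input_mb, 2))
```

Linear-190 alone accounts for about one million of the seven million parameters, which is part of why this implementation is larger than the roughly 5M quoted for the original network.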

2.4 Training the Model

2.4.1 Writing the Training Function

# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set
    num_batches = len(dataloader)   # number of batches (size / batch_size, rounded up)

    train_loss, train_acc = 0, 0  # running loss and number of correct predictions

    for X, y in dataloader:  # fetch images and their labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)          # network output
        loss = loss_fn(pred, y)  # loss between the network output pred and the ground truth y

        # Backpropagation
        optimizer.zero_grad()  # reset the gradients
        loss.backward()        # backpropagate
        optimizer.step()       # update the parameters

        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc /= size
    train_loss /= num_batches

    return train_acc, train_loss

2.4.2 Writing the Test Function

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # size of the test set
    num_batches = len(dataloader)   # number of batches (size / batch_size, rounded up)
    test_loss, test_acc = 0, 0      # running loss and number of correct predictions

    # No training happens here, so gradient tracking is disabled to save memory
    with torch.no_grad():
        for imgs, target in dataloader:  # fetch images and their labels
            imgs, target = imgs.to(device), target.to(device)

            # Compute the error
            target_pred = model(imgs)            # network output
            loss = loss_fn(target_pred, target)  # loss between the network output and the ground truth target

            # Accumulate accuracy and loss
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches

    return test_acc, test_loss
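Both functions pass the model output straight to loss_fn, which section 2.4.3 sets to nn.CrossEntropyLoss. That loss applies log-softmax to its input itself, while the InceptionV1 classifier defined earlier already ends in nn.Softmax, so softmax is effectively applied twice. The sketch below reproduces the per-sample computation in plain Python to show the effect; the functions softmax and cross_entropy are re-implementations written for this illustration (not PyTorch calls), and the logit values are invented.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Per-sample cross entropy on raw scores: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


logits = [4.0, 0.0, 0.0, 0.0]  # a confident, correct prediction for class 0

loss_on_logits = cross_entropy(logits, 0)          # model without a final Softmax
loss_on_probs = cross_entropy(softmax(logits), 0)  # model that already ends in Softmax

# Even a perfect probability vector [1, 0, 0, 0] cannot push the loss below this
loss_floor = cross_entropy([1.0, 0.0, 0.0, 0.0], 0)

print(round(loss_on_logits, 4), round(loss_on_probs, 4), round(loss_floor, 4))
```

The argmax is unchanged, so the reported accuracy is unaffected, but with four classes the loss can never drop below roughly 0.74 and the gradients are smaller than they would be on raw logits. Removing the final Softmax from the classifier, or switching to log-probabilities with nn.NLLLoss, are the usual ways around this.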

2.4.3 Running the Training

import copy

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()  # loss function

epochs = 40

train_loss = []
train_acc = []
test_loss = []
test_acc = []

best_acc = 0  # best test accuracy so far, used to select the best model

if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    # scheduler.step()  # update the learning rate (when using an official LR scheduler)

    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)

    # Keep a copy of the best model in best_model
    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)

    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    # Read the current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']

    template = ('Epoch: {:2d}. Train_acc: {:.1f}%, Train_loss: {:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr: {:.2E}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss, lr))

# Save the weights of the best model
PATH = './J7_best_model.pth'
torch.save(best_model.state_dict(), PATH)

print('Done')

The output is shown below:

2.5 Visualizing the Results

import matplotlib.pyplot as plt
# Hide warnings
import warnings
warnings.filterwarnings("ignore")               # ignore warning messages
plt.rcParams['font.sans-serif']    = ['SimHei'] # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False      # render the minus sign correctly
plt.rcParams['figure.dpi']         = 100        # resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

The output is shown below:

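3 Summary

Most popular CNNs stack more and more convolutional layers, making the network deeper, and widen the channels, making the network wider, in the hope of extracting higher-level features and getting better performance. But simply stacking and widening has side effects, including exploding gradients and a sharp increase in parameters and computation, both of which make training difficult. Inception was proposed to ease these problems.

Inception uses multiple parallel branches with different kernel sizes to extract features over receptive fields of different sizes. This branching turns a single path into several independent paths that can be computed in parallel, which can speed up the network. Kernels of different sizes extract features within receptive fields of different extents, so the network can "see" several ranges around the same position at once. The subsequent concat operation fuses these features, combining information from different ranges at that position. This way of reading an image is closer to how humans do it.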

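Meanwhile, to reduce the parameter count, each branch first uses a 1x1 convolution to reduce the channel dimension; the 3x3 or 5x5 convolution that extracts the features then raises the channel count again. This looks cumbersome, but it cuts the number of parameters substantially. It also quietly adds depth to the network and extracts higher-level features. The reduction is like converting a large matrix into a small one: the conversion keeps the "essence" of the large matrix and discards redundant information. The expansion is like converting the small matrix back into a large one, which makes it convenient to fuse the features of the different branches.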