Inception v2/v3 (2015)


Inception v2 and Inception v3 were presented in the same paper.

**External blogs**

Notes covering Inception v1-v4:
https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
https://hackmd.io/@bouteille/SkD5Xd4DL

Two key points:

  1. Introduced Batch Normalization;
  2. Replaced the 5×5 convolution in the original Inception module with two stacked 3×3 convolutions.

On kernel design: a stack of small kernels can replace one large kernel. Two 3×3 convolutions cover the same receptive field as one 5×5, and three 3×3 convolutions cover one 7×7.
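As a rough sanity check of the saving, the weight counts can be computed directly. This is a sketch: the channel width C = 64 is an assumed example value, and bias terms are ignored.

```python
# Compare one large convolution against a stack of 3x3 convolutions
# that covers the same receptive field (biases ignored).

def conv_params(k_h, k_w, c_in, c_out):
    """Weights in a single conv layer: kernel area times channel fan."""
    return k_h * k_w * c_in * c_out

C = 64  # assumed channel width; input channels == output channels

five_by_five   = conv_params(5, 5, C, C)      # one 5x5 layer
two_threes     = 2 * conv_params(3, 3, C, C)  # two stacked 3x3 layers
seven_by_seven = conv_params(7, 7, C, C)      # one 7x7 layer
three_threes   = 3 * conv_params(3, 3, C, C)  # three stacked 3x3 layers

print(five_by_five, two_threes)      # 102400 73728  -> ~28% fewer weights
print(seven_by_seven, three_threes)  # 200704 110592 -> ~45% fewer weights
```

The stacked version is not only cheaper; each extra layer also adds another nonlinearity.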

The authors improve computational efficiency by factorizing convolutions and applying aggressive regularization. GoogLeNet far outperforms hand-designed features and AlexNet, yet even with its reduced parameter count it remains computationally heavy in resource-constrained settings such as mobile.

Conclusions:

1. The paper proposes several principles for improving convolutional neural networks, guiding the design of high-quality architectures with modest computational cost.

2. In single-crop evaluation, the high-quality Inception v3 achieved the best results.

3. It studies how to factorize convolution sizes and reduce the network's internal dimensionality while keeping training cost relatively low and quality high. The paper shows that a lower parameter count, combined with extra regularization (batch-normalized auxiliary classifiers and label smoothing), allows training high-quality networks on relatively small training sets.

General design principles:

  1. Avoid representational bottlenecks: feature-map size should generally shrink gradually from input to output. Information content cannot be judged by the dimensionality of a representation alone, since that discards important factors such as correlation structure; dimensionality gives only a rough estimate of information content.

  2. Higher-dimensional representations are easier to process locally within a network. Adding more activations per tile in a convolutional network yields more disentangled features and speeds up training.

  3. Before processing high-dimensional data, reduce its dimensionality first; the presumed reason is that strong correlation between adjacent units greatly reduces the loss incurred during dimension reduction.

This is where 1×1 convolutions come into play.

  4. Balance network depth and width; they should generally be increased in tandem.

These principles do not improve network performance automatically; they should be applied judiciously in ambiguous situations.
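The 1×1 dimension-reduction idea mentioned above can be sketched with simple weight-count arithmetic. The channel counts 256 → 64 → 256 are assumed example values, not taken from the paper, and biases are ignored.

```python
# Cost of a 5x5 conv with and without a 1x1 reduction in front
# (the GoogLeNet-style bottleneck). Channel counts are assumed values.

def conv_params(k, c_in, c_out):
    """Weights in a single k x k conv layer (biases ignored)."""
    return k * k * c_in * c_out

c_in, c_mid, c_out = 256, 64, 256

direct  = conv_params(5, c_in, c_out)                           # straight 5x5
reduced = conv_params(1, c_in, c_mid) + conv_params(5, c_mid, c_out)

print(direct, reduced)  # 1638400 425984 -> roughly 4x cheaper
```

Because adjacent activations are strongly correlated, the 1×1 projection loses little information while cutting the cost of the following large convolution substantially.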

Symmetric factorization

As the figure shows, two 3×3 convolutions can replace one 5×5. This factorization greatly reduces the parameter count while adding depth and nonlinearity.
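That the receptive fields really match can be checked with the standard receptive-field recurrence; this is a minimal sketch assuming stride 1 throughout.

```python
def receptive_field(kernels, strides=None):
    """Receptive field of a stack of conv layers via the usual
    recurrence: rf += (k - 1) * jump; jump *= stride."""
    strides = strides or [1] * len(kernels)
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3]))     # 5 -> two 3x3 see what one 5x5 sees
print(receptive_field([3, 3, 3]))  # 7 -> three 3x3 see what one 7x7 sees
```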


Asymmetric factorization

A 1×3 convolution followed by a 3×1 convolution can replace a 3×3 convolution, as marked in red in Figure 7 below; this arrangement also reduces the parameter count.

Going further, any n×n convolution can be factorized into a 1×n convolution followed by an n×1 convolution. However, the authors report that this factorization only pays off for larger kernel sizes, as shown in Figure 6.

In practice, we have found that employing this factorization does not work well on early layers, but it gives very good results on medium grid-sizes (on m×m feature maps, where m ranges between 12 and 20). On that level, very good results can be achieved by using 1×7 convolutions followed by 7×1 convolutions.
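The saving from the 1×7 + 7×1 split can be sketched the same way; C = 64 is an assumed channel width, and biases are ignored.

```python
# A 1xn followed by an nx1 convolution covers the same nxn receptive
# field with 2n instead of n^2 kernel weights per channel pair.

def factorized_saving(n, C=64):
    """Return (weights of one nxn conv, weights of 1xn + nx1 pair)."""
    full  = n * n * C * C        # one nxn layer
    split = 2 * (n * C * C)      # 1xn followed by nx1
    return full, split

full, split = factorized_saving(7)
print(full, split)  # 200704 57344 -> roughly 71% fewer weights
```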

  • Reduce the representational bottleneck. The intuition was that neural networks perform better when convolutions don't alter the dimensions of the input drastically. Reducing the dimensions too much may cause loss of information, known as a "representational bottleneck".

Auxiliary classifiers

  • The filter banks in the module were expanded (made wider instead of deeper) to remove the representational bottleneck. If the module was made deeper instead, there would be excessive reduction in dimensions, and hence loss of information. This is illustrated in the below image.


Interestingly, the authors found that auxiliary classifiers did not improve convergence early in training: before either model reached high accuracy, training progressed almost identically with and without the side heads. Near the end of training, the network with the auxiliary branches began to overtake the accuracy of the network without them and reached a slightly higher plateau.

Grid size reduction

  • Traditionally, convolutional networks use some pooling before convolution operations to reduce the grid size of the feature maps. The problem is that this can introduce a representational bottleneck.
  • The authors argue that increasing the number of filters (expanding the filter bank) removes the representational bottleneck. This is achieved by the Inception module.

Comparing the two figures, they differ in whether pooling comes first or last: the left figure pools first and then expands the filter bank, which introduces a representational bottleneck.

An intuitive analogy: when learning, should one take general education first or specialize first? Clearly general education should come first, followed by the specialization one prefers. The left figure reduces first, which may discard important information and limit the feature expressiveness of subsequent layers.

The right figure, however, is computationally expensive, so the authors propose another approach that lowers the cost while still removing the representational bottleneck:

  • Use two parallel stride-2 pooling/convolution blocks and concatenate their outputs.
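A shape-level sketch of the parallel reduction: a stride-2 convolution branch and a stride-2 pooling branch run on the same input, and their outputs are concatenated along the channel axis. All sizes here are assumed example values, not the paper's exact configuration.

```python
# Both branches halve the spatial grid; concatenation expands channels
# without paying for a full-resolution convolution first.

def out_size(n, k, s, p=0):
    """Spatial size after a conv/pool with kernel k, stride s, pad p."""
    return (n + 2 * p - k) // s + 1

h = w = 35       # assumed input grid size
c_in = 320       # assumed input channels (pooling branch keeps these)
c_conv = 320     # channels produced by the stride-2 conv branch

h_out = out_size(h, 3, 2)  # both branches: 35 -> 17
c_out = c_conv + c_in      # concat: conv channels + pooled channels

print(h_out, c_out)  # 17 640
```

The grid shrinks and the filter bank widens in a single step, so neither branch alone has to squeeze the representation through a narrow layer.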

Inception v2

The above principles were used to build three different types of Inception modules.

The whole network is 42 layers deep, and its computational cost is only 2.5× that of GoogLeNet.


Model Regularization via Label Smoothing

Label smoothing

  • In a CNN, the label is a vector. With 3 classes, the one-hot labels are [0, 0, 1], [0, 1, 0], or [1, 0, 0]; each vector stands for a class at the output layer. Label smoothing, in my understanding, uses a relatively smooth vector to represent the ground-truth label: say, [0, 0, 1] can be represented as [0.1, 0.1, 0.8]. It is used when the loss function is cross entropy.
  • According to the author:
    • First, it (using unsmoothed label) may result in overfitting: if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize.
    • Second, it encourages the differences between the largest logit and all others to become large, and this, combined with the bounded gradient, reduces the ability of the model to adapt. Intuitively, this happens because the model becomes too confident about its predictions.
  • They claim that by using label smoothing, the top-1 and top-5 error rate are reduced by 0.2%.
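One common formulation mixes the one-hot target with a uniform distribution over all classes. A minimal sketch, assuming eps = 0.1 (the value typically quoted for ImageNet); the 3-class vector is illustrative only.

```python
# Label smoothing: y_smooth = (1 - eps) * y_onehot + eps / K
# where K is the number of classes.

def smooth_labels(one_hot, eps=0.1):
    """Blend a one-hot label with the uniform distribution."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]

print(smooth_labels([0, 0, 1]))  # [~0.033, ~0.033, ~0.933]
```

The result is still a valid probability distribution (it sums to 1), but the target for the true class is strictly below 1, so the model is never pushed to make the true-class logit arbitrarily larger than all others.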

# Inception v3


## The Premise

*   The authors noted that the **auxiliary classifiers** didn't contribute much until near the end of the training process, when accuracies were nearing saturation. They argued that they function as **regularizers**, especially if they have BatchNorm or Dropout operations.
*   Possibilities to improve on Inception v2 without drastically changing the modules were to be investigated.

## The Solution

*   **Inception Net v3** incorporated all of the above upgrades stated for Inception v2, and in addition used the following:

1.  RMSProp Optimizer.
2.  Factorized 7x7 convolutions.
3.  BatchNorm in the Auxiliary Classifiers.
4.  Label Smoothing (a regularizing component added to the loss formula that prevents the network from becoming too confident about a class; prevents overfitting).

Inception v2 introduced Batch Normalization.
Inception v3 modified the Inception block:

1. Replace 5×5 with multiple 3×3 convolution layers.

2. Replace 5×5 with 1×7 and 7×1 convolution layers.

3. Replace 3×3 with 1×3 and 3×1.

Inception v4 adds residual (ResNet-style) connections.

