[AI] 4 Convolutional Neural Networks (CNN), Part 2

1. Overfitting

Generally speaking, the more parameters a model has, the wider the range of functions it can fit. However, if there is too little data to constrain the model, the fit will match the training data too precisely, losing the true characteristics of the data and causing overfitting.

overfitting, from C.Bishop, Pattern Recognition and Machine Learning
Taking curve fitting as an example: as the number of model parameters grows, so does the risk of overfitting, as the sketch below illustrates.
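As a concrete illustration (not from the original figure; the data, polynomial degrees, and noise level below are assumptions), the following sketch fits polynomials of increasing degree to a few noisy samples of a sine curve: training error keeps dropping while test error eventually rises, which is the signature of overfitting.

```python
# Minimal sketch of the curve-fitting example: small training set, polynomials
# of increasing degree. Higher degrees fit the training points almost exactly
# but generalise worse to held-out data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)   # noisy sine samples
    return x, y

x_train, y_train = make_data(10)    # few training points
x_test, y_test = make_data(100)     # held-out data

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)        # least-squares fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```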

2. Deep Convolutional Neural Networks (CNNs)

Common datasets

MNIST: 60,000 images of handwritten digits; deep convolutional neural networks can learn to recognise them with an error rate as low as 0.23% (Ciresan et al. 2012)

sample of MNIST dataset

ImageNet: 1M+ images; 1000 classes; exemplar deep convolutional neural networks recognise them with an error of 2.25%
sample of ImageNet, from cs.stanford.edu

Applications

Image classification (Classification)
[Krizhevsky et al. 2012]

Semantic segmentation
[Badrinarayanan et al. 2017]

Image-based question answering
from http://conviqa.noahlab.com.hk/project.html

Image captioning
Karpathy&Fei-Fei 15

Action recognition
Hou et al. 17

3. Classic CNN Architectures

  1. AlexNet (Winner of ILSVRC 2012, 1000 classes)
    AlexNet, Krizhevsky et al. 2012
  2. VGG16 (stack of small conv layers with 3x3 filters; number of conv filters per stage: 64 -> 128 -> 256 -> 512 -> 512)
    VGG16, Simonyan and Zisserman, 2014

Stacked small conv kernels 🆚 a single large conv kernel
A stack of 4 Conv layers with 3x3 filters has a receptive field equivalent to that of a single Conv layer with a 9x9 filter (assuming no padding and stride = 1)

figure from Lboro University slides
Advantages of stacking multiple small conv kernels (a parameter-count comparison is sketched after this list):
1) Fewer parameters, so a lower risk of overfitting
2) Fewer parameters, so less computation
3) More activation layers, so the model has more non-linearity and stronger representational power
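A quick parameter count makes advantages 1) and 2) concrete. The sketch below (the channel count C = 64 is an assumption for illustration; biases are ignored) compares the two receptive-field-equivalent options:

```python
# Minimal sketch comparing parameter counts for the two options above:
# four stacked 3x3 conv layers vs. one 9x9 conv layer, same receptive field.
C = 64  # assumed number of input/output channels at every layer

stacked_3x3 = 4 * (3 * 3 * C * C)   # 4 layers, each with 3*3*C*C weights
single_9x9 = 9 * 9 * C * C          # one layer with 9*9*C*C weights

print(f"4 x (3x3) conv: {stacked_3x3:,} parameters")   # 147,456
print(f"1 x (9x9) conv: {single_9x9:,} parameters")    # 331,776
print(f"the single 9x9 layer needs {single_9x9 / stacked_3x3:.2f}x more parameters")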

  3. GoogLeNet
    GoogLeNet, Szegedy et al. 2015
    Naïve inception module, Szegedy et al. 2015

    Solution: reduce the input (channel) dimension before the expensive convolutions (bottleneck layers; see the sketch below)
    GoogLeNet, Szegedy et al. 2015
    Inception module with bottleneck layers, Szegedy et al. 2015
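The sketch below (channel sizes borrowed loosely from one GoogLeNet stage, but to be treated as illustrative assumptions) shows an Inception-style module where 1x1 bottleneck convolutions shrink the channel dimension before the more expensive 3x3 and 5x5 convolutions:

```python
# Minimal sketch of an Inception module with 1x1 "bottleneck" convolutions.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch=192):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)          # plain 1x1
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),                    # bottleneck
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),                    # bottleneck
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

out = InceptionBlock()(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```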
  4. ResNet (very deep network, up to 152 layers)

Background: if convolutional layers are simply stacked, performance does not keep improving as the network gets deeper. The residual block (Residual Block) was proposed to address this, and experiments show that residual networks can effectively improve the performance of deep networks.

A short calculation shows that in a residual network, by the chain rule of backpropagation, \frac{\partial Out}{\partial In}=\frac{\partial(B(A(x))+x)}{\partial x}=1+\frac{\partial B}{\partial A}\frac{\partial A}{\partial x}

This shows that even if the gradient decays along the Out-B-A backward path, the gradient at Out can still reach In through the constant term 1, so gradients propagate across layers.
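A minimal residual block sketch in PyTorch (the layer shapes are assumptions, not the exact ResNet configuration), using the same notation as the formula above: the output is B(A(x)) + x, and the identity path is what lets gradients skip across layers:

```python
# Minimal sketch of a residual block: the input x is added back to the output
# of the stacked layers, giving the identity path for gradient flow.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.A = nn.Sequential(              # first conv + nonlinearity
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.B = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.B(self.A(x)) + x         # Out = B(A(x)) + x

out = ResidualBlock()(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```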

  5. Densely-connected network (DenseNet)


    comparison between ResNet and Densely-connected network, from https://www.youtube.com/watch?v=xVhD2OBqoyg

    • Advantages
    Faster convergence; backpropagation is more robust, with a lower chance of vanishing gradients
    Features are learned from multiple layers, so the representation carries richer information
    Feature maps are reused, which to some extent reduces the number of convolutional layers

    • Disadvantages
    High memory consumption, since all intermediate feature maps must be stored (dense connectivity is illustrated in the sketch below)
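The sketch below (growth rate and layer count are assumptions, not the published DenseNet configuration) shows the dense connectivity pattern: each layer receives the concatenation of all previous feature maps, which explains both the richer features and the memory cost noted above:

```python
# Minimal sketch of a dense block: every layer sees all earlier feature maps,
# so all intermediate feature maps have to stay in memory.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.ReLU(),
                nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer takes the concatenation of all previous feature maps.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

out = DenseBlock()(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 192, 32, 32]) -> 64 + 4*32 channels
```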

4. Fine-tuning

Fine-tuning takes an existing, already-trained model and modifies some of its layers to obtain a new model for a new task.
Example: fine-tune AlexNet for clothing classification
Given: AlexNet pre-trained on the ImageNet dataset
Aim: fine-tune it to classify clothing (shirts, jeans, etc.)

Case 1: when we have very little data

Note: if the data is very scarce, it is worth trying whether a simple linear classifier (e.g. an SVM) works


Note 2: if the pre-training data is very different from the data in your task, it might be better to train a linear classifier (e.g. an SVM) on shallower-level features

Case 2: when we have more data

Note: you could unfreeze more layers to see if the performance improves (a fine-tuning sketch follows Case 3 below)

Case 3: when we have a very large amount of data

  • We could train the network from scratch
  • The performance may still benefit from initialization with pre-trained weights
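A minimal fine-tuning sketch for Cases 1 and 2 (torchvision's pre-trained AlexNet is used as a stand-in, and the 5-class clothing head and the frozen/unfrozen split are assumptions for illustration): freeze the pre-trained weights, replace the final layer, and unfreeze more layers only when more data is available:

```python
# Minimal fine-tuning sketch: freeze the pre-trained feature extractor, retrain
# only the classifier head; unfreeze more layers when more data is available.
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Case 1: freeze everything, then replace and train only the last layer.
for param in model.parameters():
    param.requires_grad = False
model.classifier[6] = nn.Linear(4096, 5)   # e.g. 5 clothing classes (assumed)

# Case 2: with more data, also unfreeze the fully connected layers.
for param in model.classifier.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```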

Note ⚠️: overfitting can easily occur during training
figure from Lboro University slides

Common ways to combat overfitting:

  • Data augmentation: for images, this can include scaling up/down, rotation, adding noise, etc. (sketched after this list)
  • Dropout
    figure from Srivastava et al., 2014
  • Early stopping
    figure from Lboro University slides
  • Reduce the number of parameters in the model
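For the first two remedies, a brief sketch (the specific transforms and the dropout rate are illustrative assumptions): torchvision-style data augmentation plus a dropout layer in the classifier head:

```python
# Minimal sketch of data augmentation and dropout as overfitting remedies.
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random scaling/cropping, rotation, and colour jitter.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Dropout: randomly zero half of the activations during training.
classifier_head = nn.Sequential(
    nn.Linear(4096, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 5),
)
```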

5. Autoencoders

An autoencoder is an unsupervised learning algorithm for data compression and dimensionality reduction. Its goal is to capture the main characteristics of the data and represent them in compressed form.
figure from Lboro University slides

To judge how well an autoencoder works, decode the encoded data back again: if the reconstruction preserves the main characteristics of the original data, the autoencoder works well.

Training process

  1. propagate input-to-output signals (feedforward pass)
  2. observe output and compare to input (i.e. find error)
  3. propagate error with BP and adjust weights
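A minimal autoencoder sketch following the three steps above (the layer sizes and the synthetic batch are assumptions for illustration):

```python
# Minimal autoencoder sketch: encode, decode, compare output to input, and
# backpropagate the reconstruction error.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)            # stand-in batch, e.g. flattened MNIST images
for step in range(3):
    recon = model(x)               # 1. feedforward pass
    loss = loss_fn(recon, x)       # 2. compare output to input
    optimizer.zero_grad()
    loss.backward()                # 3. backpropagate the error
    optimizer.step()               #    ... and adjust the weights
    print(f"step {step}: reconstruction loss = {loss.item():.4f}")
```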
References:
  1. Why does the residual network (Residual Block), despite its apparent simplicity, work so well?
  2. Autoencoders in neural networks