Semantic Segmentation
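- Semantic segmentation: classification of an image at the pixel level
- The image is partitioned into regions, each of which represents a meaningful object
- Each object is assigned a label for the physical category it belongs to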
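As a minimal sketch of what pixel-level classification means in practice: a segmentation network outputs one score per class for every pixel, and the label map is the per-pixel argmax. The class names and scores below are made up for illustration and do not come from any DeepLab model.

```python
# Toy illustration: semantic segmentation as per-pixel classification.
# CLASSES and the scores are hypothetical values chosen for this example.
CLASSES = ["background", "road", "car"]

def segment(scores):
    """Map an H x W grid of per-class scores to an H x W grid of class indices."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# A 2 x 3 "image": each pixel holds one score per class.
scores = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]],
    [[0.1, 0.2, 0.7], [0.3, 0.3, 0.4], [0.9, 0.05, 0.05]],
]

label_map = segment(scores)                                   # class index per pixel
named_map = [[CLASSES[i] for i in row] for row in label_map]  # readable labels
```

The output has the same height and width as the input, which is what distinguishes segmentation from whole-image classification.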
Main applications
- Autonomous driving
- Medical imaging
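The main idea of DeepLab
DeepLabV1 and DeepLabV2
- Use a DCNN to classify and produce a coarse segmentation prediction (a smooth, blurry heat map)
- Refine the result with a conditional random field (CRF)
As shown in the figure: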
DCNN
Atrous (dilated) deep convolution
- Striding
- Pooling
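Striding and pooling both shrink the feature map, whereas an atrous convolution enlarges the field of view without losing resolution. The following is a minimal 1-D sketch of that idea, not DeepLab's actual implementation; the function name `atrous_conv1d` and the toy signal are assumptions made for illustration.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution with zero padding and stride 1.

    Kernel taps are spaced `rate` samples apart, so the field of view grows
    with `rate` while the number of weights stays the same. As is usual in
    deep learning, the kernel is not flipped (this is a cross-correlation).
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - center) * rate
            if 0 <= j < len(signal):  # taps outside the signal see zero padding
                acc += w * signal[j]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]

dense = atrous_conv1d(signal, kernel, rate=1)   # sums 3 adjacent samples
atrous = atrous_conv1d(signal, kernel, rate=2)  # sums 3 samples spaced 2 apart
```

Both outputs have the same length as the input. With 3 taps and rate 2, each output covers a span of 5 input samples, since the effective kernel size is k + (k - 1)(rate - 1).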
CRF
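This part is mainly a comparison with the DCNN:
- First, a DCNN trades off classification accuracy against localization accuracy
- The heat map produced by the DCNN predicts the class and the rough position of objects well
- It is less effective at delineating precise object outlines
The CRF, in contrast, models the relationships between pixels as follows:
- Nearby pixels are more likely to share the same label
- The CRF turns the assignment of pixel labels into a probability for each pixel
- The result is adjusted iteratively until convergence
The effect is shown in the figure below: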
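The iterative refinement can be sketched with a toy mean-field update on a single row of pixels. This is a heavily simplified stand-in for the fully connected CRF used by DeepLab: it has only two labels, a Potts-style pairwise term, and a single kernel over position and intensity. The function name, parameter values, and data are all assumptions chosen for illustration.

```python
import math

def mean_field_1d(fg_prob, intensity, w=2.0, sigma_pos=3.0, sigma_int=0.2,
                  max_iters=50, tol=1e-6):
    """Toy mean-field inference for a two-label CRF on a row of pixels.

    fg_prob   : coarse per-pixel foreground probability (e.g. from a DCNN)
    intensity : per-pixel intensity, used to decide which pixels "belong together"
    Returns a list of (background, foreground) probabilities per pixel.
    """
    n = len(fg_prob)
    # Unary term: log-probabilities of the coarse prediction, as (bg, fg).
    unary = [(math.log(1.0 - p), math.log(p)) for p in fg_prob]
    # Pairwise affinity: large for pixels that are close and look alike.
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_pos = (i - j) ** 2 / (2 * sigma_pos ** 2)
                d_int = (intensity[i] - intensity[j]) ** 2 / (2 * sigma_int ** 2)
                k[i][j] = w * math.exp(-d_pos - d_int)
    q = [(1.0 - p, p) for p in fg_prob]  # start from the coarse prediction
    for _ in range(max_iters):
        new_q = []
        for i in range(n):
            # Each label is supported by similar pixels that currently hold it.
            scores = [unary[i][label] + sum(k[i][j] * q[j][label] for j in range(n))
                      for label in (0, 1)]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
            z = sum(exps)
            new_q.append((exps[0] / z, exps[1] / z))
        change = max(abs(a[1] - b[1]) for a, b in zip(new_q, q))
        q = new_q
        if change < tol:  # stop once the probabilities no longer move
            break
    return q

# Dark pixels (0.1) are background, bright pixels (0.9) are foreground.
# The coarse prediction bleeds one pixel across the true edge (index 2).
coarse_fg = [0.1, 0.2, 0.6, 0.7, 0.9, 0.9]
intensity = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]

refined = mean_field_1d(coarse_fg, intensity)
coarse_labels = [int(p > 0.5) for p in coarse_fg]
refined_labels = [int(q_fg > q_bg) for q_bg, q_fg in refined]
```

In this toy case the pixel at index 2 is labeled foreground by the blurry coarse prediction, and the refinement moves it back to background because it looks like its dark neighbors.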
DeepLabV1
Network description:
- DeepLab v1 is constructed by modifying VGG-16
- Fully connected layers of VGG-16 are converted to convolutional layers
- Subsampling is skipped after last two max-pooling layers
- Convolutional filters in the layers that follow pooling are modified to atrous
- Model weights of the ImageNet-pretrained VGG-16 network are fine-tuned
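The effect of skipping subsampling can be worked out with simple arithmetic. The sketch below assumes the standard VGG-16 layout of five stride-2 max-pooling layers; the helper `output_stride_and_rates` is a hypothetical name introduced here, and the rule it applies (each removed stride multiplies the atrous rate of the layers that follow) is the usual way this conversion is described.

```python
def output_stride_and_rates(strides, keep_resolution_from):
    """Walk a chain of downsampling stages.

    Stages with index >= keep_resolution_from have their subsampling skipped;
    to keep the same field of view, the filters that follow are dilated by
    the stride that was removed.

    Returns (output_stride, rates), where rates[i] is the atrous rate of the
    layers that follow stage i.
    """
    output_stride = 1
    rate = 1
    rates = []
    for idx, stride in enumerate(strides):
        if idx >= keep_resolution_from:
            rate *= stride           # subsampling skipped: dilate instead
        else:
            output_stride *= stride  # subsampling kept: resolution shrinks
        rates.append(rate)
    return output_stride, rates

vgg16_pool_strides = [2, 2, 2, 2, 2]  # five max-pooling layers, stride 2 each

# Unmodified VGG-16: every pooling layer subsamples.
original = output_stride_and_rates(vgg16_pool_strides, keep_resolution_from=5)

# DeepLab v1 style: subsampling skipped after the last two pooling layers.
modified = output_stride_and_rates(vgg16_pool_strides, keep_resolution_from=3)
```

Under these assumptions the output stride drops from 32 to 8, and the layers after the fourth and fifth pooling stages use rates 2 and 4 respectively.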
Network architecture diagram:
Image segmentation results:
DeepLabV2
Network description:
- Better segmentation of objects at multiple scales (using ASPP)
- Adapting ResNet image classification DCNN
- Learning rate policy
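The bullet above does not name the policy. The DeepLab v2 paper reports using a "poly" schedule, in which the learning rate is scaled by (1 - iter/max_iter)^power with power = 0.9. A minimal sketch follows; the base rate and iteration count are illustrative assumptions, not the paper's training settings.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """'Poly' learning rate policy: base_lr * (1 - iteration / max_iter) ** power."""
    return base_lr * (1.0 - iteration / max_iter) ** power

base_lr = 0.001    # hypothetical starting rate
max_iter = 20000   # hypothetical training length

# Sample the schedule at a few points, including the start and the end.
schedule = [poly_lr(base_lr, it, max_iter) for it in (0, 5000, 10000, 15000, 20000)]
```

The rate starts at `base_lr`, decays smoothly, and reaches zero at `max_iter`, which avoids the abrupt drops of a step schedule.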
ASPP
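ASPP (Atrous Spatial Pyramid Pooling) was introduced for the following reasons:
- Objects in an image vary in size
- Computationally efficient scheme of resampling a given feature layer at multiple rates prior to convolution
- Several parallel atrous convolution layers are used to convolve at different sampling rates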
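The parallel-branch idea can be sketched in 1-D as follows. The rates here are small toy values, the helper names are assumptions, and fusing the branches by summation is my reading of DeepLabV2 rather than something stated in these notes, so treat it as an assumption too.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous convolution with zero padding and stride 1 (kernel not flipped)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - center) * rate
            if 0 <= j < len(signal):  # outside the signal counts as zero
                acc += w * signal[j]
        out.append(acc)
    return out

def aspp_1d(signal, branches):
    """Apply several atrous convolutions in parallel and fuse them by summation.

    branches: list of (kernel, rate) pairs, one per parallel branch.
    """
    outputs = [atrous_conv1d(signal, kernel, rate) for kernel, rate in branches]
    return [sum(values) for values in zip(*outputs)]

# A single impulse makes each branch's field of view visible in the output.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
branches = [([1.0, 1.0, 1.0], 1), ([1.0, 1.0, 1.0], 2), ([1.0, 1.0, 1.0], 4)]

fused = aspp_1d(signal, branches)
```

Each branch responds to the impulse at a different distance (1, 2, or 4 samples away), so the fused output combines context from several scales while keeping the input resolution.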
Some details of ASPP and a figure of the resulting improvement:
Network output results:
DeepLabV1 & DeepLabV2
Advantages
- Speed: By virtue of the ‘atrous’ algorithm, the dense DCNN operates at 8 fps, while the fully-connected CRF requires 0.5 seconds
- Accuracy: state-of-the-art results achieved on several challenging datasets
- Simplicity: the system is composed of a cascade of two fairly well-established modules, DCNNs and CRFs
DeepLabV3
Changes from the previous two versions:
- The proposed framework is general and could be applied to any network
- Several copies of the last ResNet block are duplicated, and arranged in cascade
- Batch normalization is included within ASPP
- CRF is not used
Changes to ASPP:
- Batch normalization is included within ASPP
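- As the sampling rate becomes larger, the number of valid filter weights (those applied to the feature map rather than to zero padding) becomes smaller
- Global average pooling is applied to the last feature map of the model to capture image-level context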
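The shrinking number of valid weights can be checked by counting. The sketch below assumes a 3x3 atrous filter with zero padding on a 65x65 feature map; the function name is hypothetical and the rates are chosen only to show the trend.

```python
def valid_weight_fractions(size, rate):
    """For a 3x3 atrous filter on a size x size map with zero padding, return
    {number_of_valid_taps: fraction_of_positions}.

    A tap is "valid" when it lands on the feature map rather than on padding.
    """
    def taps_1d(i):
        # Along one axis the taps sit at i - rate, i, and i + rate.
        return sum(1 for off in (-rate, 0, rate) if 0 <= i + off < size)

    counts = {}
    for y in range(size):
        for x in range(size):
            n = taps_1d(y) * taps_1d(x)  # valid taps of the 3x3 grid at (y, x)
            counts[n] = counts.get(n, 0) + 1
    total = size * size
    return {n: c / total for n, c in sorted(counts.items())}

small_rate = valid_weight_fractions(65, 1)   # almost every position uses all 9 weights
mid_rate = valid_weight_fractions(65, 33)    # no position can use all 9 weights
huge_rate = valid_weight_fractions(65, 65)   # only the center weight is ever valid
```

Once the rate reaches the feature map size, the 3x3 filter behaves like a 1x1 filter everywhere, which is one motivation for adding image-level features through global average pooling.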
Some details of the updated ASPP:
Final results of DeepLabV3:
DeepLabV3+
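A single figure summarizes the biggest update:
A more detailed architecture diagram is shown below:
The modifications made to the Xception model are shown in the figure:
The table below compares the final test results on PASCAL VOC 2012 with those of other methods:
Finally, here are some visualizations produced by the model: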
[End]