title | Light-Head R-CNN: In Defense of Two-Stage Object Detector |
---|---|
url | https://arxiv.org/pdf/1711.07264.pdf |
Motivation | Two-stage detectors run slowly because of the dense computation around RoIs and the heavy-head design (Faster R-CNN: two large fully connected layers; R-FCN: a large score map). |
Content |
Light-Head R-CNN: a thin feature map and a cheap R-CNN subnet (pooling plus a single fully connected layer).
R-CNN subnet and RoI warping:
1. Faster R-CNN uses two large fully connected layers to boost classification; with many proposals the computation becomes heavy, and replacing the first fully connected layer with global average pooling to cut cost hurts spatial localization.
2. R-FCN uses a large score map (classes × p × p channels) and a computation-free R-CNN subnet (the prediction comes directly from pooling after position-sensitive pooling), which performs worse than Faster R-CNN's RoI-wise subnet.
3. The computation and memory cost of the fully connected layers mainly comes from the large number of channels after RoI pooling.
Thin feature maps for RoI warping:
1. Thin feature maps improve accuracy and reduce the memory and computation of training and inference.
2. With PSRoI pooling, more computation can go into strengthening the R-CNN while the channel count drops.
3. With RoI pooling, the R-CNN overhead shrinks and Global Average Pooling is no longer needed, which improves performance.
Light-Head R-CNN for Object Detection (L: large backbone network, S: small backbone network):
Basic feature extractor: L: ResNet-101; S: an Xception-like small base model.
Thin feature maps: Cin = C5; k = 15; Cmid = 64 (S) or 256 (L); Cout = 10 × p × p. The large kernel enlarges the receptive field, so the feature maps we pool on are more powerful (see the sketch below).
R-CNN subnet:
1. A single fully connected layer with 2048 channels (no dropout).
2. Two sibling fully connected layers predict RoI classification and regression (only 4 channels for regression, shared across classes).
3. RoI warping: Light-Head R-CNN achieves remarkable results while keeping the efficiency.
RPN:
1. Three aspect ratios {1:2, 1:1, 2:1} and five scales {32^2, 64^2, 128^2, 256^2, 512^2}.
2. NMS threshold: 0.7. |
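The sketch referenced above: a minimal PyTorch sketch of the Light-Head R-CNN head, assuming the two-branch large separable convolution layout from the paper (a k×1 then 1×k branch plus a 1×k then k×1 branch, with the two branch outputs summed) and the single-FC R-CNN subnet. The module names, the example feature-map size, and the omission of normalization/activation inside the separable branches are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LargeSeparableConv(nn.Module):
    """Thin feature map producer: two k x 1 / 1 x k separable branches, summed."""

    def __init__(self, c_in, c_mid=256, c_out=10 * 7 * 7, k=15):
        super().__init__()
        pad = k // 2
        # Branch A: (k x 1) then (1 x k).
        self.branch_a = nn.Sequential(
            nn.Conv2d(c_in, c_mid, (k, 1), padding=(pad, 0)),
            nn.Conv2d(c_mid, c_out, (1, k), padding=(0, pad)),
        )
        # Branch B: (1 x k) then (k x 1).
        self.branch_b = nn.Sequential(
            nn.Conv2d(c_in, c_mid, (1, k), padding=(0, pad)),
            nn.Conv2d(c_mid, c_out, (k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)


class LightHead(nn.Module):
    """Cheap R-CNN subnet: one 2048-d FC, then sibling cls / reg FCs."""

    def __init__(self, pooled_channels=10, num_classes=81, p=7):
        super().__init__()
        self.fc = nn.Linear(pooled_channels * p * p, 2048)  # single FC, no dropout
        self.cls = nn.Linear(2048, num_classes)             # RoI classification
        self.reg = nn.Linear(2048, 4)                        # class-agnostic box regression

    def forward(self, pooled_rois):                # [num_rois, pooled_channels, p, p]
        x = torch.relu(self.fc(pooled_rois.flatten(1)))
        return self.cls(x), self.reg(x)


# Shape walk-through for the "L" setting: C5 of ResNet-101 has 2048 channels.
feat = torch.randn(1, 2048, 38, 50)                # example backbone feature map
thin = LargeSeparableConv(2048, c_mid=256)(feat)   # [1, 490, 38, 50]
# PSRoI pooling on the thin map yields [num_rois, 10, 7, 7] per RoI; plain RoI
# pooling would yield [num_rois, 490, 7, 7] instead (49x more features).
scores, boxes = LightHead(pooled_channels=10)(torch.randn(300, 10, 7, 7))
print(thin.shape, scores.shape, boxes.shape)       # [1,490,38,50], [300,81], [300,4]
```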
Experiments |
Implementation details: the backbone is initialized from the ImageNet-pretrained base model; pooling size is 7; each image has 2000/1000 RoIs for training/testing.
Ablation experiments, baseline changes over R-FCN:
1. Images are resized so the shorter edge is 800 pixels and the longer edge is capped at 1200.
2. The regression loss is doubled.
3. 256 samples ranked by loss are selected for backpropagation, with 2000 RoIs per image for training and 1000 for testing.
Figure 4 differs from the original R-FCN (to verify the effect of reducing channels):
1. The channels for PSRoI pooling drop from 3969 (81 × 7 × 7) to 490 (10 × 7 × 7).
2. With the reduced channel count, the pooled feature can no longer serve directly as the 81-class prediction, so a fully connected layer is added for prediction.
Replacing PSRoI pooling with RoI pooling gains 0.3 points: RoI pooling keeps 49× more features per RoI, trading computation for accuracy (see the shape check below).
High-speed setting (S: Xception-like backbone):
1. A tiny Xception model replaces ResNet-101.
2. The atrous algorithm is dropped, since its computation is large relative to the small backbone.
3. The RPN convolution is reduced to 256 channels (half of the original).
4. Large separable convolution: kernel size = 15, Cmid = 64, Cout = 490 (10 × 7 × 7); because the middle channel count is very small, even this large kernel stays efficient.
5. PSRoI pooling with alignment: pooling reduces the channels by a factor of k × k; using RoI-align improves results further. |
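To make the 49× figure concrete, here is a small shape check with torchvision.ops (not the authors' code; the feature map and RoIs are invented): PSRoI pooling spreads the 490 input channels over the 7 × 7 bins, leaving 10 channels per RoI, while plain RoI pooling keeps all 490 channels in every bin.

```python
import torch
from torchvision.ops import ps_roi_pool, roi_pool

# A 490-channel (10 x 7 x 7) thin feature map at stride 16, plus two RoIs given
# as (batch_index, x1, y1, x2, y2) in image coordinates.
feat = torch.randn(1, 490, 50, 50)
rois = torch.tensor([[0.0,  0.0,  0.0, 320.0, 320.0],
                     [0.0, 64.0, 64.0, 400.0, 300.0]])

# PSRoI pooling: each of the 7x7 bins reads its own 10-channel slice.
ps = ps_roi_pool(feat, rois, output_size=7, spatial_scale=1.0 / 16)
# Plain RoI pooling: every bin keeps all 490 channels (49x more features per RoI).
plain = roi_pool(feat, rois, output_size=7, spatial_scale=1.0 / 16)

print(ps.shape)     # torch.Size([2, 10, 7, 7])
print(plain.shape)  # torch.Size([2, 490, 7, 7])
```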
Thoughts | |
Supplement | In Xception, reps is the number of ReLU-SeparableConv2d-BatchNorm repetitions inside a block. Below is the block code from a PyTorch Xception implementation (not the Light-Head R-CNN authors' source code); the stride=2 downsampling is implemented at the end by a MaxPool2d. Code:
```python
import torch.nn as nn

# SeparableConv2d (depthwise 3x3 conv + 1x1 pointwise conv) is defined in the
# sketch after this block.

class Block(nn.Module):
    def __init__(self, in_filters, out_filters, reps, strides=1,
                 start_with_relu=True, grow_first=True):
        super(Block, self).__init__()

        # 1x1 projection on the residual shortcut whenever the shape changes.
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        else:
            self.skip = None

        rep = []
        filters = in_filters
        if grow_first:
            # Grow the channel count in the first separable conv.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))
            filters = out_filters

        # reps = number of ReLU-SeparableConv2d-BatchNorm groups in the block.
        for i in range(reps - 1):
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(filters, filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(filters))

        if not grow_first:
            # Grow the channel count in the last separable conv instead.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))

        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU(inplace=False)

        # stride=2 is implemented by a trailing MaxPool2d, not by the convs.
        if strides != 1:
            rep.append(nn.MaxPool2d(3, strides, 1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp
        x += skip
        return x
```
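The Block above calls SeparableConv2d, which is not reproduced in the note. Below is a minimal version in the style of the common PyTorch Xception port this snippet appears to come from (an assumption; it is not the Light-Head R-CNN authors' code): a depthwise convolution with groups=in_channels followed by a 1x1 pointwise convolution, plus a quick shape check that assumes the Block class above is in scope.

```python
import torch
import torch.nn as nn


class SeparableConv2d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (Xception-style)."""

    def __init__(self, in_channels, out_channels, kernel_size=1,
                 stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()
        # Depthwise: one filter per input channel (groups=in_channels).
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride,
                               padding, dilation, groups=in_channels, bias=bias)
        # Pointwise: 1x1 conv that mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        return self.pointwise(self.conv1(x))


# Quick check: a Block with reps=2 and strides=2 halves the spatial size via the
# trailing MaxPool2d, while the 1x1 skip conv matches channels and stride.
if __name__ == "__main__":
    block = Block(64, 128, reps=2, strides=2, start_with_relu=False, grow_first=True)
    out = block(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 128, 28, 28])
```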