title | Light-Head R-CNN: In Defense of Two-Stage Object Detector |
---|---|
url | https://arxiv.org/pdf/1711.07264.pdf |
Motivation | Two-stage detectors run slowly because of the dense computation around RoIs and the heavy-head design (Faster R-CNN: two large fully connected layers; R-FCN: a large score map). |
Content |
Light-Head R-CNN: a thin feature map and a cheap R-CNN subnet (pooling plus a single fully connected layer).
R-CNN subnet and RoI warping:
1. Faster R-CNN uses two large fully connected layers to boost classification; with many proposals the computation becomes heavy, and replacing the first fully connected layer with global average pooling to cut cost hurts spatial localization.
2. R-FCN uses a large score map (classes × p × p channels) and a computation-free R-CNN subnet (the prediction comes directly from pooling after position-sensitive pooling), which performs worse than Faster R-CNN's RoI-wise subnet.
3. The computation and memory cost of the fully connected layers mainly comes from the large number of channels after RoI pooling.
Thin feature maps for RoI warping:
1. Thin feature maps improve accuracy and reduce the memory and computation of training and inference.
2. With PSRoI pooling, more computation can go into strengthening the R-CNN while the channel count drops.
3. With RoI pooling, the R-CNN overhead shrinks and Global Average Pooling is no longer needed, which improves performance.
Light-Head R-CNN for Object Detection (L: large backbone network, S: small backbone network):
Basic feature extractor: L: ResNet-101; S: an Xception-like small base model.
Thin feature maps: Cin = C5; k = 15; Cmid = 64 (S) or 256 (L); Cout = 10 × p × p. The large kernel enlarges the receptive field, so the feature maps we pool on are more powerful (see the sketch below).
R-CNN subnet:
1. A single fully connected layer with 2048 channels (no dropout).
2. Two sibling fully connected layers predict RoI classification and regression (only 4 channels for regression, shared across classes).
3. RoI warping: Light-Head R-CNN achieves remarkable results while keeping the efficiency.
RPN:
1. Three aspect ratios {1:2, 1:1, 2:1} and five scales {32^2, 64^2, 128^2, 256^2, 512^2}.
2. NMS threshold: 0.7. |
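The sketch referenced above: a minimal PyTorch sketch of the Light-Head R-CNN head, assuming the two-branch large separable convolution layout from the paper (a k×1 then 1×k branch plus a 1×k then k×1 branch, with the two branch outputs summed) and the single-FC R-CNN subnet. The module names, the example feature-map size, and the omission of normalization/activation inside the separable branches are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LargeSeparableConv(nn.Module):
    """Thin feature map producer: two k x 1 / 1 x k separable branches, summed."""

    def __init__(self, c_in, c_mid=256, c_out=10 * 7 * 7, k=15):
        super().__init__()
        pad = k // 2
        # Branch A: (k x 1) then (1 x k).
        self.branch_a = nn.Sequential(
            nn.Conv2d(c_in, c_mid, (k, 1), padding=(pad, 0)),
            nn.Conv2d(c_mid, c_out, (1, k), padding=(0, pad)),
        )
        # Branch B: (1 x k) then (k x 1).
        self.branch_b = nn.Sequential(
            nn.Conv2d(c_in, c_mid, (1, k), padding=(0, pad)),
            nn.Conv2d(c_mid, c_out, (k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)


class LightHead(nn.Module):
    """Cheap R-CNN subnet: one 2048-d FC, then sibling cls / reg FCs."""

    def __init__(self, pooled_channels=10, num_classes=81, p=7):
        super().__init__()
        self.fc = nn.Linear(pooled_channels * p * p, 2048)  # single FC, no dropout
        self.cls = nn.Linear(2048, num_classes)             # RoI classification
        self.reg = nn.Linear(2048, 4)                        # class-agnostic box regression

    def forward(self, pooled_rois):                # [num_rois, pooled_channels, p, p]
        x = torch.relu(self.fc(pooled_rois.flatten(1)))
        return self.cls(x), self.reg(x)


# Shape walk-through for the "L" setting: C5 of ResNet-101 has 2048 channels.
feat = torch.randn(1, 2048, 38, 50)                # example backbone feature map
thin = LargeSeparableConv(2048, c_mid=256)(feat)   # [1, 490, 38, 50]
# PSRoI pooling on the thin map yields [num_rois, 10, 7, 7] per RoI; plain RoI
# pooling would yield [num_rois, 490, 7, 7] instead (49x more features).
scores, boxes = LightHead(pooled_channels=10)(torch.randn(300, 10, 7, 7))
print(thin.shape, scores.shape, boxes.shape)       # [1,490,38,50], [300,81], [300,4]
```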
Experiments |
Implementation details: the backbone is initialized from the ImageNet-pretrained base model; pooling size is 7; each image has 2000/1000 RoIs for training/testing.
Ablation experiments, baseline changes over R-FCN:
1. Images are resized so the shorter edge is 800 pixels and the longer edge is capped at 1200.
2. The regression loss is doubled.
3. 256 samples ranked by loss are selected for backpropagation, with 2000 RoIs per image for training and 1000 for testing.
Figure 4 differs from the original R-FCN (to verify the effect of reducing channels):
1. The channels for PSRoI pooling drop from 3969 (81 × 7 × 7) to 490 (10 × 7 × 7).
2. With the reduced channel count, the pooled feature can no longer serve directly as the 81-class prediction, so a fully connected layer is added for prediction.
Replacing PSRoI pooling with RoI pooling gains 0.3 points: RoI pooling keeps 49× more features per RoI, trading computation for accuracy (see the shape check below).
High-speed setting (S: Xception-like backbone):
1. A tiny Xception model replaces ResNet-101.
2. The atrous algorithm is dropped, since its computation is large relative to the small backbone.
3. The RPN convolution is reduced to 256 channels (half of the original).
4. Large separable convolution: kernel size = 15, Cmid = 64, Cout = 490 (10 × 7 × 7); because the middle channel count is very small, even this large kernel stays efficient.
5. PSRoI pooling with alignment: pooling reduces the channels by a factor of k × k; using RoI-align improves results further. |
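To make the 49× figure concrete, here is a small shape check with torchvision.ops (not the authors' code; the feature map and RoIs are invented): PSRoI pooling spreads the 490 input channels over the 7 × 7 bins, leaving 10 channels per RoI, while plain RoI pooling keeps all 490 channels in every bin.

```python
import torch
from torchvision.ops import ps_roi_pool, roi_pool

# A 490-channel (10 x 7 x 7) thin feature map at stride 16, plus two RoIs given
# as (batch_index, x1, y1, x2, y2) in image coordinates.
feat = torch.randn(1, 490, 50, 50)
rois = torch.tensor([[0.0,  0.0,  0.0, 320.0, 320.0],
                     [0.0, 64.0, 64.0, 400.0, 300.0]])

# PSRoI pooling: each of the 7x7 bins reads its own 10-channel slice.
ps = ps_roi_pool(feat, rois, output_size=7, spatial_scale=1.0 / 16)
# Plain RoI pooling: every bin keeps all 490 channels (49x more features per RoI).
plain = roi_pool(feat, rois, output_size=7, spatial_scale=1.0 / 16)

print(ps.shape)     # torch.Size([2, 10, 7, 7])
print(plain.shape)  # torch.Size([2, 490, 7, 7])
```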
Thoughts | |
Supplement | In Xception, reps is the number of ReLU-SeparableConv2d-BatchNorm repetitions inside a block. Below is the block code from a PyTorch Xception implementation (not the Light-Head R-CNN authors' source code); the stride=2 downsampling is implemented at the end by a MaxPool2d. Code:
```python
import torch.nn as nn

# SeparableConv2d (depthwise 3x3 conv + 1x1 pointwise conv) is defined in the
# sketch after this block.

class Block(nn.Module):
    def __init__(self, in_filters, out_filters, reps, strides=1,
                 start_with_relu=True, grow_first=True):
        super(Block, self).__init__()

        # 1x1 projection on the residual shortcut whenever the shape changes.
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, bias=False)
            self.skipbn = nn.BatchNorm2d(out_filters)
        else:
            self.skip = None

        rep = []
        filters = in_filters
        if grow_first:
            # Grow the channel count in the first separable conv.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))
            filters = out_filters

        # reps = number of ReLU-SeparableConv2d-BatchNorm groups in the block.
        for i in range(reps - 1):
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(filters, filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(filters))

        if not grow_first:
            # Grow the channel count in the last separable conv instead.
            rep.append(nn.ReLU(inplace=True))
            rep.append(SeparableConv2d(in_filters, out_filters, 3, stride=1, padding=1, bias=False))
            rep.append(nn.BatchNorm2d(out_filters))

        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU(inplace=False)

        # stride=2 is implemented by a trailing MaxPool2d, not by the convs.
        if strides != 1:
            rep.append(nn.MaxPool2d(3, strides, 1))
        self.rep = nn.Sequential(*rep)

    def forward(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp
        x += skip
        return x
```
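The Block above calls SeparableConv2d, which is not reproduced in the note. Below is a minimal version in the style of the common PyTorch Xception port this snippet appears to come from (an assumption; it is not the Light-Head R-CNN authors' code): a depthwise convolution with groups=in_channels followed by a 1x1 pointwise convolution, plus a quick shape check that assumes the Block class above is in scope.

```python
import torch
import torch.nn as nn


class SeparableConv2d(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (Xception-style)."""

    def __init__(self, in_channels, out_channels, kernel_size=1,
                 stride=1, padding=0, dilation=1, bias=False):
        super(SeparableConv2d, self).__init__()
        # Depthwise: one filter per input channel (groups=in_channels).
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride,
                               padding, dilation, groups=in_channels, bias=bias)
        # Pointwise: 1x1 conv that mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        return self.pointwise(self.conv1(x))


# Quick check: a Block with reps=2 and strides=2 halves the spatial size via the
# trailing MaxPool2d, while the 1x1 skip conv matches channels and stride.
if __name__ == "__main__":
    block = Block(64, 128, reps=2, strides=2, start_with_relu=False, grow_first=True)
    out = block(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 128, 28, 28])
```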