Paper link: https://arxiv.org/abs/1708.05237
Contributions:
1) proposing a scale-equitable face detection framework to handle different scales of faces well.
2) improving the recall rate of small faces by a scale compensation anchor matching strategy.
3) reducing the false positive rate of small faces via a max-out background label.
In practice, the authors build on the SSD network architecture and make several improvements tailored to the characteristics of face detection datasets.
Drawbacks of conventional anchor-based detection methods:
Compared with other methods, anchor-based detection methods are more robust in complicated scenes and their speed is invariant to the number of objects. However, as indicated in [12], the performance of anchor-based detectors drops dramatically as the objects become smaller.
Anchor-based methods are not scale-invariant: they detect large objects well but perform poorly on small ones.
Reasons for the lack of scale invariance:
1. Biased framework (an unsuitable network architecture)
(1) Firstly, the stride size of the lowest anchor-associated layer is too large (e.g., 8 pixels in [26] and 16 pixels in [38]), therefore small and medium faces have been highly squeezed on these layers and have few features for detection; see Fig. 1(a).
The stride of the later convolutional layers becomes very large (for example, conv5_3 has a stride of 16), so some small objects are effectively lost.
(2) Secondly, small face, anchor scale and receptive field are mutually mismatched: anchor scale mismatches receptive field and both are too large to fit small faces; see Fig. 1(b).
The anchor scales are not designed appropriately for small objects.
2. Anchor matching strategy.
Faces whose scales lie far from the anchor scales cannot match enough anchors, such as the tiny and outer faces in Fig. 1(c), leading to their low recall rate.
Because of how the anchors are designed, some small faces do not have enough matching anchors, which lowers the detection rate.
3. Background from small anchors.
As illustrated in Fig. 1(d), these small anchors lead to a sharp increase in the number of negative anchors on the background, bringing about many false positive faces.
Lowering the anchor scale (for example, adding small-scale anchors on conv3_3) greatly increases the number of negative samples.
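To put rough numbers on points 1 and 3, here is a minimal sketch. The 640x640 input size, the stride list (doubling from 4 to 128) and the helper names `cells_covered` and `anchor_counts` are assumptions made for illustration, and one anchor per feature-map cell is assumed. Stride 4 is what a VGG-style conv3_3 layer has, and stride 16 matches the conv5_3 example above.

```python
def cells_covered(face_px, stride):
    """Side length of a face measured in feature-map cells at a given stride."""
    return face_px / stride


def anchor_counts(image_size, strides):
    """Anchor positions per layer, assuming one anchor per feature-map cell."""
    return {s: (image_size // s) ** 2 for s in strides}


# Assumed values: 640x640 input, strides doubling from 4 to 128.
counts = anchor_counts(640, [4, 8, 16, 32, 64, 128])
total = sum(counts.values())
share_stride4 = counts[4] / total

# A 16-pixel face shrinks to a single cell on a stride-16 layer.
print(cells_covered(16, 16))
# Most anchors come from the finest (stride-4) layer.
print(counts[4], total, round(share_stride4, 2))
```

Under these assumptions the stride-4 layer alone contributes about three quarters of all anchors, and almost all of them fall on background. That is the imbalance point 3 describes.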
To address these problems with conventional methods, the paper proposes:
1. Scale-equitable face detection framework:
Fig. 3(a) shows that the effective receptive field is much smaller than the theoretical receptive field. According to this theory, the anchor should be significantly smaller than the theoretical receptive field in order to match the effective receptive field (see the specific example in Fig. 3(b)).
As shown in the second and third columns of Tab. 1, the scales of our anchors are 4 times their interval. We call it the equal-proportion interval principle (illustrated in Fig. 3(c)), which guarantees that different scales of anchor have the same density on the image, so that faces of various scales can approximately match the same number of anchors.
The network still follows the SSD architecture. Because the anchor scales in the original network are somewhat too large, the authors redefine them. They also observe that the stride determines the anchor interval, so each layer's stride is set to 1/4 of that layer's anchor scale. The authors call this the equal-proportion interval principle.
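The "same density" claim can be checked with a small sketch: generate square anchors with scale = 4 x stride for a few strides and count how many anchors cover one fixed image point. The function names, the 640x640 input and the chosen strides are assumptions for illustration, not the authors' code.

```python
import numpy as np


def make_anchors(image_size, stride, ratio=4):
    """Square anchors (cx, cy, w, h), one per cell, with scale = ratio * stride."""
    scale = float(ratio * stride)
    n = image_size // stride
    centers = (np.arange(n) + 0.5) * stride  # cell centers in image pixels
    cx, cy = np.meshgrid(centers, centers)
    wh = np.full(cx.size, scale)
    return np.stack([cx.ravel(), cy.ravel(), wh, wh], axis=1)


def anchors_covering_point(anchors, x, y):
    """Count anchors whose box contains the image point (x, y)."""
    cx, cy, w, h = anchors.T
    inside = (np.abs(cx - x) < w / 2) & (np.abs(cy - y) < h / 2)
    return int(inside.sum())


# Assumed setup: 640x640 image, probe the image center.
coverage = {}
for stride in (4, 8, 16, 32):
    anchors = make_anchors(640, stride)
    coverage[stride] = anchors_covering_point(anchors, 320.0, 320.0)
print(coverage)
```

Because the anchor side is always 4 strides wide, each layer covers the probe point with the same number of anchors (4 per axis), regardless of scale. This is the property the equal-proportion interval principle is meant to guarantee.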
2.Scale compensation anchor matching strategy
To give some small objects enough matching anchors, the matching threshold is lowered appropriately.
Stage one: We follow the current anchor matching method but decrease the threshold from 0.5 to 0.35 in order to increase the average number of matched anchors.
Stage two: After stage one, some faces still do not match enough anchors, such as the tiny and outer faces marked with the gray dotted curve in Fig. 4(a). We deal with each of these faces as follows:
firstly picking out anchors whose jaccard overlap with this face is higher than 0.1, then sorting them to select the top-N as matched anchors of this face. We set N as the average number from stage one.
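The two stages can be sketched in NumPy as below. This is a simplified reading of the description, not the authors' implementation: "not enough anchors" is taken to mean fewer than N, N is the rounded stage-one average (at least 1), and the usual rule that each anchor is assigned to only one face is ignored. The names `iou_matrix` and `match_anchors` and the toy boxes are hypothetical.

```python
import numpy as np


def iou_matrix(faces, anchors):
    """Jaccard overlap of (x1, y1, x2, y2) boxes; shape (n_faces, n_anchors)."""
    lt = np.maximum(faces[:, None, :2], anchors[None, :, :2])
    rb = np.minimum(faces[:, None, 2:], anchors[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_f = (faces[:, 2] - faces[:, 0]) * (faces[:, 3] - faces[:, 1])
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_f[:, None] + area_a[None, :] - inter)


def match_anchors(faces, anchors, t1=0.35, t2=0.1):
    """Return one array of matched anchor indices per face."""
    iou = iou_matrix(faces, anchors)
    # Stage one: ordinary matching with the lowered threshold.
    stage_one = [np.flatnonzero(row > t1) for row in iou]
    # N = average number of stage-one matches (rounded; assumed at least 1).
    n_avg = max(1, int(round(np.mean([len(m) for m in stage_one]))))
    matches = []
    for row, m in zip(iou, stage_one):
        if len(m) < n_avg:
            # Stage two: keep the top-N anchors among those with overlap > t2.
            cand = np.flatnonzero(row > t2)
            m = cand[np.argsort(-row[cand])][:n_avg]
        matches.append(np.sort(m))
    return matches


# Toy setup (assumed): 16x16 anchors on a stride-4 grid over a 64x64 image.
centers = (np.arange(16) + 0.5) * 4
cx, cy = np.meshgrid(centers, centers)
anchors = np.stack([cx - 8, cy - 8, cx + 8, cy + 8], axis=-1).reshape(-1, 4)
faces = np.array([[24, 24, 40, 40],   # 16x16 face, close to the anchor scale
                  [28, 28, 36, 36]],  # 8x8 face, too small for any IoU > 0.35
                 dtype=float)
matches = match_anchors(faces, anchors)
print([len(m) for m in matches])
```

In this toy case the 8x8 face can reach an overlap of at most 0.25 with a 16x16 anchor, so it gets no match in stage one and receives its anchors only through stage two.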
3. Maxout background label
This method is intended to balance the ratio of negative to positive samples. The procedure is as follows, though I did not fully understand it.
We propose to apply a more sophisticated classification strategy on the lowest layer to handle the complicated background from small anchors. We apply the max-out background label for the conv3_3 detection layer. For each of the smallest anchors, we predict Nm scores for the background label (Nm is the number of background scores predicted per anchor) and then choose the highest as its final score, as illustrated in Fig. 4(b).
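One way to read the max-out step is sketched below. Several things here are assumptions for illustration: the column layout (Nm background scores followed by one face score), the value Nm = 3, the softmax applied after the max, and the name `maxout_background`.

```python
import numpy as np


def maxout_background(logits, n_m=3):
    """logits: (num_anchors, n_m + 1); the first n_m columns are background
    scores and the last column is the face score (assumed layout).
    Returns (num_anchors, 2) probabilities ordered as [background, face]."""
    bg = logits[:, :n_m].max(axis=1)   # max-out: keep the strongest background score
    face = logits[:, n_m]
    two = np.stack([bg, face], axis=1)
    two = two - two.max(axis=1, keepdims=True)  # shift for numerical stability
    exp = np.exp(two)
    return exp / exp.sum(axis=1, keepdims=True)


logits = np.array([[0.2, 1.5, -0.3, 1.0],    # one strong background score wins
                   [-2.0, -1.0, -3.0, 2.0]])  # face score beats every background score
probs = maxout_background(logits)
print(probs.round(3))
```

Read this way, a small anchor is classified as a face only when its face score beats all Nm background scores, which makes the background harder to beat and should cut false positives from the many small anchors.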