EMD(搬土距离)(Earth Mover's Distance)
参考:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm
https://www.cnblogs.com/jackyzzy/p/3314667.html
http://blog.csdn.net/wangdonggg/article/details/35280735
详细的数据推导: http://blog.csdn.net/zhangping1987/article/details/25368183
Earth Mover's Distance,是2000年IJCV期刊文章《The Earth Mover's Distance as a Metric for Image Retrieval》基于运输问题的效率 提出的一种直方图相似度量。 它是归一化的从一个分布变为另一个分布的最小代价, 可以用来测量两个分布(multi-dimensional distributions)之间的距离。
EMD需要求解运输问题,其运算复杂度较高,平均而言至少是二次方级别。但是它作为距离函数,有一个非常好的特点是存在下界——两个分布的质心之间的距离,因此在粗略计算时,可以考虑用分布质心之间的距离代替EMD。
两个分布:
(位置, 权重)
P = {(p1, wp1), (p2, wp2), ..., (pm, wpm)}
Q = {(q1, wq1), (q2, wq2), ..., (qn, wqn))}
D = [dij] 表示位置 pi 到 位置 qj的距离
F = [fij] 表示权重 wpi 到 wqj 的权重
代价函数:
WORK(P, Q, F) = sum(dij * fij for i in range(1:m) for j in range(1:n))
限制条件:
fij > 0
sum(fij for j in range(1:n)) <= wpi 从i处搬出来的重量不能超过wpi所有的总和
sum(fij for i in range(1:m)) <= wqj 搬到j处的重量不能超过wqj能存放的总重量
WORK = min(sum(wpi for i in range(1:m)), sum(wqj for j in range(1:n)))
当仓库总重量和工厂货物总重量不一致时, 才需要限制第四个条件
动态规划的问题:
F是线性优化问题的最优解,earth mover's distance 定义为归一化后的WORK:
EMD(P, Q) = WORK(P, Q, F)
Divide
sum(fij for i in range(1:m) for j in range(1:n))
为了使EMD的值不随着运输总量的大小变化而变化, 引入归一化因子-总的运输量, (The normalization factor is introduced in order to avoid favoring smaller signatures in the case of partial matching.)
EMD距离ground distance的优势:
Naturally extends the notion of a distance between single elements to that of a distance between sets, or distributions, of elements.
将距离的概念扩展到集合/概率分布Can be applied to the more general variable-size signatures, which subsume histograms. Signatures are more compact, and the cost of moving ``earth'' reflects the notion of nearness properly, without the quantization problems of most other measures.
Allows for partial matches in a very natural way. This is important, for instance, for image retrieval and in order to deal with occlusions and clutter.
允许计算部分的相似度, 对于图像检索这很重要Is a true metric if the ground distance is metric and if the total weights of two signatures are equal. This allows endowing image spaces with a metric structure.
Is bounded from below by the distance between the centers of mass of the two signatures when the ground distance is induced by a norm. Using this lower bound in retrieval systems significantly reduced the number of EMD computations.
有下界-两个分布的质心之间的距离, 粗略估计可以减少大量计算Matches perceptual similarity better than other measures, when the ground distance is perceptually meaningful. This was shown by [2] for color- and texture-based image retrieval.
分布在视觉上有意义时, EMD距离比其它距离估计更有好, 可以参照基于色彩以及纹理的图像检索
使用:
C代码包: emd.h, emd.c, emd.i
OpenCV:实现了EMD api,
pip install --upgrade setuptools
pip install numpy Matplotlib
pip install opencv-python
import numpy as np
import cv
#p、q是两个矩阵,第一列表示权值,后面三列表示直方图或数量
p=np.asarray([[0.4,100,40,22],
[0.3,211,20,2],
[0.2,32,190,150],
[0.1,2,100,100]],np.float32)
q=np.array([[0.5,0,0,0],
[0.3,50,100,80],
[0.2,255,255,255]],np.float32)
pp=cv.fromarray(p)
qq=cv.fromarray(q)
emd=cv.CalcEMD2(pp,qq,cv.CV_DIST_L2)