Life sucks, but you're gonna love it.

0%

干货 | 各种各样的distance

Mahalanobis distance (马氏距离)

1
The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D. This distance is zero if P is at the mean of D, and grows as P moves away from the mean along each principal component axis. If each of these axes is re-scaled to have unit variance, then the Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. The Mahalanobis distance is thus unitless and scale-invariant, and takes into account the correlations of the data set.

Mahalanobis distance 是用来测量一个样本 $\vec x = (x_1, x_2, x_3, …,x_N)^T$和已有的一个样本集之间的相似度,因此需要用的该样本集的均值$\vec \mu = (\mu_1, \mu_2, \mu_3,…,\mu_N)^T$和协方差矩阵$S$(当样本的变量不唯一时)。

$D_M(\vec{x}) = \sqrt{(\vec x - \vec\mu )^TS^{-1}(\vec x - \vec \mu)}$

或者测量服从相同分布的两个样本之间的距离:

$d(\vec x, \vec y) = \sqrt{(\vec x - \vec y)^TS^{-1}(\vec x - \vec y)}$

如果协方差矩阵是identity matrix (对角线上的值为1),那么此时的马氏距离和欧氏距离相等。