# Distance metrics

## L-p metrics

### Manhattan distance (L1)

Given two vectors $$p$$ and $$q$$, such that

\begin{aligned} p &= \left(p_1, p_2, \dots,p_n\right) \\ q &= \left(q_1, q_2, \dots,q_n\right) \end{aligned}

we define the Manhattan distance as:

$d_1(p, q) = \|p - q\|_1 = \sum_{i=1}^n |p_i-q_i|$

### Euclidean distance (L2)

In general, for points given by Cartesian coordinates in $$n$$-dimensional Euclidean space, the distance is

$d(p,q)=\sqrt {(p_{1}-q_{1})^{2}+(p_{2}-q_{2})^{2}+\cdots +(p_{i}-q_{i})^{2}+\cdots +(p_{n}-q_{n})^{2}}$

## Cluster distances

### Within-cluster sum of squares (WCSS)

Given a set of observations ($$x_1, x_2,\dots,x_n$$), where each observation is a $$d$$-dimensional real vector, $$k$$-means clustering aims to partition the $$n$$ observations into $$k$$ ($$\leq n$$) sets $$S=\lbrace S_1, S_2, \dots, S_k\rbrace$$ so as to minimize the within-cluster sum of squares (WCSS) (i.e. variance). Formally, the objective is to find:

${\underset {\mathbf {S} }{\operatorname {arg\,min} }}\sum _{i=1}^{k}\sum _{\mathbf {x} \in S_{i}}\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|^{2}={\underset {\mathbf {S} }{\operatorname {arg\,min} }}\sum _{i=1}^{k}|S_{i}|\operatorname {Var} S_{i}$

where $$\mu_i$$ is the mean of points in $$S_i$$. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:

${\underset {\mathbf {S} }{\operatorname {arg\,min} }}\sum _{i=1}^{k}\,{\frac {1}{2|S_{i}|}}\,\sum _{\mathbf {x} ,\mathbf {y} \in S_{i}}\left\|\mathbf {x} -\mathbf {y} \right\|^{2}}$

The equivalence can be deduced from identity



Because the total variance is constant, this is equivalent to maximizing the sum of squared deviations between points in different clusters (between-cluster sum of squares, BCSS) which follows from the law of total variance.

### Dunn index

A full explanation is available at Dunn index.

### Gower distance

A full explanation with examples is available at Gower distance.