# Spearman correlation

## Spearman rank correlation

The Spearman correlation coefficient (or Spearman’s $$\rho$$) measures the rank correlation between two variables, that is, how well their relationship can be described by a monotonic function.

Spearman’s $$\rho$$ takes values between $$-1$$ and $$1$$. A value of $$1$$ means the two variables have identical ranks (a perfectly increasing monotonic relationship), and $$-1$$ means their ranks are completely opposite (a perfectly decreasing one).¹
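As a quick illustration of the bounds, the sketch below compares $$x$$ with $$x^3$$, which is monotonic but not linear. It is plain Python with hypothetical helper names (`ranks`, `pearson`, `spearman`), and it ranks values without handling ties, in line with the note on repeated ranks. Spearman’s $$\rho$$ is $$1$$ for $$x^3$$ and $$-1$$ for $$-x^3$$, while the Pearson correlation of the raw values stays below $$1$$.

```python
import math

def ranks(values):
    """1-based ranks; assumes there are no repeated values (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def pearson(a, b):
    """Pearson correlation; the 1/n factors of cov and sigma cancel out."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y_up = [v ** 3 for v in x]     # increasing, but not linear
y_down = [-v ** 3 for v in x]  # decreasing, but not linear

print(spearman(x, y_up))    # 1.0
print(spearman(x, y_down))  # -1.0
print(pearson(x, y_up))     # about 0.94: the relationship is not linear
```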

Because it depends only on ranks, Spearman’s $$\rho$$ is suited to ordinal variables, although it can also be applied to discrete and continuous variables.

Consider a sample of size $$n$$ with raw scores $$X_i, Y_i$$. After converting the scores to ranks $$\operatorname{R}(X_i)$$ and $$\operatorname{R}(Y_i)$$, Spearman’s coefficient $$r_s$$ is computed as

$$r_s = \rho_{\operatorname{R}(X),\operatorname{R}(Y)} = \frac{\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))} {\sigma_{\operatorname{R}(X)} \sigma_{\operatorname{R}(Y)}},$$

Here $$\rho_{\operatorname{R}(X),\operatorname{R}(Y)}$$ is the Pearson correlation coefficient applied to the rank variables, $$\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))$$ is the covariance of the rank variables, and $$\sigma_{\operatorname{R}(X)}$$ and $$\sigma_{\operatorname{R}(Y)}$$ are their standard deviations.
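The general formula can be sketched in plain Python as below. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the handling of ties (each tied value receives the average of the positions it occupies, often called fractional ranking) is an assumption, since the text does not say how repeated values are ranked.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end j of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 0-based positions i..j correspond to 1-based ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """cov(a, b) / (sigma_a * sigma_b).

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are enough.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(x, y):
    """Spearman's r_s as the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [10, 20, 20, 30]  # the two 20s are tied
y = [1, 3, 2, 4]
print(average_ranks(x))  # [1.0, 2.5, 2.5, 4.0]
print(spearman(x, y))    # about 0.949
```

If either variable is constant, its standard deviation is zero and the sketch raises `ZeroDivisionError`, which matches $$r_s$$ being undefined in that case.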

If all $$n$$ ranks are distinct integers (that is, there are no ties), the formula simplifies to

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)},$$

where $$d_i = \operatorname{R}(X_i) - \operatorname{R}(Y_i)$$ is the difference between the two ranks of observation $$i$$, and $$n$$ is the number of observations.
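A minimal sketch of the simplified formula follows, with hypothetical function names. It refuses input containing repeated values, because the formula is only valid when all ranks are distinct. The example can be checked by hand: the rank differences are $$d = (-1, 1, -1, 1, 0)$$, so $$\sum d_i^2 = 4$$, $$n(n^2 - 1) = 120$$ and $$r_s = 1 - 24/120 = 0.8$$.

```python
def ranks(values):
    """1-based ranks; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), valid only without ties."""
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("simplified formula requires distinct values (no ties)")
    n = len(x)
    # d_i = R(X_i) - R(Y_i) for each observation i
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_no_ties(x, y))  # 0.8
```

For these data the Pearson correlation of the ranks (the general formula) also gives $$0.8$$, as expected when there are no ties.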

1. Assuming no repeated ranks.