Spearman correlation

Spearman rank correlation

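The Spearman correlation coefficient (or Spearman’s $\rho$) is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described by a monotonic function.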

Spearman’s $\rho$ takes values between $-1$ and $1$. The extremes are reached when the relationship is perfectly monotonic: $-1$ when the ranks are completely opposite and $1$ when they are identical.¹

Because it depends only on ranks, Spearman’s $\rho$ is well suited to ordinal variables, although it can also be applied to discrete and continuous variables.

For a sample of size $n$ with raw scores $X_i, Y_i$, the scores are first converted to ranks $\operatorname{R}(X_i), \operatorname{R}(Y_i)$, and $r_s$ is then computed as

$$ r_s = \rho_{\operatorname{R}(X),\operatorname{R}(Y)} = \frac{\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))} {\sigma_{\operatorname{R}(X)} \sigma_{\operatorname{R}(Y)}}, $$

where $\rho$ denotes the Pearson correlation coefficient applied to the rank variables, $\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))$ is the covariance of the rank variables, and $\sigma_{\operatorname{R}(X)}$ and $\sigma_{\operatorname{R}(Y)}$ are their standard deviations.
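The definition can be sketched directly: rank each variable, then take the Pearson correlation of the ranks. This is a minimal sketch rather than a reference implementation. The helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical, and tied values are assumed to receive the mean of the positions they occupy (fractional ranking).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation cov(a, b) / (sigma_a * sigma_b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # The 1/n normalisation factors cancel, so plain sums of products suffice.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's r_s as the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(spearman(x, [v ** 3 for v in x]))  # 1.0: nonlinear but monotonically increasing
print(spearman(x, [-v for v in x]))      # -1.0: monotonically decreasing
print(average_ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]: tied values share a rank
```

Because only the ranks enter the calculation, a nonlinear relationship such as $y = x^3$ still gives $r_s = 1$ as long as it is strictly increasing.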

If all ranks are distinct integers (that is, there are no ties), the formula simplifies to

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, $$

where $d_i = \operatorname{R}(X_i) - \operatorname{R}(Y_i)$ is the difference between the two ranks of each observation and $n$ is the number of observations.
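The shortcut formula can be checked on a small example. For ranks $(1, 2, 3, 4, 5)$ and $(2, 1, 4, 3, 5)$ the differences are $d = (-1, 1, -1, 1, 0)$, so $\sum d_i^2 = 4$ and $r_s = 1 - \frac{6 \cdot 4}{5 \cdot 24} = 0.8$. The sketch below (the function name `spearman_no_ties` is hypothetical) assumes its inputs are already ranks, each a permutation of $1, \dots, n$, and rejects anything else, since the formula is only valid without ties.

```python
def spearman_no_ties(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); assumes distinct integer ranks 1..n."""
    n = len(rank_x)
    if n != len(rank_y):
        raise ValueError("rank lists must have the same length")
    # The shortcut assumes each variable's ranks are a permutation of 1..n (no ties).
    expected = list(range(1, n + 1))
    if sorted(rank_x) != expected or sorted(rank_y) != expected:
        raise ValueError("ranks must be distinct integers 1..n (no ties)")
    # d_i is the difference between the two ranks of observation i.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
print(spearman_no_ties([1, 2, 3, 4], [1, 2, 3, 4]))        # 1.0: identical ranks
print(spearman_no_ties([1, 2, 3, 4], [4, 3, 2, 1]))        # -1.0: completely opposite ranks
```

With distinct ranks this gives the same value as the Pearson correlation of the ranks; once ties are present the two generally differ, so the rank-based Pearson definition is the one to use.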

  1. Assuming no repeated ranks.