Spearman correlation

Spearman rank correlation

The Spearman correlation coefficient (or Spearman’s \(\rho\)) measures rank correlation between two variables.

Spearman’s \(\rho\) takes values between \(-1\) and \(1\). When the relationship between the two variables is perfectly monotonic, it reaches \(-1\) or \(1\), representing completely opposite or identical ranks, respectively [1].

Because it depends only on ranks, Spearman’s \(\rho\) is commonly used for ordinal data, although it can also be applied to discrete and continuous values.

If we consider a dataset of size \(n\) with scores \(X_i, Y_i\), we first convert the scores to ranks \(\operatorname{R}({X_i}), \operatorname{R}({Y_i})\) and then calculate Spearman’s coefficient \(r_s\) as

\[ r_s = \rho_{\operatorname{R}(X),\operatorname{R}(Y)} = \frac{\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))} {\sigma_{\operatorname{R}(X)} \sigma_{\operatorname{R}(Y)}}, \]

where \(\rho\) denotes the Pearson correlation coefficient applied to the rank variables, \(\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))\) is the covariance of the rank variables, and \(\sigma_{\operatorname{R}(X)}\) and \(\sigma_{\operatorname{R}(Y)}\) are the standard deviations of the rank variables.
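The definition above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names `average_ranks`, `pearson`, and `spearman` are hypothetical, and giving tied values the mean of their positions is an assumption made here (the text itself assumes no repeated ranks).

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions.

    Averaging tied ranks is an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; the 1/n factors of cov and sigma cancel out."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 25]  # increases monotonically with x, but not linearly
print(spearman(x, y))        # identical ranks
print(spearman(x, y[::-1]))  # completely opposite ranks
```

Because only the ranks enter the calculation, the nonlinear but monotonic relationship between `x` and `y` still yields the maximum value, while reversing `y` yields the minimum.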

If all the ranks are distinct integers (that is, there are no ties), the simplified form can be applied:

\[ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \]

where \(d_i = \operatorname{R}(X_i) - \operatorname{R}(Y_i)\) is the difference between the two ranks of each observation and \(n\) is the number of observations.
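As a small worked example of the simplified form, the following Python sketch applies it to two hand-picked rank lists. The function name `spearman_distinct` and the sample ranks are illustrative assumptions; the function expects ranks that are already computed and distinct, and does not check for ties.

```python
def spearman_distinct(rank_x, rank_y):
    """Simplified Spearman formula; valid only when all ranks are distinct."""
    n = len(rank_x)
    # Sum of squared rank differences d_i = R(X_i) - R(Y_i).
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

# d = [-1, 1, -1, 1, 0], so sum(d^2) = 4 and r_s = 1 - 24/120, about 0.8
print(spearman_distinct(rank_x, rank_y))
```

With two neighbouring pairs swapped, the coefficient stays high but drops below \(1\); passing fully reversed ranks such as `[5, 4, 3, 2, 1]` for `rank_y` gives \(-1\).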


  1. Assuming no repeated ranks.