# Correlation matrix

## Similarity

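Let’s create two datasets, $$\mu_1$$ and $$\mu_2$$. Each is an $$N \times n$$ matrix whose entries are drawn independently from the standard normal distribution, matching the code below:

$\mu_i \in \mathbb{R}^{N \times n}, \quad (\mu_i)_{kj} \overset{\text{iid}}{\sim} \mathcal{N}(0,1), \quad i \in \{1, 2\}$

We use $$N=100$$ observations (rows) of a vector of size $$n=10$$ (columns).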

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.spatial.distance import squareform

n = 10   # vector size (number of columns)
N = 100  # number of observations (rows)

np.random.seed(0)  # fix the seed for reproducibility
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))
```


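We now make $$\mu_2$$ depend partly on $$\mu_1$$: the original $$\mu_2$$ is kept as independent noise with weight $$\epsilon=0.6$$, and $$\mu_1$$ is mixed in with weight $$1-\epsilon$$:

$\mu_2 \leftarrow \epsilon\,\mu_2 + (1-\epsilon)\,\mu_1$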

```python
epsilon = 0.6  # weight of the independent noise term
mu_2 = epsilon*mu_2 + (1-epsilon)*mu_1
```
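Assuming the entries are independent standard normals, as generated above, the mixing step fixes the expected correlation between a column of $$\mu_1$$ and the same column of the new $$\mu_2$$. The new column has variance $$\epsilon^2 + (1-\epsilon)^2$$ and covariance $$1-\epsilon$$ with $$\mu_1$$, so

$\rho = \dfrac{1-\epsilon}{\sqrt{\epsilon^2 + (1-\epsilon)^2}} = \dfrac{0.4}{\sqrt{0.52}} \approx 0.555 \quad (\epsilon = 0.6)$

The sketch below checks this numerically. `N_large` is a hypothetical, much larger sample size chosen only to shrink sampling noise; it is an illustration, not part of the original analysis.

```python
import numpy as np

epsilon = 0.6
n = 10
N_large = 200_000  # hypothetical large sample, only to shrink sampling noise

rng = np.random.default_rng(0)
a = rng.standard_normal((N_large, n))  # plays the role of mu_1
b = rng.standard_normal((N_large, n))  # independent noise (the original mu_2)
mixed = epsilon * b + (1 - epsilon) * a  # same mixing step as above

# Correlation between column j of a and column j of mixed, for each j
empirical = np.array([np.corrcoef(a[:, j], mixed[:, j])[0, 1] for j in range(n)])
theoretical = (1 - epsilon) / np.sqrt(epsilon**2 + (1 - epsilon)**2)

print(theoretical)  # ≈ 0.5547
print(empirical.mean())
```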


We use pandas to compute the (Pearson) correlation matrix of each dataset, i.e. the correlations between its $$n$$ columns:

```python
# Pearson correlation between columns; each result is an n x n DataFrame
C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()
```
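`DataFrame.corr()` uses the Pearson method by default and correlates columns, so each result is a symmetric $$n \times n$$ matrix with ones on the diagonal. As a sanity check (a sketch that regenerates $$\mu_1$$ the same way as above), it agrees with NumPy's `np.corrcoef` when `rowvar=False` tells NumPy that the columns are the variables:

```python
import numpy as np
import pandas as pd

N, n = 100, 10
np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))

C1 = pd.DataFrame(mu_1).corr()           # pandas, Pearson by default
C1_np = np.corrcoef(mu_1, rowvar=False)  # NumPy, columns as variables

print(C1.shape)                           # (10, 10)
print(np.allclose(C1.to_numpy(), C1_np))  # True
print(np.allclose(np.diag(C1_np), 1.0))   # True: unit diagonal
print(np.allclose(C1_np, C1_np.T))        # True: symmetric
```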


And we plot the correlation matrices.
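The original plotting code is not shown. A minimal sketch, assuming matplotlib (any heatmap function would do), draws both matrices side by side on a shared color scale from -1 to 1 so the two panels are directly comparable:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the two correlation matrices as above
N, n, epsilon = 100, 10, 0.6
np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = epsilon * mu_2 + (1 - epsilon) * mu_1
C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, C, title in zip(axes, (C1, C2), ("$C_1$", "$C_2$")):
    # Fixed vmin/vmax so identical colors mean identical correlations
    im = ax.imshow(C.to_numpy(), vmin=-1, vmax=1, cmap="RdBu_r")
    ax.set_title(title)
fig.colorbar(im, ax=list(axes))  # one color bar shared by both panels
```

In a notebook, calling `plt.show()` (or simply ending the cell) displays the figure.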

### Spearman correlation

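We measure the similarity between the correlation matrices $$C_1$$ and $$C_2$$ as the Spearman rank correlation between their upper triangles. Since both matrices are symmetric with ones on the diagonal, only the entries strictly above the diagonal carry information.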

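```python
# Row/column indices of the strict upper triangle (k=1 skips the diagonal)
indices = np.triu_indices(C1.shape[0], k=1)
# DataFrames do not support NumPy-style fancy indexing, so index the array
print(C1.to_numpy()[indices])
```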
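The similarity score itself can then be computed with `scipy.stats.spearmanr` on the two upper-triangle vectors. The sketch below rebuilds $$C_1$$ and $$C_2$$ as above so it runs on its own. It also shows that `squareform(C, checks=False)`, imported at the top, extracts the same entries in the same order; `checks=False` is needed because a correlation matrix has ones, not zeros, on its diagonal.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.spatial.distance import squareform

# Rebuild the two correlation matrices as above
N, n, epsilon = 100, 10, 0.6
np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = epsilon * mu_2 + (1 - epsilon) * mu_1
C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()

# Strict upper triangle of each matrix: n*(n-1)/2 = 45 entries
indices = np.triu_indices(n, k=1)
v1 = C1.to_numpy()[indices]
v2 = C2.to_numpy()[indices]

# squareform walks the upper triangle row by row, like np.triu_indices
assert np.allclose(v1, squareform(C1.to_numpy(), checks=False))

# Spearman rank correlation between the two vectors is the similarity score
rho, p_value = stats.spearmanr(v1, v2)
print(len(v1), rho, p_value)
```

The p-value from `spearmanr` assumes independent pairs of observations, which the entries of a correlation matrix are not strictly, so it is best read as a rough guide.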