Correlation matrix
Similarity
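Let’s create two datasets, \(\mu_1\) and \(\mu_2\), whose components are drawn independently from a standard normal distribution (matching the `np.random.normal` calls used to generate them):
\[ \mu_i = \{x_1,\dots,x_n\} \sim \{\mathcal{N}_1(0,1),\dots,\mathcal{N}_n(0,1)\} \]
We will use \(N=100\) observations of a vector of size \(n=10\).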
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.spatial.distance import squareform
n = 10   # vector size (number of variables)
N = 100  # number of observations

np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))
We now mix \(\mu_1\) into \(\mu_2\), keeping a weight \(\epsilon=0.6\) on the independent part of \(\mu_2\), so that the two datasets are related:
\[ \mu_2 \leftarrow \epsilon \mu_2 + (1-\epsilon) \mu_1 \]
epsilon = 0.6
mu_2 = epsilon*mu_2 + (1-epsilon)*mu_1
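To get a feel for how strongly the mixing step ties the two datasets together, here is a minimal sketch. It assumes unit-variance, independent normal inputs, in which case the correlation between matching columns works out to \((1-\epsilon)/\sqrt{\epsilon^2+(1-\epsilon)^2}\), roughly 0.55 for \(\epsilon=0.6\). The names `a`, `b`, `b_mixed` and `N_big` are illustrative, and a separate random generator is used so the seed above is left alone.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: independent generator, not the seed used above
N_big, n = 100_000, 10          # many observations so the estimate is tight
epsilon = 0.6

a = rng.normal(size=(N_big, n))
b = rng.normal(size=(N_big, n))
b_mixed = epsilon * b + (1 - epsilon) * a

# Theoretical correlation between matching columns of a and b_mixed,
# assuming independent, unit-variance inputs
expected = (1 - epsilon) / np.sqrt(epsilon**2 + (1 - epsilon)**2)

# Empirical correlation, one value per column
observed = np.array(
    [np.corrcoef(a[:, j], b_mixed[:, j])[0, 1] for j in range(n)]
)

print(f"expected: {expected:.3f}")
print(f"observed mean: {observed.mean():.3f}")
```

This is the correlation between matching columns across the two datasets, not an entry of either dataset's own correlation matrix.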
We use Pandas to calculate the correlation matrix:
C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()
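As a quick sanity check on what `DataFrame.corr()` returns, the sketch below compares it with `np.corrcoef` on stand-in data of the same shape. The `data` array is illustrative rather than the datasets above; `corr()` uses Pearson correlation by default and treats columns as variables, which corresponds to `rowvar=False` in NumPy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 10))  # stand-in: N=100 observations, n=10 variables

C_pandas = pd.DataFrame(data).corr()       # Pearson by default, column-wise
C_numpy = np.corrcoef(data, rowvar=False)  # rowvar=False: columns are variables

print(C_pandas.shape)
print(np.allclose(C_pandas.to_numpy(), C_numpy))
```

The result is an \(n \times n\) symmetric matrix with ones on the diagonal, which is why only the off-diagonal triangle carries information.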
And we plot the correlation matrices.
import seaborn as sns
import matplotlib.pyplot as plt
f, axes = plt.subplots(1, 2, figsize=(10, 5))
sns.set_style("white")
for ix, m in enumerate([C1, C2]):
    sns.heatmap(m, cmap="RdBu_r", center=0, vmin=-1, vmax=1, ax=axes[ix], square=True, cbar_kws={"shrink": .5}, xticklabels=True)
    axes[ix].set(title=f"$C_{ix+1}$")
Spearman correlation
Calculate the similarity as the Spearman correlation between the upper triangles (excluding the diagonal) of the correlation matrices \(C_1\) and \(C_2\).
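# Indices of the strict upper triangle (k=1 skips the diagonal)
indices = np.triu_indices(C1.shape[0], k=1)
# C1 is a DataFrame, so convert to a NumPy array before fancy indexing
print(C1.to_numpy()[indices])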
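One way the comparison might be completed is sketched below: flatten the strict upper triangle of each matrix and pass the two vectors to `scipy.stats.spearmanr`. The block rebuilds the data so it runs on its own; the names `v1`, `v2`, `rho` and `p_value` are illustrative and not taken from the original.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

n, N, epsilon = 10, 100, 0.6

# Rebuild the two datasets as described above
np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = epsilon * mu_2 + (1 - epsilon) * mu_1

C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()

# Strict upper triangle: n*(n-1)/2 unique off-diagonal entries
indices = np.triu_indices(C1.shape[0], k=1)
v1 = C1.to_numpy()[indices]
v2 = C2.to_numpy()[indices]

# Rank correlation between the two sets of off-diagonal entries
rho, p_value = stats.spearmanr(v1, v2)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3g}")
```

With \(n=10\) each vector has 45 entries, so the estimate of the rank correlation is fairly noisy and is best read alongside its p-value.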