Correlation matrix

Similarity

Let’s create two datasets, $\mu_1$ and $\mu_2$, whose $n$ components are drawn independently from a standard normal distribution, such that

$$ \mu_i = \{x_1,\dots,x_n\}, \qquad x_j \sim \mathcal{N}(0,1) $$

We will use $N=100$ observations of a vector of size $n=10$, so each dataset is an $N \times n$ array.

import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.spatial.distance import squareform

n = 10
N = 100

np.random.seed(0)
mu_1 = np.random.normal(loc=0, scale=1, size=(N, n))
mu_2 = np.random.normal(loc=0, scale=1, size=(N, n))

We now redefine $\mu_2$ as a noisy version of $\mu_1$, with noise weight $\epsilon=0.6$, such that

$$ \mu_2 \leftarrow \epsilon \mu_2 + (1-\epsilon) \mu_1 $$

epsilon = 0.6
mu_2 = epsilon*mu_2 + (1-epsilon)*mu_1
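
The mixing rule fixes how strongly each column of the new $\mu_2$ tracks the matching column of $\mu_1$. For independent inputs with equal variance, the correlation is $(1-\epsilon)/\sqrt{\epsilon^2+(1-\epsilon)^2}$, which is roughly 0.55 for $\epsilon=0.6$. The following is a minimal sketch that checks this numerically. The names `blend`, `expected_column_correlation`, `base`, and `noise` are hypothetical helpers introduced only for illustration, and the sample size is deliberately larger than in the text so that the empirical value is stable.

```python
import numpy as np

def blend(base, noise, epsilon):
    """Return epsilon * noise + (1 - epsilon) * base (the mixing rule above)."""
    return epsilon * noise + (1 - epsilon) * base

def expected_column_correlation(epsilon):
    """Theoretical correlation between matching columns of `base` and the blend,
    assuming independent inputs with equal variance."""
    return (1 - epsilon) / np.sqrt(epsilon**2 + (1 - epsilon)**2)

# Assumption: a large sample, used here only to make the check stable.
rng = np.random.default_rng(0)
base = rng.normal(size=(100_000, 10))
noise = rng.normal(size=(100_000, 10))
mixed = blend(base, noise, epsilon=0.6)

# Average the per-column Pearson correlation between base and mixed.
empirical = np.mean(
    [np.corrcoef(base[:, j], mixed[:, j])[0, 1] for j in range(base.shape[1])]
)
print(f"expected  {expected_column_correlation(0.6):.3f}")
print(f"empirical {empirical:.3f}")
```

With only $N=100$ observations, as in the text, the empirical value will scatter more widely around the theoretical one.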

We use pandas to calculate the correlation matrix of each dataset (Pearson correlation between columns, the default):

C1 = pd.DataFrame(mu_1).corr()
C2 = pd.DataFrame(mu_2).corr()
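
To make explicit what `DataFrame.corr()` computes here, the sketch below compares it with `np.corrcoef` on hypothetical data of the same shape. The array `samples` is an assumption for illustration, not the dataset above. Both calls treat columns as variables and return an $n \times n$ symmetric matrix with ones on the diagonal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 10))  # hypothetical data: 100 observations, 10 variables

C_pandas = pd.DataFrame(samples).corr()       # Pearson correlation by default
C_numpy = np.corrcoef(samples, rowvar=False)  # rowvar=False: columns are the variables

print(C_pandas.shape)
print(np.allclose(C_pandas.to_numpy(), C_numpy))
```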

And we plot the correlation matrices.

import seaborn as sns
import matplotlib.pyplot as plt

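sns.set_style("white")  # set the style before the axes are created so that it applies
f, axes = plt.subplots(1, 2, figsize=(10, 5))
for ix, m in enumerate([C1, C2]):
    # Diverging colormap centred on 0, with a fixed [-1, 1] range for both panels
    sns.heatmap(m, cmap="RdBu_r", center=0, vmin=-1, vmax=1, ax=axes[ix],
                square=True, cbar_kws={"shrink": .5}, xticklabels=True)
    axes[ix].set(title=f"$C_{ix+1}$")
plt.show()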

Spearman correlation

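We calculate the similarity between $C_1$ and $C_2$ as the Spearman correlation between the upper triangles of the two correlation matrices. The diagonal is excluded because it is always 1, and the lower triangle only repeats the upper one.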

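# Indices of the strictly upper triangle (k=1 skips the diagonal)
indices = np.triu_indices(C1.shape[0], k=1)
# C1 is a DataFrame, so convert it to a NumPy array before indexing with index arrays
print(C1.to_numpy()[indices])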
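
One possible way to finish the comparison is sketched below, using `scipy.stats.spearmanr` on the two flattened upper triangles. The helpers `upper_triangle` and `matrix_similarity`, and the data `a` and `b`, are hypothetical names introduced for this example. The data is regenerated so that the block runs on its own, so the printed value will not necessarily match the one obtained from `C1` and `C2` above.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def upper_triangle(corr):
    """Return the strictly upper-triangular entries of a square matrix as a 1-D array."""
    values = np.asarray(corr)
    rows, cols = np.triu_indices(values.shape[0], k=1)
    return values[rows, cols]

def matrix_similarity(corr_a, corr_b):
    """Spearman rank correlation (and p-value) between two matrices' upper triangles."""
    rho, p_value = stats.spearmanr(upper_triangle(corr_a), upper_triangle(corr_b))
    return rho, p_value

# Hypothetical data built with the same recipe as in the text (epsilon = 0.6).
rng = np.random.default_rng(0)
a = rng.normal(size=(100, 10))
b = 0.6 * rng.normal(size=(100, 10)) + 0.4 * a

C_a = pd.DataFrame(a).corr()
C_b = pd.DataFrame(b).corr()

rho, p_value = matrix_similarity(C_a, C_b)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```

For an $n \times n$ matrix, the upper triangle holds $n(n-1)/2$ values, so the comparison here is made over 45 pairs of correlation coefficients.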