# Error metrics

## Confusion matrix

The confusion matrix is typically applied to binary outcome classification problems.

It revolves around the concepts of positive and negative outcomes for classification:

• True Negative (TN), when both the prediction and the true value are negative

• True Positive (TP), when both the prediction and the true value are positive

• False Positive (FP), when the prediction is positive and the true value is negative

• False Negative (FN), when the prediction is negative and the true value is positive
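As a quick sanity check, these four counts can be tallied directly from paired lists of true and predicted labels. A minimal sketch with made-up labels, treating 1 as the positive class:

```python
# Hypothetical true and predicted labels; 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)
```

Note that every prediction falls into exactly one of the four buckets, so the four counts always sum to the size of the dataset.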

Let’s show how we can calculate a confusion matrix from scratch.

The first metric we calculate is the accuracy ( $$ACC$$ ), defined as

\begin{align} ACC &= \frac{TP+TN}{P+N} \\ &= \frac{TP+TN}{TP+TN+FP+FN}, \end{align}

where $$TP$$ is the number of true positives in the dataset, $$TN$$ the number of true negatives, and so on, while $$P$$ and $$N$$ are the total numbers of positive and negative instances.
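For instance, with hypothetical counts $$TP=40$$, $$TN=45$$, $$FP=5$$ and $$FN=10$$, the accuracy works out to $$(40+45)/100 = 0.85$$:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(40, 45, 5, 10))  # 0.85
```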

Let’s illustrate this with a mock dataset.

import pandas as pd
import numpy as np
import random

# Class labels: index 0 is the "positive" class, index 1 the "negative" class
endianness = ['big-endian', 'little-endian']

def to_endian(x):
    return endianness[x]

# 100 random true labels (0 or 1)
df = pd.DataFrame(data=np.random.randint(0, 2, size=100), columns=['endianness'])
# Predictions agree with the true label roughly 90% of the time
df['predicted'] = df.endianness.apply(lambda x: x if random.random() < 0.9 else abs(x - 1))
df.endianness = df.endianness.apply(to_endian)
df.predicted = df.predicted.apply(to_endian)

df

endianness predicted
0 big-endian big-endian
1 big-endian little-endian
2 little-endian little-endian
3 big-endian big-endian
4 little-endian little-endian
... ... ...
95 big-endian big-endian
96 big-endian little-endian
97 little-endian little-endian
98 big-endian big-endian
99 little-endian little-endian

100 rows × 2 columns

Let’s calculate the number of true positives, true negatives, etc.:

# Rows predicted as the positive / negative class
positive_class = df[df.predicted == endianness[0]]
negative_class = df[df.predicted == endianness[1]]

TP = len(positive_class[positive_class.endianness == positive_class.predicted])
TN = len(negative_class[negative_class.endianness == negative_class.predicted])
FP = len(positive_class[positive_class.endianness != positive_class.predicted])
FN = len(negative_class[negative_class.endianness != negative_class.predicted])

print(f"TP={TP}, TN={TN}, FP={FP}, FN={FN}")

TP=43, TN=43, FP=4, FN=10


The accuracy will then be:

ACC = (TP+TN)/(TP+TN+FP+FN)
print(f"accuracy={ACC}")

accuracy=0.86
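Equivalently, the accuracy is just the fraction of rows where the prediction matches the truth, which pandas can compute in one line. A minimal sketch on a tiny hypothetical frame (not the dataset above):

```python
import pandas as pd

# Tiny hand-made frame for illustration
toy = pd.DataFrame({
    'endianness': ['big-endian', 'little-endian', 'big-endian', 'big-endian'],
    'predicted':  ['big-endian', 'little-endian', 'little-endian', 'big-endian'],
})

# Accuracy as the mean of a boolean "prediction equals truth" column
acc = (toy.endianness == toy.predicted).mean()
print(acc)  # 0.75
```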


The matrix is then presented as

|                        | Actual positive | Actual negative |
|------------------------|-----------------|-----------------|
| **Predicted positive** | TP              | FP              |
| **Predicted negative** | FN              | TN              |

Let’s calculate the confusion matrix, using the above quantities.

cm = np.array([[TP, FP], [FN, TN]])
print(cm)

[[43  4]
 [10 43]]
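As a cross-check, `pd.crosstab` builds the same table directly from the two label columns. With predicted labels as rows and true labels as columns, and since `'big-endian'` (our positive class) sorts first, the layout matches `cm` above. A sketch on a tiny hypothetical frame:

```python
import pandas as pd

# Tiny hand-made frame for illustration
toy = pd.DataFrame({
    'endianness': ['big-endian', 'big-endian', 'little-endian', 'big-endian'],
    'predicted':  ['big-endian', 'little-endian', 'little-endian', 'big-endian'],
})

# Rows: predicted class, columns: true class — same orientation as cm above
toy_cm = pd.crosstab(toy.predicted, toy.endianness)
print(toy_cm)
```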

import seaborn as sns
import matplotlib.pyplot as plt
from plotutils import *

# Label rows (predicted) and columns (actual) with the class names
df_cm = pd.DataFrame(cm, index=endianness, columns=endianness)
sns.set(font_scale=1.4)
sns.heatmap(df_cm, annot=True, annot_kws={"size": 16})
plt.title("Confusion matrix")
plt.show()