Linear separability and Perceptrons
What is a linearly separable dataset and how to test it?
Formally, we can define an $n$-dimensional dataset as linearly separable if a hyperplane can completely separate its two subsets. If we assume two subsets $A$ and $B$, such that $A=\{A^1,\ldots,A^a\} \subset \mathbb{R}^n$ and $B=\{B^1,\ldots,B^b\} \subset \mathbb{R}^n$, then:
$$ \exists\, \mathbf{w} \in \mathbb{R}^n, c \in \mathbb{R} : A \subset \{x\in\mathbb{R}^n : \mathbf{w}^T x > c\},\ B \subset \{x\in\mathbb{R}^n : \mathbf{w}^T x \leq c\} $$
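For finite point sets, this definition can be checked directly as a linear-programming feasibility problem: find $\mathbf{w}$ and $c$ satisfying the inequalities with a unit margin (always achievable for finite separable sets by rescaling $\mathbf{w}$ and $c$). A minimal sketch, assuming SciPy's `linprog` is available; the helper name `linearly_separable` is our own:

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(A, B):
    """Test whether point sets A and B in R^n admit a separating hyperplane.

    We search for w in R^n and c in R such that
        w . a >= c + 1  for all a in A,
        w . b <= c - 1  for all b in B,
    posed as an LP feasibility problem (zero objective).
    """
    A, B = np.asarray(A, float), np.asarray(B, float)
    n = A.shape[1]
    # Rewrite both constraint families as G @ (w, c) <= h:
    #   -(w . a) + c <= -1   for points in A
    #    (w . b) - c <= -1   for points in B
    G = np.vstack([np.hstack([-A, np.ones((len(A), 1))]),
                   np.hstack([B, -np.ones((len(B), 1))])])
    h = -np.ones(len(A) + len(B))
    res = linprog(c=np.zeros(n + 1), A_ub=G, b_ub=h,
                  bounds=[(None, None)] * (n + 1))  # variables are unbounded
    return res.success
```

Note that `bounds` must be set explicitly, since `linprog` constrains variables to be non-negative by default.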
Let’s illustrate this concept by first generating two datasets in $\mathbb{R}^2$, one clearly non-separable and one clearly separable, and then applying a supervised learning technique to both to try to separate them.
Generating datasets
To generate clearly non-separable data, we can generate two concentric point sets in $\mathbb{R}^2$. To do that we will use scikit-learn's utility method make_circles^{1}:
from sklearn import datasets
N = 2000
X, y = datasets.make_circles(n_samples=N, factor=0.4, noise=0.05)
Intuitively, we know that this dataset is non-separable: we cannot draw any line that would cleanly divide these two subsets.
We now generate a clearly separable dataset, this time with make_blobs (also from scikit-learn^{2}):
X2, y2 = datasets.make_blobs(n_samples=N, n_features=2, centers=2, cluster_std=0.5)
Again, intuitively, we know that this dataset is separable, since we can clearly see that a line can divide the two subsets.
Perceptron
The venerable Perceptron (which has a fascinating^{3} history, by the way) is a simple binary classifier. For our purposes, we will simply define it as a threshold function of the form:
$$
f(\mathbf{x}) =
\begin{cases}
1\qquad\text{if}\ \mathbf{w}\cdot\mathbf{x} + b > 0, \\
0\qquad\text{otherwise}
\end{cases}.
$$
Taking an input $\mathbf{x}$, the Perceptron returns a binary classification $f(\mathbf{x})$ using weights $\mathbf{w}$ and a bias $b$.
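To make the threshold function concrete, here is a from-scratch sketch of a Perceptron trained with the classic Rosenblatt update rule (nudging $\mathbf{w}$ and $b$ on each misclassified sample). This is our own toy implementation, not scikit-learn's, and the function names are hypothetical:

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Classic Perceptron learning rule: for each sample, compare the
    thresholded prediction to the label (both in {0, 1}) and move
    w and b towards the correct side on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi + b > 0 else 0
            err = yi - pred  # -1, 0, or +1
            w += lr * err * xi
            b += lr * err
    return w, b

def perceptron_predict(X, w, b):
    # The threshold function f(x) from the definition above
    return (X @ w + b > 0).astype(int)
```

By the Perceptron convergence theorem, this loop stops making updates after finitely many mistakes whenever the data are linearly separable; on non-separable data it never settles, which is why the decision boundaries below fail for the circles.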
We will simply apply scikit-learn's Perceptron implementation to both datasets (after normalisation) and try to estimate a separating line.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
# normalisation
sc = StandardScaler()
x = sc.fit_transform(X)
# fitting
perceptron = Perceptron(random_state=0)
perceptron.fit(x, y)
If we plot our decision boundary (below), we can see that our intuition was correct: the Perceptron simply could not find a clear separation between the two subsets.
If we repeat the process for the separable dataset, then we will see that this time the Perceptron was able to find such a separation.
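The separating line in these plots can be recovered from the fitted model: scikit-learn's Perceptron exposes the learned weights as `coef_` and the bias as `intercept_`, so in 2D the boundary $\mathbf{w}\cdot\mathbf{x}+b=0$ rearranges into slope-intercept form (assuming the second weight is non-zero). A sketch, refitting on a blobs dataset with a `random_state` of our own choosing:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler

# Separable dataset, fixed seed so the fit is reproducible
X2, y2 = make_blobs(n_samples=200, n_features=2, centers=2,
                    cluster_std=0.5, random_state=0)
x2 = StandardScaler().fit_transform(X2)
clf = Perceptron(random_state=0).fit(x2, y2)

# The fitted hyperplane is w . x + b = 0; in 2D that is the line
# x1 = -(w[0] * x0 + b) / w[1]  (assumes w[1] != 0)
w, b = clf.coef_[0], clf.intercept_[0]
slope, intercept = -w[0] / w[1], -b / w[1]
```

Plotting the line `x1 = slope * x0 + intercept` over the scatter reproduces the decision boundary shown in the figures.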
We can quantify how well the Perceptron fared using the perceptron.score()
method^{4}, which returns the mean accuracy on the given data and labels. A score of $1.0$ means that the label predictions were all correct, i.e. the Perceptron separated the two subsets completely.
For the non-separable dataset, we get a score of $0.634$, which means that the method fared pretty badly, as expected. For the separable dataset, on the other hand, we get a score of $1.0$.
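The whole comparison can be reproduced end to end in a few lines. A sketch with seeds of our own choosing, so the non-separable score will differ somewhat from the $0.634$ quoted above (it depends on the noise and the random state); the helper name `perceptron_score` is our own:

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler

def perceptron_score(X, y):
    # Normalise, fit, and report mean accuracy on the training data
    x = StandardScaler().fit_transform(X)
    return Perceptron(random_state=0).fit(x, y).score(x, y)

Xc, yc = make_circles(n_samples=2000, factor=0.4, noise=0.05, random_state=0)
Xb, yb = make_blobs(n_samples=2000, n_features=2, centers=2,
                    cluster_std=0.5, random_state=0)

print(perceptron_score(Xc, yc))  # concentric circles: well below 1.0
print(perceptron_score(Xb, yb))  # separable blobs: (near) 1.0
```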

https://scikitlearn.org/stable/modules/generated/sklearn.datasets.make_circles.html ↩︎

https://scikitlearn.org/stable/modules/generated/sklearn.datasets.make_blobs.html ↩︎

https://scikitlearn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html#sklearn.linear_model.Perceptron.score ↩︎