Linear separability and Perceptrons

What is a linearly separable dataset and how to test it?

Formally, we can define an $n$-dimensional dataset as linearly separable if a hyperplane can completely separate its two subsets. If we assume two subsets $A$ and $B$, such that $A=\{A^1,\ldots,A^a\} \subset \mathbb{R}^n$ and $B=\{B^1,\ldots,B^b\} \subset \mathbb{R}^n$, then:

$$ \exists\, \mathbf{w} \in \mathbb{R}^n, c\in\mathbb{R}: A\subset\{x\in\mathbb{R}^n:\mathbf{w}^Tx>c\},\quad B\subset\{x\in\mathbb{R}^n:\mathbf{w}^Tx\leq c\} $$
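This definition answers the "how to test it" part directly: for finite point sets, finding such a hyperplane is a linear-programming feasibility problem. As a sketch (assuming SciPy is available; `linearly_separable` is a helper name introduced here, not a library function), we can ask a solver whether any $(\mathbf{w}, c)$ satisfies the two set conditions, rescaled to a unit margin:

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(A, B):
    """Test whether point sets A and B (one point per row) can be
    separated by a hyperplane, as an LP feasibility problem.
    We look for w, c with w.x - c >= 1 on A and w.x - c <= -1 on B;
    for finite sets, such a (rescaled) solution exists iff the sets
    are linearly separable."""
    n = A.shape[1]
    # unknowns: w (n values) followed by c; constraints G @ [w, c] <= h
    G = np.vstack([
        np.hstack([-A, np.ones((len(A), 1))]),   # -w.a + c <= -1
        np.hstack([ B, -np.ones((len(B), 1))]),  #  w.b - c <= -1
    ])
    h = -np.ones(len(A) + len(B))
    res = linprog(c=np.zeros(n + 1), A_ub=G, b_ub=h,
                  bounds=[(None, None)] * (n + 1))  # w, c unbounded
    return res.success

# two small point clouds on either side of a vertical line
A = np.array([[1.0, 0.0], [2.0, 1.0]])
B = np.array([[-1.0, 0.0], [-2.0, 1.0]])
```

Note that the zero objective means we only care about feasibility, and the bounds must be explicitly set to `(None, None)` since `linprog` defaults to non-negative variables.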

Let’s illustrate this concept by first generating two datasets in $\mathbb{R}^2$, one clearly non-separable and another clearly separable, and then applying a supervised learning technique to both to try to separate them.

Generating datasets

To generate clearly non-separable data, we can generate two concentric point clouds in $\mathbb{R}^2$. To do that we will use scikit-learn’s utility method make_circles:

from sklearn import datasets

N = 2000
X, y = datasets.make_circles(n_samples=N, factor=0.4, noise=0.05)

Intuitively, we know that this dataset is non-separable: we cannot draw any line that would cleanly divide these two subsets. We now generate a clearly separable dataset, and to do so we use make_blobs (also from scikit-learn):

X2, y2 = datasets.make_blobs(n_samples=N, n_features=2, centers=2, cluster_std=0.5)

Again, intuitively, we know that this dataset is separable, since we can clearly see that a line can divide these two clusters.
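To see both datasets side by side, a quick matplotlib sketch like the following can be used (the figure layout and the `random_state` seeding are choices made here for reproducibility, not part of the text above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, renders to file
import matplotlib.pyplot as plt
from sklearn import datasets

N = 2000
X, y = datasets.make_circles(n_samples=N, factor=0.4, noise=0.05,
                             random_state=0)
X2, y2 = datasets.make_blobs(n_samples=N, n_features=2, centers=2,
                             cluster_std=0.5, random_state=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X[:, 0], X[:, 1], c=y, s=5)
ax1.set_title("make_circles: non-separable")
ax2.scatter(X2[:, 0], X2[:, 1], c=y2, s=5)
ax2.set_title("make_blobs: separable")
fig.savefig("datasets.png")
```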


The venerable Perceptron (which has a fascinating history, by the way) is a simple binary classifier. For our scope, we will simply define it as a threshold function of the form:

$$ f(\mathbf{x}) = \begin{cases} 1\qquad\text{if}\ \mathbf{w}\cdot\mathbf{x} + b > 0, \\
0\qquad\text{otherwise} \end{cases}. $$

Taking an input $\mathbf{x}$, the Perceptron returns a binary classification $f(\mathbf{x})$ using weights $\mathbf{w}$ and a bias $b$.
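The threshold function above is one line of NumPy. As a minimal sketch (the weights and bias below are arbitrary values chosen for illustration):

```python
import numpy as np

def perceptron_output(x, w, b):
    """Threshold unit: 1 if w.x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)

# hypothetical parameters, just to exercise both branches
w = np.array([2.0, -1.0])
b = -0.5

perceptron_output(np.array([1.0, 0.0]), w, b)  # 2.0 - 0.5 > 0  -> 1
perceptron_output(np.array([0.0, 2.0]), w, b)  # -2.0 - 0.5 <= 0 -> 0
```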

We will simply apply scikit-learn’s Perceptron implementation to both datasets (after normalisation) and try to estimate a separation line.

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# normalisation
sc = StandardScaler()
x = sc.fit_transform(X)

# fitting
perceptron = Perceptron(random_state=0)
perceptron.fit(x, y)

If we plot our decision boundary (below), we can see that our intuition was correct, of course. The Perceptron simply could not find a clear separation between the two subsets.

If we repeat the process for the separable dataset, then we will see that this time the Perceptron was able to find such a separation.
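For the separable dataset, the fitted line can be recovered from the model’s `coef_` and `intercept_` attributes: the boundary is $w_1 u + w_2 v + b = 0$, which we can solve for $v$ and plot. A sketch (again seeding `random_state` for reproducibility, and assuming $w_2 \neq 0$ so the boundary is not vertical):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

X2, y2 = datasets.make_blobs(n_samples=2000, n_features=2, centers=2,
                             cluster_std=0.5, random_state=0)
x = StandardScaler().fit_transform(X2)
clf = Perceptron(random_state=0).fit(x, y2)

# boundary: w1*u + w2*v + b = 0  =>  v = -(w1*u + b) / w2
(w1, w2), b = clf.coef_[0], clf.intercept_[0]
u = np.linspace(x[:, 0].min(), x[:, 0].max(), 100)
v = -(w1 * u + b) / w2

fig, ax = plt.subplots()
ax.scatter(x[:, 0], x[:, 1], c=y2, s=5)
ax.plot(u, v, "k--")
fig.savefig("boundary.png")
```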

We can quantify how well the Perceptron fared using the perceptron.score() method. This method returns the mean accuracy on the given data and labels. A score of $1.0$ means that it was able to completely separate the subsets and the label predictions were all correct. For the non-separable dataset, we get a score of $0.634$, which means that the method fared pretty badly, as expected. On the other hand, for the separable dataset we get a score of $1.0$.
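Putting the pieces together for the non-separable case (the exact score depends on the noise and the random seed, so the concrete value below is seeded for reproducibility rather than taken from the text):

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# concentric circles: no line separates them, so accuracy stays well below 1
X, y = datasets.make_circles(n_samples=2000, factor=0.4, noise=0.05,
                             random_state=0)
x = StandardScaler().fit_transform(X)
clf = Perceptron(random_state=0).fit(x, y)
score = clf.score(x, y)  # mean accuracy on (x, y)
```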
