Feature scaling

Techniques

The most common techniques for feature scaling are normalisation and standardisation. The examples use the following reference dataframe:

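import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Two features on different scales: x is uniform on [0, 10), y on [0, 2).
# No seed is set, so the values change on every run.
df = pd.DataFrame({'x': np.random.rand(100)*10.0,
                   'y': np.random.rand(100)*2.0})
print(df)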

           x         y
0   7.338272  0.963962
1   9.282307  0.799143
2   2.505291  0.664340
3   3.212283  0.137100
4   4.370920  0.383998
..       ...       ...
95  1.454787  0.773893
96  3.847065  1.478079
97  4.198221  0.308595
98  9.986268  0.298912
99  4.940190  0.916740

[100 rows x 2 columns]

Min-Max scaler

The Min-Max scaler linearly maps each feature from its original range \([A, B]\), where \(A=x_{min}\) and \(B=x_{max}\), to a target range \([A^{\prime}, B^{\prime}]\). Typically, \([A^{\prime}, B^{\prime}]=[0, 1]\). For this choice, the transformation is:

\[ x^{\prime}=\frac{x-x_{min}}{x_{max}-x_{min}} \]
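
For a target range other than \([0, 1]\), the same map can be followed by a stretch and a shift. As a sketch, with \(A^{\prime}\) and \(B^{\prime}\) the bounds of the target range:

\[ x^{\prime}=A^{\prime}+\frac{(x-x_{min})(B^{\prime}-A^{\prime})}{x_{max}-x_{min}} \]

Setting \(A^{\prime}=0\) and \(B^{\prime}=1\) recovers the formula above. In both cases \(x_{min}\) maps to \(A^{\prime}\) and \(x_{max}\) maps to \(B^{\prime}\).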

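The Min-Max scaler is a reasonable choice when the data is not assumed to be normally distributed. However, because \(x_{min}\) and \(x_{max}\) alone set the scale, it is very sensitive to outliers: a single extreme value compresses all the other values into a narrow part of the target range.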
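
To make the formula and the outlier sensitivity concrete, here is a minimal sketch that applies the transformation by hand with pandas. The helper name `min_max_scale`, the dataframe `demo`, the seed and the outlier value 1000 are assumptions made for illustration only, and the sketch does not guard against constant columns (where the range is zero).

```python
import numpy as np
import pandas as pd


def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply x' = (x - x_min) / (x_max - x_min) to each column independently."""
    col_min = frame.min()
    col_range = frame.max() - col_min  # zero for a constant column (not handled here)
    return (frame - col_min) / col_range


# Illustrative data with the same shape as the reference dataframe (seeded for repeatability).
rng = np.random.default_rng(0)
demo = pd.DataFrame({'x': rng.random(100) * 10.0,
                     'y': rng.random(100) * 2.0})

scaled = min_max_scale(demo)
print(scaled.describe().loc[['min', 'max']])  # each column now spans [0, 1]

# Replace one value of x with an extreme outlier and rescale.
with_outlier = demo.copy()
with_outlier.loc[0, 'x'] = 1000.0
scaled_outlier = min_max_scale(with_outlier)

# Largest scaled x among the 99 ordinary rows.
print(scaled_outlier['x'].drop(index=0).max())
```

Since the ordinary values of `x` lie in [0, 10) and the new maximum is 1000, the remaining 99 rows end up below 0.01 after scaling, while the outlier alone sits at 1.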

Example

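from sklearn.preprocessing import MinMaxScaler

# fit_transform learns each column's min and max, then rescales to [0, 1].
# It returns a NumPy array, so wrap it to keep the column labels and index.
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)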
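
As a further sketch, `MinMaxScaler` also accepts a `feature_range` argument for target ranges other than \([0, 1]\), and the fitted scaler can undo the transformation with `inverse_transform`. The dataframe `demo`, the seed and the range (-1, 1) below are illustrative assumptions, not part of the example above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data with the same shape as the reference dataframe (seeded for repeatability).
rng = np.random.default_rng(0)
demo = pd.DataFrame({'x': rng.random(100) * 10.0,
                     'y': rng.random(100) * 2.0})

# Scale each column to [-1, 1] instead of the default [0, 1].
scaler = MinMaxScaler(feature_range=(-1.0, 1.0))
demo_scaled = pd.DataFrame(scaler.fit_transform(demo),
                           columns=demo.columns, index=demo.index)

# The per-column minimum and maximum learned during fitting.
print(scaler.data_min_, scaler.data_max_)

# inverse_transform maps the scaled values back to the original units.
restored = scaler.inverse_transform(demo_scaled)
print(np.allclose(restored, demo.to_numpy()))
```

Fitting on the training data only and then calling `transform` on new data reuses the same learned minimum and maximum, so new values can fall outside the target range.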