Feature scaling
Techniques
The most common techniques for feature scaling are normalisation and standardisation. For the examples, we will use the following reference dataframe:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'x': np.random.rand(100)*10.0,
                   'y': np.random.rand(100)*2.0})
print(df)
x y
0 7.338272 0.963962
1 9.282307 0.799143
2 2.505291 0.664340
3 3.212283 0.137100
4 4.370920 0.383998
.. ... ...
95 1.454787 0.773893
96 3.847065 1.478079
97 4.198221 0.308595
98 9.986268 0.298912
99 4.940190 0.916740
[100 rows x 2 columns]
Min-Max scaler
A common scaler that linearly maps each feature from its observed range $[A, B]=[x_{min}, x_{max}]$ to a new range $[A^{\prime}, B^{\prime}]$. Typically, $[A^{\prime}, B^{\prime}]=[0, 1]$, in which case the transformation is:
$$ x^{\prime}=\frac{x-x_{min}}{x_{max}-x_{min}} $$
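The Min-Max scaler is a reasonable choice when the data is not assumed to be normally distributed. However, it is very sensitive to outliers: a single extreme value sets $x_{min}$ or $x_{max}$ and compresses all the remaining values into a narrow part of the target range.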
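To make the sensitivity to outliers concrete, the following is a minimal sketch that applies the formula above by hand with NumPy. The helper `min_max` and the sample values (including the outlier `100.0`) are illustrative assumptions, not part of the reference dataframe.

```python
import numpy as np

def min_max(values):
    """Rescale a 1-D array to [0, 1] using its own minimum and maximum.

    Hypothetical helper for illustration; it applies
    x' = (x - x_min) / (x_max - x_min) directly.
    """
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

# Illustrative data: five evenly spaced values, then the same values plus one outlier
clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(clean, 100.0)

scaled_clean = min_max(clean)
scaled_outlier = min_max(with_outlier)

print(scaled_clean)        # spread evenly over [0, 1]
print(scaled_outlier[:5])  # the same five points, now squeezed close to 0
```

Without the outlier, the five points cover the whole unit interval. With it, the outlier alone maps to 1 and the original points all fall below 0.05.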
Example
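from sklearn.preprocessing import MinMaxScaler
# Scale every column to [0, 1], the default feature_range
scaler = MinMaxScaler()
# fit_transform returns a NumPy array, so rebuild the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)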
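For a target range other than $[0, 1]$, the unit-interval result can be stretched and shifted: $x^{\prime}=A^{\prime}+\frac{(x-x_{min})(B^{\prime}-A^{\prime})}{x_{max}-x_{min}}$. The sketch below applies this column-wise with pandas only. The helper `min_max_scale`, its parameters `new_min` and `new_max`, and the fixed random seed are assumptions made for illustration; `MinMaxScaler` exposes the same idea through its `feature_range` parameter.

```python
import numpy as np
import pandas as pd

# Same shape as the reference dataframe; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.random(100) * 10.0,
                   'y': rng.random(100) * 2.0})

def min_max_scale(frame, new_min=0.0, new_max=1.0):
    """Column-wise min-max scaling to [new_min, new_max] (hypothetical helper)."""
    # frame.min() and frame.max() are per-column Series, aligned on column names
    unit = (frame - frame.min()) / (frame.max() - frame.min())
    return unit * (new_max - new_min) + new_min

df_unit = min_max_scale(df)            # maps each column to [0, 1]
df_sym = min_max_scale(df, -1.0, 1.0)  # maps each column to [-1, 1]

# Each column should now span exactly the requested range
print(df_unit.agg(['min', 'max']))
print(df_sym.agg(['min', 'max']))
```

Because the minimum and maximum are computed per column, `x` and `y` end up on the same scale even though their original ranges differ by a factor of five.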