Pandas basics
Column operations
Renaming columns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
df = pd.DataFrame({
    'a': np.random.randn(6),
    'b': np.random.choice([5, 7, np.nan], 6),
    'c': np.random.choice(['foo', 'bar', 'baz'], 6),
})
df.head()
| | a | b | c |
---|---|---|---|
0 | 0.549838 | 5.0 | baz |
1 | 0.658684 | NaN | foo |
2 | -0.784545 | NaN | foo |
3 | 0.204787 | 5.0 | foo |
4 | 1.206179 | 5.0 | foo |
df.rename(columns={"a": "new_name"}, inplace=True)
df.columns
Index(['new_name', 'b', 'c'], dtype='object')
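The call above modifies df because of inplace=True. As a minimal sketch of the default behaviour, the example below uses a small hypothetical frame called demo (not part of the original data) to show that rename otherwise returns a new dataframe, and that keys matching no column are ignored unless errors="raise" is passed.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Without inplace=True, rename returns a new DataFrame and leaves `demo` untouched.
renamed = demo.rename(columns={"a": "new_name"})
original_columns = list(demo.columns)
renamed_columns = list(renamed.columns)

# Keys that match no column are silently ignored by default...
unchanged = demo.rename(columns={"missing": "x"})

# ...unless errors="raise" is passed, which raises a KeyError instead.
try:
    demo.rename(columns={"missing": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```

Passing errors="raise" can help catch typos in column names that would otherwise go unnoticed.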
We can also pass a mapping function, which is applied to every column name. In this case, str.upper:
df.rename(columns=str.upper, inplace=True)
df.columns
Index(['NEW_NAME', 'B', 'C'], dtype='object')
We can also use a lambda. For instance, lambda x: x.capitalize() uppercases the first letter of each name and lowercases the rest:
df.rename(columns=lambda x: x.capitalize(), inplace=True)
df.columns
Index(['New_name', 'B', 'C'], dtype='object')
A list of column names can also be assigned directly to df.columns:
df.columns = ["first", "second", "third"]
df.columns
Index(['first', 'second', 'third'], dtype='object')
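Direct assignment replaces every name at once, so the list has to contain exactly one name per column. The sketch below, again on a hypothetical frame named demo, shows the error raised on a length mismatch and uses set_axis as a variant that returns a relabelled dataframe instead of changing the original.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Two names for three columns: pandas raises ValueError and keeps the old names.
try:
    demo.columns = ["first", "second"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# set_axis returns a relabelled DataFrame and does not modify `demo`.
relabelled = demo.set_axis(["first", "second", "third"], axis=1)
```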
Dropping columns
A column can be dropped using the .drop() method along with the columns keyword. For instance, given the dataframe df:
df
| | first | second | third |
---|---|---|---|
0 | 0.549838 | 5.0 | baz |
1 | 0.658684 | NaN | foo |
2 | -0.784545 | NaN | foo |
3 | 0.204787 | 5.0 | foo |
4 | 1.206179 | 5.0 | foo |
5 | -0.898500 | 5.0 | baz |
We can drop the second column using the following call, which returns a new dataframe and leaves df unchanged:
df.drop(columns='second')
| | first | third |
---|---|---|
0 | 0.549838 | baz |
1 | 0.658684 | foo |
2 | -0.784545 | foo |
3 | 0.204787 | foo |
4 | 1.206179 | foo |
5 | -0.898500 | baz |
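A few variations on drop may be useful in practice. The sketch below uses a hypothetical two-row frame named demo with the same column names; it shows that the original frame is left intact, that a list drops several columns at once, and that errors="ignore" skips labels that do not exist.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({
    "first": [0.1, 0.2],
    "second": [5.0, 7.0],
    "third": ["foo", "bar"],
})

# drop returns a new DataFrame; `demo` still has all three columns.
without_second = demo.drop(columns="second")

# A list of labels drops several columns at once.
only_third = demo.drop(columns=["first", "second"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
tolerant = demo.drop(columns=["second", "missing"], errors="ignore")
```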
The del keyword is another option. However, del modifies the dataframe in place, so we will make a copy of the dataframe first.
df_copy = df.copy()
df_copy
| | first | second | third |
---|---|---|---|
0 | 0.549838 | 5.0 | baz |
1 | 0.658684 | NaN | foo |
2 | -0.784545 | NaN | foo |
3 | 0.204787 | 5.0 | foo |
4 | 1.206179 | 5.0 | foo |
5 | -0.898500 | 5.0 | baz |
del df_copy['second']
df_copy
| | first | third |
---|---|---|
0 | 0.549838 | baz |
1 | 0.658684 | foo |
2 | -0.784545 | foo |
3 | 0.204787 | foo |
4 | 1.206179 | foo |
5 | -0.898500 | baz |
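When the removed column is still needed afterwards, pop is a related in-place option: it deletes the column and returns it as a Series. A minimal sketch on a hypothetical frame named demo:

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({
    "first": [0.1, 0.2],
    "second": [5.0, 7.0],
    "third": ["foo", "bar"],
})
demo_copy = demo.copy()

# pop removes the column from `demo_copy` in place and returns it as a Series.
second = demo_copy.pop("second")
```

As with del, working on a copy keeps the original frame unchanged.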
Yet another possibility is to drop the column by position, by indexing df.columns to look up its name. For instance:
df.drop(columns=df.columns[1])
| | first | third |
---|---|---|
0 | 0.549838 | baz |
1 | 0.658684 | foo |
2 | -0.784545 | foo |
3 | 0.204787 | foo |
4 | 1.206179 | foo |
5 | -0.898500 | baz |
Or we could use a slice of df.columns to drop a range of adjacent columns, for instance:
df.drop(columns=df.columns[0:2])
| | third |
---|---|
0 | baz |
1 | foo |
2 | foo |
3 | foo |
4 | foo |
5 | baz |
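Positions do not have to be adjacent. As a sketch on a hypothetical frame named demo, df.columns can be indexed with a list of positions, and iloc can be used to keep columns by position, which is the complement of dropping a slice.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({
    "first": [0.1, 0.2],
    "second": [5.0, 7.0],
    "third": ["foo", "bar"],
})

# Indexing df.columns with a list of positions drops non-adjacent columns.
middle_only = demo.drop(columns=demo.columns[[0, 2]])

# Keeping columns by position with iloc: everything from position 1 onwards.
from_second_on = demo.iloc[:, 1:]
```

Note that drop still works on labels after the positional lookup, so if column names are duplicated, every column sharing the looked-up name is removed.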