Pandas basics

Column operations

Renaming columns

import pandas as pd
import numpy as np
import warnings

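# Suppress warnings to keep the output below uncluttered
warnings.filterwarnings('ignore')

# Six rows: random floats, floats with missing values (NaN), and strings
df = pd.DataFrame({
    'a': np.random.randn(6),
    'b': np.random.choice([5, 7, np.nan], 6),
    'c': np.random.choice(['foo', 'bar', 'baz'], 6),
})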
df.head()
          a    b    c
0  0.549838  5.0  baz
1  0.658684  NaN  foo
2 -0.784545  NaN  foo
3  0.204787  5.0  foo
4  1.206179  5.0  foo
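Because the example above calls np.random without a seed, every run produces different values from the tables shown here. A minimal sketch of a reproducible alternative, assuming NumPy 1.17 or later for the Generator API (np.random.default_rng, swapped in for the legacy functions). The helper make_frame and the name df_demo are hypothetical, and the seeded values will not match the tables in this section:

```python
import numpy as np
import pandas as pd


def make_frame(seed=0):
    """Hypothetical helper: build the same kind of frame, but reproducibly."""
    # A seeded Generator yields the same random numbers on every run
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'a': rng.standard_normal(6),
        'b': rng.choice([5, 7, np.nan], 6),
        'c': rng.choice(['foo', 'bar', 'baz'], 6),
    })


df_demo = make_frame(0)
print(df_demo)
```

Calling make_frame with the same seed always returns an identical frame, which makes examples and tests repeatable.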
df.rename(columns={"a": "new_name"}, inplace=True)
df.columns
Index(['new_name', 'b', 'c'], dtype='object')
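Without inplace=True, rename() returns a new DataFrame, and dictionary keys that match no column are ignored by default. A small sketch on a hypothetical two-column frame (the names frame, renamed and same are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Without inplace=True, rename() returns a new DataFrame
renamed = frame.rename(columns={'a': 'new_name'})
print(list(frame.columns))    # ['a', 'b']  (original unchanged)
print(list(renamed.columns))  # ['new_name', 'b']

# Keys that match no column are silently ignored by default...
same = frame.rename(columns={'missing': 'x'})

# ...unless errors='raise' is passed, which raises a KeyError instead
try:
    frame.rename(columns={'missing': 'x'}, errors='raise')
    raised = False
except KeyError:
    raised = True
```

Reassigning the result (df = df.rename(...)) is a common alternative to inplace=True, and it also works inside method chains.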

A mapping function can be passed instead of a dictionary; it is applied to every column label. Here we use str.upper:

df.rename(columns=str.upper, inplace=True)
df.columns
Index(['NEW_NAME', 'B', 'C'], dtype='object')
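The function passed to rename() is called once per column label. A sketch on a hypothetical frame comparing it with the .str accessor of the column Index, which rebuilds all labels in one step:

```python
import pandas as pd

frame = pd.DataFrame({'New_Name': [1], 'B': [2], 'C': [3]})

# rename() calls str.lower on each label and returns a new DataFrame
lowered = frame.rename(columns=str.lower)
print(list(lowered.columns))  # ['new_name', 'b', 'c']

# The Index .str accessor produces the same labels, assigned back directly
frame2 = frame.copy()
frame2.columns = frame2.columns.str.lower()
print(list(frame2.columns))   # ['new_name', 'b', 'c']
```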

A lambda works as well. For instance, lambda x: x.capitalize() produces:

df.rename(columns=lambda x: x.capitalize(), inplace=True)
df.columns
Index(['New_name', 'B', 'C'], dtype='object')
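Lambdas are handy for cleaning inconsistent headers. A minimal sketch with hypothetical column names, assuming the goal is lowercase snake_case labels:

```python
import pandas as pd

# Hypothetical messy headers, e.g. as read from a spreadsheet
messy = pd.DataFrame(columns=[' First Name', 'Last Name ', 'AGE'])

# Strip surrounding spaces, lowercase, and replace inner spaces with underscores
clean = messy.rename(columns=lambda x: x.strip().lower().replace(' ', '_'))
print(list(clean.columns))  # ['first_name', 'last_name', 'age']
```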

Alternatively, a list of new names can be assigned directly to the columns attribute. The list must contain exactly one name per column.

df.columns = ["first", "second", "third"]
df.columns
Index(['first', 'second', 'third'], dtype='object')
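Direct assignment replaces all labels at once, so the list length has to match. A sketch of what happens otherwise, plus DataFrame.set_axis(), which applies a full list of labels but returns a new frame; the frame and labels are hypothetical:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# A list with the wrong number of names raises ValueError (length mismatch)
try:
    frame.columns = ['only', 'two']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# The failed assignment leaves the labels untouched
print(list(frame.columns))  # ['a', 'b', 'c']

# set_axis() relabels the columns (axis=1) and returns a new DataFrame
relabelled = frame.set_axis(['first', 'second', 'third'], axis=1)
print(list(relabelled.columns))  # ['first', 'second', 'third']
```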

Dropping columns

A column can be dropped with the .drop() method and its columns keyword. By default .drop() returns a new DataFrame and leaves the original unchanged. Consider the dataframe df:

df
      first  second  third
0  0.549838     5.0    baz
1  0.658684     NaN    foo
2 -0.784545     NaN    foo
3  0.204787     5.0    foo
4  1.206179     5.0    foo
5 -0.898500     5.0    baz

We can drop the column named second with:

df.drop(columns='second')
      first  third
0  0.549838    baz
1  0.658684    foo
2 -0.784545    foo
3  0.204787    foo
4  1.206179    foo
5 -0.898500    baz
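A sketch of a few .drop() variations on a small hypothetical frame: the original stays unchanged, a list drops several columns at once, and errors='ignore' tolerates labels that do not exist:

```python
import pandas as pd

frame = pd.DataFrame({'first': [0.5, 0.6],
                      'second': [5.0, None],
                      'third': ['baz', 'foo']})

# drop() returns a new DataFrame; frame keeps all three columns
dropped = frame.drop(columns='second')

# A list of labels drops several columns at once
only_first = frame.drop(columns=['second', 'third'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
tolerant = frame.drop(columns=['second', 'not_there'], errors='ignore')

print(list(dropped.columns))     # ['first', 'third']
print(list(only_first.columns))  # ['first']
print(list(tolerant.columns))    # ['first', 'third']
```

The older spelling df.drop('second', axis=1) is equivalent to df.drop(columns='second').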

The del statement also removes a column. Unlike .drop(), however, del modifies the DataFrame in place, so we first make a copy to leave df intact.

df_copy = df.copy()
df_copy
      first  second  third
0  0.549838     5.0    baz
1  0.658684     NaN    foo
2 -0.784545     NaN    foo
3  0.204787     5.0    foo
4  1.206179     5.0    foo
5 -0.898500     5.0    baz
del df_copy['second']
df_copy
      first  third
0  0.549838    baz
1  0.658684    foo
2 -0.784545    foo
3  0.204787    foo
4  1.206179    foo
5 -0.898500    baz
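del raises a KeyError when the column does not exist. DataFrame.pop(), a related method not covered in the text above, also removes a column in place but returns it as a Series. A sketch on a hypothetical frame:

```python
import pandas as pd

frame = pd.DataFrame({'first': [1, 2], 'second': [3, 4], 'third': [5, 6]})

# pop() removes the column in place, like del, and returns it as a Series
popped = frame.pop('second')
print(list(frame.columns))  # ['first', 'third']
print(popped.tolist())      # [3, 4]

# del on a column that no longer exists raises KeyError
try:
    del frame['second']
    del_raised = False
except KeyError:
    del_raised = True
```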

Yet another option is to drop a column by its position: df.columns[1] looks up the label of the second column, which is then passed to .drop(). For instance:

df.drop(columns=df.columns[1])
      first  third
0  0.549838    baz
1  0.658684    foo
2 -0.784545    foo
3  0.204787    foo
4  1.206179    foo
5 -0.898500    baz
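Indexing df.columns with a list of positions selects several labels at once, and negative positions count from the end. A sketch on a hypothetical frame:

```python
import pandas as pd

frame = pd.DataFrame({'first': [1], 'second': [2], 'third': [3]})

# A list of positions returns the matching labels: Index(['first', 'third'])
labels = frame.columns[[0, 2]]
by_positions = frame.drop(columns=labels)
print(list(by_positions.columns))  # ['second']

# Negative positions count from the end: -1 is the last column
no_last = frame.drop(columns=frame.columns[-1])
print(list(no_last.columns))  # ['first', 'second']
```

Because df.columns[i] yields a label, .drop() removes every column carrying that label; this only matters when column names are duplicated.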

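Slices of df.columns work as well. Here df.columns[0:2] selects the first two labels (the end position is excluded), so only the third column remains: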

df.drop(columns=df.columns[0:2])
  third
0   baz
1   foo
2   foo
3   foo
4   foo
5   baz
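Dropping a slice of columns gives the same result as keeping the complementary positions with .iloc. A sketch on a hypothetical frame, including a step slice:

```python
import pandas as pd

frame = pd.DataFrame({'first': [1], 'second': [2], 'third': [3]})

# Drop the first two columns by slicing the column labels
dropped = frame.drop(columns=frame.columns[0:2])

# Equivalent: keep positions 2 onward with positional selection
kept = frame.iloc[:, 2:]

# A step slice drops every other column, starting with the first
every_other = frame.drop(columns=frame.columns[::2])
print(list(every_other.columns))  # ['second']
```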