Python Pandas

Subsetting and indexing

Indexing performance

Suppose you have a column BOOL with values Y or N that you want to replace with the integers 1 or 0. The initial¹ instinct would be to do something like:

df["BOOL"] = df["BOOL"].eq("Y").mul(1)

This can result in the following warning, typically when df was itself created by slicing another DataFrame:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The pandas documentation recommends the following idiom instead, which addresses rows and columns in a single .loc call and can be considerably faster:

df.loc[:, "BOOL"] = df.loc[:, "BOOL"].eq("Y").mul(1)
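
As a rough illustration of the idiom above, the following sketch builds a small DataFrame, takes an explicit copy of a subset, and converts the Y/N values with a single .loc call. The sample data, the names raw and df, and the BOOL_INT column are hypothetical. Writing to a new column is an assumption made here so that the result gets an integer dtype regardless of how a given pandas version handles overwriting a string column in place.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original text
raw = pd.DataFrame({"ID": [1, 2, 3, 4], "BOOL": ["Y", "N", "Y", "N"]})

# .copy() makes the subset an independent DataFrame, so later writes
# cannot be mistaken for writes to a slice of raw
df = raw.loc[raw["ID"] > 1].copy()

# One .loc call addresses all rows and the target column together.
# eq("Y") yields booleans; mul(1) turns them into the integers 1 and 0.
df.loc[:, "BOOL_INT"] = df.loc[:, "BOOL"].eq("Y").mul(1)

result = df["BOOL_INT"].tolist()
```

Here raw keeps its original Y/N values, and only df gains the integer column.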

  1. and Pythonic?